
Aspects of Mathematical Arguments that Influence Eighth Grade Students’ Judgment of

Their Validity

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Yating Liu, M.A.

Graduate Program in Education

The Ohio State University

2013

Dissertation Committee:

Azita Manouchehri, Advisor

Patricia Brosnan

Herb Clemens
Copyright by

Yating Liu

2013
Abstract

The study examined how middle school students evaluate arguments in a wide

range of mathematical contexts. The analysis included investigations on the types of

mathematical arguments that students found convincing, explanatory and appealing,

common aspects and features of arguments that impacted students’ evaluation of the

arguments, and problem contexts’ impact on their judgment.

The study involved two phases, a survey and a follow-up interview. Over five

hundred 8th grade students from five Ohio public schools participated in the survey study,

where they were provided a variety of arguments in four different mathematical contexts

and were asked to determine which of these arguments were convincing, explanatory and

appealing to them. Eight subjects, whose survey responses were distinct from each other,

were selected to participate in the follow-up interviews, where they were asked to explain

the rationale behind their evaluation of each argument.

Both quantitative and qualitative methods were utilized in data analysis.

Statistical data from the survey were used to identify types of mathematical arguments that

students found convincing, explanatory and appealing. Interview data were coded using a

proof classification framework to identify the aspects and features of arguments that

impacted students’ evaluation of the arguments.

Findings from the survey and interviews suggested that the participants’ evaluations

of the same argument varied widely across individuals. Their judgments of the same

type of argument also differed across problem contexts. The subjects’ explanations in

interviews revealed that the source of evidence had the largest impact on their judgment

of an argument, followed by representation. The reasoning mode, i.e., the link between

evidence and conclusion, was the aspect of least concern. Further investigation indicated

that examples, i.e., results from immediate tests, were the most frequently referenced type of

evidence to support a convincing argument. Students’ preferred representation and

reasoning modes varied. Lastly, it was found that the subjects possessed personal

standards for determining whether an argument was convincing. Most subjects did not consider the

ability to show the general validity of a conjecture as a requirement for convincing

arguments.

Dedication

Dedicated to my family and friends

Acknowledgement

Studying mathematics education in the U.S. has been a transformative experience.

From the first time I was introduced to the theories of teaching and learning, to the

process of designing, producing and refining the dissertation work, none of these could

be achieved without the help of those individuals whom I can only begin to thank with

the following words.

I am incredibly grateful to my advisor, Professor Azita Manouchehri. Dr.

Manouchehri first introduced me to the Young Scholars Program, which sparked my

interest in students’ mathematical reasoning. Since then, she has invested much of her

time into my growth as a mathematics educator, including the development of my

teaching and research skills. Through our many conversations, she has provided

thoughtful feedback, helping me clearly and coherently express ideas and shed light on

patterns in my analysis. Always urging me to take one step further, Dr. Manouchehri has

truly helped me grow as a professional in the field.

I would also like to thank Professor Herb Clemens and Professor Patti Brosnan

for their contribution as members of my committee. I thank Dr. Clemens for his insight

on mathematics education from the perspective of a university mathematics professor and

for guiding me in obtaining my master’s degree in mathematics. I thank Dr. Brosnan,

who, along with Professor Diana Erchick, introduced me to the Mathematics Coaching

Program, providing the environment through which I was able to develop my

understanding of the education system and to connect with the schools whose students

form the basis of my dissertation study. I would also like to thank the mathematics

coaches of those schools, who helped facilitate the collection of the data, as well as the

students who participated in the study.

I also very much appreciate my parents and my wife, who with their love, support

and understanding, helped me preserve my physical and psychological well-being

throughout my study. I would finally like to thank my graduate student colleagues, with

whom I could, when needed, commiserate — but more often and better still, collaborate

and celebrate.

Vita

June 2012 M.A., Education, The Ohio State University

March 2011 M.S., Mathematics, The Ohio State University

July 2008 B.S., Mathematics, Peking University, P. R. China

Publications

Liu, Y., Zhang, P., Brosnan, P., & Erchick, D. (2012). Examining the geometry items of
state standardized exams using the van Hiele model: Test content and student
achievement. Research in Education, Assessment, and Learning, 3(1), 22-28.

Liu, Y., & Manouchehri, A. (2012). What kinds of arguments do eighth graders prefer:
Preliminary results from an exploratory study. In Proceedings of the 34th Annual
Conference of the North American Chapter of the International Group for the
Psychology of Mathematics Education. Kalamazoo, MI: Western Michigan
University.

Liu, Y., & Manouchehri, A. (2012). Nurturing high school students’ understanding of
proof as a convincing way of reasoning: Results from an exploratory study. In
Proceedings of the 12th International Congress on Mathematics Education (pp.
2848-2857). Seoul, Korea.

Manouchehri, A., Zhang, P., & Liu, Y. (2012). Forces hindering development of
mathematical problem solving among school children. In Proceedings of the 12th
International Congress on Mathematics Education (pp. 2974-2983). Seoul,
Korea.

Liu, Y., Harrison, R., & Zollinger, S. (2011). Enhancing K-8 mathematics coaches’
knowledge for teaching probability. In T. Lamberg & L. Weist (Eds.),
Proceedings of the 33rd annual meeting of the North American Chapter of the
International Group for the Psychology of Mathematics Education. Reno, NV:
University of Nevada, Reno.

Liu, Y., Zhang, P., Brosnan, P., & Erchick, D. (2010). Examining the geometry content of
state standardized exams using the van Hiele model. In P. Brosnan, D. Erchick,
& L. Flevares (Eds.), Proceedings of the 32nd annual meeting of the North
American Chapter of the International Group for the Psychology of Mathematics
Education (Vol. 6, pp. 616-624). Columbus, OH: The Ohio State University.

Zhang, P., Brosnan, P., Erchick, D., & Liu, Y. (2010). Analysis and inference to students’
approaches about development of problem-solving ability. In P. Brosnan, D.
Erchick, & L. Flevares (Eds.), Proceedings of the 32nd annual meeting of the
North American Chapter of the International Group for the Psychology of
Mathematics Education (Vol. 6, p. 823). Columbus, OH: The Ohio State
University.

Field of Study

Major Field: Education

(Mathematics Education)

Table of Contents

Abstract .......................................................................................................................... ii

Dedication ......................................................................................................................iv

Acknowledgement ........................................................................................................... v

Vita ...............................................................................................................................vii

List of Tables .................................................................................................................xii

List of Figures ............................................................................................................... xv

Chapter 1. Introduction .................................................................................................... 1

The Role of Proof in Mathematics ........................................................................... 2

The Debates and Status of Proof Learning............................................................... 3

Educational Research about Proof Learning ............................................................ 7

Pilot Study Findings ................................................................................................ 9

Purpose of the Study ............................................................................................. 12

Overview of Research Methodology ..................................................................... 12

Significance of the Study ...................................................................................... 14

Chapter 2. Literature Review ......................................................................................... 15

The Nature of Mathematical Proof: A Philosophical Account ................................ 15

The Functions of Proof in the Study of Mathematics ............................................. 26

Existing Theories of Proof Learning...................................................................... 36

Theoretical Framework ......................................................................................... 47

Chapter 3. Methodology ................................................................................................ 57

Mixed Method Designs ......................................................................................... 57

Procedure of the Study .......................................................................................... 59

Sample .................................................................................................................. 60

Survey Participants ............................................................................................... 61

Survey Instrument: Survey of Mathematical Reasoning ........................................ 62

Interview Participants ........................................................................................... 84

Interview Procedure .............................................................................................. 85

Data Analysis ........................................................................................................ 91

Chapter 4. Results........................................................................................................ 115

Findings from SMR ............................................................................................ 115

Findings from the Interviews .............................................................................. 150

Chapter 5. Conclusion ................................................................................................. 238

Overview of the Study ........................................................................................ 238

Summary of the Findings .................................................................................... 239

Contribution to the Literature .............................................................................. 246

Limitation of the Study ....................................................................................... 252

Reflection on Existing Theories .......................................................................... 253

Implication for Proof Teaching ............................................................................ 258

References ................................................................................................................... 261

Appendix A. Survey results: Pairwise comparisons of arguments in each problem ....... 271

Appendix B. Survey results: Comparison between subgroups of students .................... 280


Appendix C. Interview results: Pairwise comparison of the rankings of arguments in each

problem ................................................................................................................ 287

List of Tables

Table 1. The alignment between functions of proof and learners’ purpose in conducting

proof.................................................................................................................34

Table 2. Outline of the procedure of the study ................................................................60

Table 3. Type of the arguments used in SMR .................................................................83

Table 4. Background information of the subjects............................................................85

Table 5. Overview of the interview process ....................................................................87

Table 6. Outline of data analysis process ........................................................................91

Table 7. Rankings of arguments provided by Allen ........................................................96

Table 8. Summary of comments made by Allen ........................................................... 103

Table 9. Table of codes ................................................................................................ 107

Table 10. Categories of comments made by Allen ........................................................ 109

Table 11. Summary of the most understandable, convincing, explanatory and appealing

arguments as evaluated by the participants in each problem ............................ 134

Table 12. Summary of the least understandable, convincing, explanatory and appealing

arguments as evaluated by the participants in each problem ............................ 135

Table 13. Summary of high and low rated arguments by type ....................................... 140

Table 14. Rankings of arguments provided by Abby .................................................... 152

Table 15. Summary of comments made by Abby ......................................................... 153

Table 16. Categories of comments made by Abby ........................................................ 157

Table 17. Rankings of arguments provided by Alice..................................................... 160

Table 18. Summary of comments made by Alice .......................................................... 161

Table 19. Categories of comments made by Alice ........................................................ 164

Table 20. Rankings of arguments provided by Amy ..................................................... 168

Table 21. Summary of comments made by Amy .......................................................... 169

Table 22. Categories of comments made by Amy ......................................................... 173

Table 23. Rankings of arguments provided by Beth ..................................................... 177

Table 24. Summary of comments made by Beth........................................................... 178

Table 25. Categories of comments made by Beth ......................................................... 182

Table 26. Rankings of arguments provided by Betty .................................................... 186

Table 27. Summary of comments made by Betty ......................................................... 187

Table 28. Categories of comments made by Betty ........................................................ 189

Table 29. Rankings of arguments provided by Blake .................................................... 194

Table 30. Summary of comments made by Blake ......................................................... 195

Table 31. Categories of comments made by Blake ....................................................... 199

Table 32. Rankings of arguments provided by Brenda.................................................. 204

Table 33. Summary of comments made by Brenda ....................................................... 205

Table 34. Categories of comments made by Brenda ..................................................... 208

Table 35. Summary of the subjects’ argument rankings ................................................ 212

Table 36. Categories of comments made by all subjects ............................................... 215

Table 37. Summary of the subjects’ rationale in argument evaluation ........................... 220

Table 38. Similarities and differences in the subjects’ rationale of argument evaluation 228
Table 39. Pairwise comparisons: Participants’ ratings on whether the arguments in each

problem were understandable ......................................................................... 272

Table 40. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were convincing ................................................... 274

Table 41. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were explanatory.................................................. 276

Table 42. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were appealing ..................................................... 278

Table 43. Survey results: Between school comparison ................................................. 281

Table 44. Survey results: Between gender comparison ................................................. 283

Table 45. The gender * school effect ............................................................................ 285

Table 46. Pairwise comparison of the rankings of arguments in each problem .............. 288

List of Figures

Figure 1. Balacheff’s (1988) classification of students’ proving schemes .......................38

Figure 2. Proof schemes and sub schemes (Sowder & Harel, 1998) ...............................39

Figure 3. The van Hiele Model (van Hiele, 1986) ..........................................................41

Figure 4. The broad maturation of proof structure (Tall et al, 2012) ...............................46

Figure 5. Evaluation of argument is based on understanding ..........................................48

Figure 6. Reading Comprehension of Geometry Proof (RCGP) Model (Yang & Lin,

2008) ................................................................................................................50

Figure 7. Framework to classify students’ comprehension of a mathematical argument .. 55

Figure 8. Survey of Mathematical Reasoning .................................................................64

Figure 9. The structure of SMR......................................................................................81

Figure 10. The additional problem used in interview ......................................................90

Figure 11. Illustration of Allen’s rationale for evaluating mathematical arguments ....... 112

Figure 12. The percentage of participants who considered each argument understandable

....................................................................................................................... 116

Figure 13. Distribution of the number of arguments indicated understandable by each

participant ...................................................................................................... 117

Figure 14. Distribution of the number of arguments indicated not understandable by each

participant ...................................................................................................... 118

Figure 15. Illustration of how understandable the arguments were to the participants ... 120

Figure 16. Illustration of how convincing the arguments were to the participants ......... 122

Figure 17. Illustration of how explanatory the arguments were to the participants ........ 126

Figure 18. The percentage of participants who considered each argument the appealing

....................................................................................................................... 129

Figure 19. An example of the data transformation for within group ANOVA test ......... 130

Figure 20. Illustration of how appealing the arguments were to the participants ........... 131

Figure 21. Plots for variables on which the gender * school effect was significant ....... 147

Figure 22. Illustration of Abby’s rationale for evaluating mathematical arguments ....... 159

Figure 23. Illustration of Alice’s rationale for evaluating mathematical arguments ....... 167

Figure 24. Illustration of Amy’s rationale for evaluating mathematical arguments ........ 176

Figure 25. Illustration of Beth’s rationale for evaluating mathematical arguments ........ 185

Figure 26. Illustration of Betty’s rationale for evaluating mathematical arguments ....... 193

Figure 27. Illustration of Blake’s rationale for evaluating mathematical arguments ...... 202

Figure 28. Illustration of Brenda’s rationale for evaluating mathematical arguments .... 211

Figure 29. Factors that impacted the subjects’ conviction ............................................. 219

Figure 30. Factors that caused inconsistent evaluation of the same type of arguments .. 234

CHAPTER 1. INTRODUCTION

Proof, in everyday language, usually refers to evidence, explanations and

arguments that are used to verify the truth of a statement. For example, in judicial processes,

testimony from witnesses is usually admitted as proof. In an election, one’s past career

achievements are often cited as proof of leadership capability. In sports, winning

games is often considered proof of competence. In the natural sciences, proofs come from

empirical evidence observed in nature or in experiments. There is no absolute standard of

sufficiency at which evidence and arguments become proof that could serve as a common

foundation for all types of discussion (Pruss, 2006). The conventions and regulations about

what can be used as a reliable source and what can be accepted as a valid argument are highly

area-dependent, even when the discussion is restricted to the study of mathematics (Baker,

2009; Tall, 1991; Thurston, 1995; Usiskin, 1980). Despite the absence of a fixed and

precise standard, mathematical proof is no exception: it, too, certifies the truth of a claim

in a concrete mathematical context. However, proof plays a more significant role in

mathematics than in other disciplines.

There is no other scientific or analytical discipline that uses proof as readily and

routinely as does mathematics. This is the device that makes theoretical

mathematics special: the tightly knit chain of reasoning, following strict logical

rules, that leads inexorably to a particular conclusion. It is proof that is our

device for establishing the absolute and irrevocable truth of statements in our

subject. This is the reason that we can depend on mathematics that was done by

Euclid 2300 years ago as readily as we believe in the mathematics that is done

today. No other discipline can make such an assertion (Krantz, 2007, p. 1).

The Role of Proof in Mathematics

Although the general idea of mathematical proof, i.e. deriving a new result from a

known result, has remained unchanged for more than 2000 years, details about how such a

process can be formulated have been debated and modified by mathematicians

throughout the development of mathematics (Jaffe & Quinn, 1993). Primitive forms of

mathematics (before Euclid’s Elements) did not reflect an awareness of the need for proofs

when verifying statements. Conclusions were drawn from empirical examinations of

shapes and numerical relationships. Mathematical proof, in the deductive sense, first

acquired an explicit meaning in Euclid’s geometry, which has been widely

regarded as the prototype of how a mathematical system should look (Krantz, 2007).

Ever since the Elements, rules have demanded that a mathematical proof must be

rooted in definitions and axioms and proceed through accepted forms of deduction.

Despite the historical and ongoing debates about what can be used as definitions, axioms,

and deductions within the community, consensus exists among mathematicians that a

mathematical proof must be timeless, impersonal, rigid and dependable (Grabiner, 2009;

Davis, 1976; Krantz, 2007; Tall et al., 2012). It is such a pursuit that makes mathematics

a reliable tool that is widely applied in physics, engineering, economics, and many other

disciplines.
Traditional discussion of mathematical proof focused on its reliability in

determining the truth of a statement (Brown, 2008). Such a perspective places emphasis

on precise descriptions of the definitions and premises (axioms) and a rigorous layout of

steps of deductions to make sure proofs were presented as a delicate and complete

product. Carl F. Gauss supported this idea by comparing a mathematician to an architect,

who “didn’t leave up the scaffolding so that people could see how he constructed a

building” (cited in Krantz, 2007). David Hilbert had hoped for a rigorization of

mathematics into a comprehensive and self-contained axiomatic system before this

goal was proved unachievable (Gödel, 1931). The influential modern mathematics

book series published in the middle of the last century, written by the Nicolas Bourbaki

group, strictly adheres to the doctrines of formal mathematics, offering austere axiomatic

structures and excluding pictorial or other forms of intuitive assistance from proofs.

The New Math curriculum extended such a style into the education of growing

individuals, expecting that early exposure to a rigorous format would help integrate such

practices into students’ mathematical thinking (Hanna, 1983). The pursuit of formal proof

has influenced generations of mathematicians and has greatly advanced the community’s

understanding of mathematics. However, its limitations with respect to educational

impact were also criticized by scholars (Freudenthal, 1973; Lakatos, 1976; Schoenfeld,

1991; Tall, 1999).

The Debates and Status of Proof Learning

Lakatos (1976) wrote “… (In formal proof) all propositions are true and all

inferences valid. Mathematics is presented as an ever-increasing set of eternal,


immutable truth. Counterexamples, refutations, criticism cannot possibly enter. An

authoritarian air is secured for the subject … Deductivist style hides the struggle, hides

the adventure …” (p. 142). Hanna (2000b) also claimed that “a proof, valid as it might be

in terms of formal derivation, actually becomes both convincing and legitimate to a

mathematician only when it leads to real mathematical understanding” (p. 7). Krantz

(2007) expressed the same opinion, advocating “In mathematics, we are not simply after

the result. Our ultimate goal is understanding” (p. 32). Tall (1999) added to the

discussion by suggesting “formal proof is appropriate only for some, that some forms of

proof may be appropriate for more” (p. 1). De Villiers (1990, 2003) offered a framework

to describe the six functions of proof in mathematics, including verification, explanation,

systemization, discovery, communication and intellectual challenge. All these efforts tend

to reconceptualize proof as a human activity rather than a passive mechanical procedure.

Aside from theoretical challenges posed by researchers, the instruction of formal

proof has also faced difficulties in school practice, especially at the introductory levels.

Historically (and currently), in the US, a course on Euclidean geometry has served as the

main venue for the development of students’ skills in deductive reasoning with the

expectation that such skills would automatically transfer to other mathematical and non-

mathematical areas (González & Herbst, 2006; Herbst & Brach, 2006). This goal,

however, remains unfulfilled. “Research results on students’ conception of proof are

amazingly uniform; they show that most high school and college students don’t know

what a proof is nor what it is supposed to achieve” (Dreyfus, 1999, p. 94). It is

recognized that this failure might be due to the school treatment of topics in curriculum

and instruction. There is evidence that in many mathematics classrooms proof and the
proving process are taught as procedural topics instead of as a conceptual tool for reasoning

(Herbst & Brach, 2006; Reid, 2011). As a consequence, students tend to view proof as a

special “form” of producing written work (e.g. two-column proof) instead of a viable

vehicle for production of reliable explanations, or even means for understanding (Chazan,

1993; González & Herbst, 2006; Healy & Hoyles, 2000; Schoenfeld, 1988). Additionally,

there is evidence that an understanding of the role of mathematical proofs in establishing

validity of arguments remains underdeveloped at all grade levels (Chazan, 1993; Chazan

& Lueke, 2009; Harel & Sowder, 1998; Heinze & Reiss, 2009; Kuchemann & Hoyles,

2009; Mason, 2009; Waring, 2000; Weber, 2001; Schoenfeld, 1988). Furthermore, even if

a learner showed an awareness of and the ability to produce complete proofs in a certain

mathematical domain, such knowledge might not transfer to other topic areas, nor would

it automatically grow into an overarching understanding of the deductive system (Fawcett,

1938/1995; Freudenthal, 1971; Liu & Manouchehri, 2012; Reid, 2011). Therefore, calls

for shifting the focus of instruction from assimilating students into the tradition of

producing a rigorous mathematical format to helping them reason logically and

coherently about concrete contexts have been made. Following such a trend, recent

reform efforts in mathematics curricula place less emphasis on the layout of proof

while paying more attention to nurturing students’ proof skills, built on an understanding of

specific topics throughout the grades (de Villiers, 1990, 2003; Hanna, 2000a, 2000b; Reid,

2011).

The Principles and Standards for School Mathematics (NCTM, 2000) published

by the National Council of Teachers of Mathematics recommended that students’

ability to reason and produce proofs be fostered at all levels of the mathematics
curriculum (Hanna, 2000a). According to the standards, K-12 mathematics education

should enable high school graduates to “recognize reasoning and proof as fundamental

aspects of mathematics, make and investigate mathematical conjectures, develop and

evaluate mathematical arguments and proofs, and select and use various types of

reasoning and methods of proof” (p. 56). Furthermore, there is an explicit statement that

suggests nurturing the proof capacity in a broader content area, addressing “reasoning

and proof cannot simply be taught in a single unit on logic, for example, or by ‘doing

proofs’ in geometry” (p. 56).

The Common Core State Standards (CCSSO, 2010) also place tremendous

emphasis on the need to assist students in developing their proving skills. Among the eight

Standards for Mathematical Practice in the CCSS, four (i.e., reason abstractly and

quantitatively, construct viable arguments and critique the reasoning of others, look for

and make use of structure, and look for and express regularity in repeated reasoning) are

directly related to exploring, perceiving, and systemizing logical relationships.

However, realizing the goals proposed by various standards documents requires

significant modification to and enrichment of teaching materials as well as a shift in

traditional classroom culture. The key idea of the transformation is that elements and

properties of mathematics, as developed by mathematicians, should not be the sole

determinant of the curriculum and instruction. The nature of students’ thinking and

behavior when engaged in mathematical activities (including proof) must be fully

respected in the design and practice of teaching (Ball & Bass, 2000, 2003; Boero, 2007;

Dreyfus, 2006; Schoenfeld, 1988; Shulman, 1986). Therefore, in order to create

instructional and curricular models that nurture and promote students’ comprehension of
proof and their ability to produce mathematically complete arguments, an understanding

of the nature of students’ thinking in proof-related activities must first be developed.

Educational Research about Proof Learning

Stylianides and Stylianides (2008a) identified three cohorts of scholarly

investigations focused on studying proofs in mathematics education research. The first

cohort seeks evidence that students possess the ability to use deductive reasoning in

constructing arguments and proofs, even at the early elementary grades. The second

cohort describes students’ common difficulties and mistakes in producing proofs across

the grade levels and content areas. The third cohort offers an account of pedagogical

factors that could facilitate students’ learning about proofs. Although these three cohorts

of studies, including both empirical reports and theoretical investigations, provide

insights into students’ analytical capabilities as well as their challenges in learning proofs, suggesting

implications for practice, they do not posit a framework to capture the features of

students’ thinking when performing proof-related tasks. Studies of students’ proof

schemes tend to close this gap by creating a framework that classifies different types of

proofs that students offer. Following previous scholars’ work, such as Bell (1976) and

Balacheff (1988, 1991), Harel and Sowder (1998) organized the types of proof students

may use in various content areas of mathematics and proposed a taxonomy of proof

schemes consisting of three main categories, i.e. “external,” “empirical,” and “analytical,”

each of which encompasses several subcategories.

Another body of work concerns the cognitive development of learners as they

achieve a more mature comprehension of mathematical proof. The van Hiele levels (van
Hiele, 1986) is one of the most well-known frameworks to outline the stages in the

development of geometric thinking. Shaughnessy’s (1992) four-stage micro model of the

development of stochastic reasoning was constructed in a similar manner but within a

different content area. Frameworks that explicitly address proof learning include the

proof levels (Waring, 2000), reading comprehension of geometry proof (Yang & Lin,

2008), and the broad maturation of proof structure (Tall et al., 2012). A detailed account of

these frameworks will be offered in the next chapter.

Harel and Sowder (1998, 2007) observed that students could simultaneously hold

different proof schemes when working on different problems. Their model detects such a

difference but does not explain why such inconsistency might exist. The cognitive

development models can capture students’ progress in producing logical reasoning in a

certain mathematical field, but fail to describe why and how such a development may

emerge across content area differences. The categories, levels, and stages offered by

existing models are not precise enough to draw connections to students’ evaluation of the

arguments. Hence, little can be said about what kind of mathematical arguments students

find appealing, convincing, or explanatory since even arguments that are classified as the

same type can be judged quite differently among people and across the content areas.

Therefore, a more precise proof classification framework needs to be conceptualized so

as to allow an inquiry into learners’ understanding of and preference for different arguments.

In order for the instruction to enable students to understand and appreciate proof as a

reliable way of reasoning (de Villiers, 2003; Fawcett, 1938/1995; Reid, 2011), learning

about ways to help students realize proof as a reasoning methodology is just as

important as teaching the skills of producing specific proofs. As Usiskin (1980) pointed
out, there are various ideas, methods and layouts of proofs in different branches of

mathematics. Therefore, investigations into the impact of content on students’ use and

judgment of different mathematical arguments deserve a critical position in the study of

student learning of proofs.

Pilot Study Findings

In order to investigate students’ production and evaluation of mathematical

arguments, a pilot study was conducted involving 41 secondary school students. The

participants were drawn from 19 different middle schools across the state of Ohio,

suggesting variety in both the content and heuristics they may have experienced at the

time of data collection. A Survey of Reasoning (SR) was designed and used to examine

the participants’ proving processes, simultaneously, in four different content areas as a

means to closely inspect the potential relationship between a problem’s content and the proof

scheme that may have been elicited by it.

The SR consisted of four mathematics problems from four different branches of

mathematics (i.e. number theory, geometry, probability, and algebra). Each problem

consisted of several parts. First, the participants were presented with a conjecture and

were asked to determine whether they agreed with and were certain of the accuracy and

completeness of the statement. They were also asked to offer an explanation for their

choice and factors they considered when evaluating the statement. In the second part,

four arguments, each embodying a different proof scheme supporting or refuting the same

statement, were offered. The participants were asked to compare their own argument to

those given, and to decide whether they preferred any of the optional statements over
their own method. Lastly, they reported whether or not they considered each of the

optional arguments convincing as well as mathematically complete. We deliberately

chose the terms convincing and mathematically complete to evaluate students’ “two

conceptions of proof” (Healy & Hoyles, 2000), assuming that when judging the

convincingness of an argument the students might tend to rely on subjective perceptions

whereas when judging the mathematical completeness they might refer to an

understanding of existing mathematical conventions. All participants took the survey at

the same time and completed it within two hours.

Quantitative analysis of participants’ responses led to several findings, which are

summarized below (for more details, refer to Liu & Manouchehri, 2012):

 The majority of students relied heavily on empirical proof schemes when

producing arguments to support validity of propositions.

 The proof schemes of each individual’s favorite arguments varied across the

four problems.

 The proof schemes of the most convincing argument indicated by each

individual also varied across the four problems.

 A considerable number of cases were observed in which a student found an

argument convincing in one problem but labeled the argument with the same

proof scheme in a different context as not convincing.

 Neither how convincing an argument appeared nor whether it looked

mathematically complete solely determined students’ preference.

 The students did not necessarily persist with their own proof scheme when they

were asked to identify an argument as their favorite. If a student found a


given argument understandable and more persuasive, s/he noted preference

for it over his/her own argument, even when the two arguments represented

different proof schemes.

Results from the pilot study suggested that the students adopted and determined

their preferred reasoning schemes based on the concrete context of the problem instead of

following a broader uniform scheme. This implied that the transfer of proving skills from

one area (typically geometry) to other mathematical fields, as expected by current

curriculum design, did not automatically occur. When confronted with alternative

argument types students exhibited a tendency to favor those arguments involving

analytical reasoning, indicating a potentially productive pathway towards building

children’s proving capacity upon a wide range of mathematical contexts (Stylianides,

2007; Tall et al., 2012).

Due to the absence of qualitative data on individuals’ understanding of the

arguments, the pilot study could not explain why the participants had made certain

decisions or managed to maintain, simultaneously, preference for different proof schemes.

For instance, the pilot study categorized arguments based upon researchers’ interpretation

of the brief written explanations that the students had produced. This information was

insufficient to capture accurately what the students’ comprehension of the arguments

might have been. Without a clear understanding of students’ comprehension of the

arguments, it is impossible to identify the factors that shape students’ views of those

arguments. Therefore, the current research was conceptualized to extend the previous

work and to shed light on the processes and resources students draw from when judging

mathematical proofs.
Purpose of the Study

The purpose of the study is to investigate how students evaluate arguments in a

wide range of mathematical contexts. Data collection and analysis were guided by

three research questions:

 Are there certain types of mathematical arguments that students find

convincing, explanatory and appealing?

 Are there common aspects and features of arguments that significantly impact

students’ judgment of the arguments? If yes, what are they?

 How does problem context impact students’ judgment of arguments?

It is believed that such an investigation can contribute to the literature on

individual decision making when evaluating mathematical arguments, which in turn can inform

curriculum and instruction. Drawing from Harel & Sowder’s (1998) proof scheme

taxonomy, Yang & Lin’s (2008) Reading Comprehension of Geometry Proof (RCGP)

model, Bruner’s (1966) synthesis on representation types, and Stylianides and

Stylianides’s (2008a) identification of three aspects of proof, a theoretical framework, the

Classification Cube of Internalized Arguments (CCIA), was built to describe different

types of mathematical arguments (see Figure 7 in Chapter II for more details).

Overview of Research Methodology

Adopting a mixed methods design (Greene, Caracelli, and Graham, 1989), the

study consisted of the development, administration and analysis of a survey and follow-up

interviews. The survey and interview protocol were designed and refined in 2012. The

revised survey (Survey of Mathematical Reasoning, SMR, see Figure 8) was administered

in January–February 2013, and the follow-up interviews were conducted in April 2013.

The population of interest in this study was 8th grade students. Two reasons

contributed to this choice. First, according to Piaget’s (1985) Intellectual Development

Stages, middle school students are at a critical cognitive phase where they can engage in

abstract and logical thinking. Therefore, how they learn to value different arguments at

this stage could potentially impact their reasoning skills and thinking habits in later

years. Second, the grade band serves as a bridge between middle and high school

mathematics, linking informal reasoning with more formal and abstract mathematical

reasoning. According to the curriculum standards (CCSSO, 2010), most 8th grade

students should have obtained basic understanding of numbers, shapes, chance, and

algebraic expressions, know some simple propositions and properties, and be able to see

the connection between concepts and ideas. However, they may not have yet adopted

abstract thinking or deductive ways of mathematical reasoning using conventional

proving techniques and forms. Therefore, the features of arguments they consider as

convincing, explanatory, and appealing can offer valuable references for the development

of resources and instructional explanations that can facilitate students’ internalization and

adoption of more mathematically rigorous argumentation.

Data collection followed two phases. During the first phase, over 500 8th grade

students from 5 different public schools in Ohio took the SMR. The students’ responses

were then analyzed quantitatively to investigate their evaluation of the arguments used in

the SMR. In particular, the goal was to identify the type of arguments that they found

understandable, convincing, explanatory or appealing. During the second phase, eight


subjects, who had exhibited different patterns in their survey choices, were selected and

interviewed. Common factors that impacted each subject’s evaluation were summarized

and the individual differences were investigated through between subject contrasts.

Details about the participants of the study, development of survey instrument, procedures

of the interview, as well as the data analysis process are described in Chapter III.

Significance of the Study

This study has the potential to advance understanding of proof learning on

three levels. First, empirical studies about the middle school students’ evaluation of

different proof types have been rare. Second, investigations that seek to identify

consistent features across content areas that individuals might consider when evaluating

mathematical arguments have been notably absent from the literature. Lastly, the

type of studies that have worked towards developing a framework useful for identifying

the type of argumentation most likely to be assimilated by students as their own

reasoning methodology has been underdeveloped. The current study aims to make novel

contributions to each of these three areas.

CHAPTER 2. LITERATURE REVIEW

This review offers a summary of literature on the nature and functions of proof in

mathematics. Findings of published studies on students’ proving and reasoning, and

existing theoretical frameworks concerning proof learning are discussed. An overview of

theoretical framework guiding the current study is offered.

The Nature of Mathematical Proof: A Philosophical Account

Descartes claimed that a mathematical proposition must be “deduced from true

and known principles by the continuous and uninterrupted action of a mind that has a

clear vision of each step in the process” (cited in Baker, 2009, p. 1). Krantz (2007)

described mathematics as “(i) coming up with new ideas and (ii) validating those ideas by

way of proof” (p. 33). Despite such emphasis, the precise nature and role of mathematical

proof has long been debated by mathematicians and mathematical educators (de Villiers,

1990, 1998; Hanna, 2000b; Lakatos, 1976; Krantz, 2007; Tall, 2002). Because of the

centrality of proofs and proving in mathematics, discussions surrounding their nature have

resided at the heart of the philosophy of mathematics. In this section I will offer a review

of three prominent perspectives (i.e. Platonism, Formalism and Constructivism), whose

philosophical standpoints offer different, if not contradictory, accounts of the role and

function of proof in mathematics and hence provide distinct educational implications for

proof instruction.

Platonism

Platonism is a school of philosophy whose principles and perspective have not

been limited to mathematics alone. Philosophers and mathematicians maintain different

interpretations when applying Platonism in the domain of mathematics; nevertheless, they

generally agree that Platonism in mathematics (in its pure sense) views mathematical

objects as abstract yet eternal and unchanging (Armstrong, 1970; Balaguer, 2008). For

example, when considering the “fact” that 1 + 1 = 2, those espousing Platonism believe

in the existence of exact concepts of 1, 2, + and = in an abstract world called mathematics,

and consider the relationship demonstrated by the equation to be an eternal truth independent of the

involvement of human beings. Therefore, the mission of mathematicians from this

paradigm is to explore and discover the unknown underlying truth (Weir, 2011).

According to Platonism, axioms must describe absolute and eternal truths of the world,

and proof is a method to find other absolute and eternal truths determined by the axioms.

However, Platonists were “rapidly losing support” (Weir, 2011) with the

development of more recent perspectives in mathematical philosophy. Among them,

formalism and constructivism have gained considerable attention and raised extensive

debates among mathematicians.

Formalism

Formalism was developed upon logicism, which views mathematics as a

systematic structure built upon axioms following certain rules. Carnap (1937) suggested

two logistic principles of the mathematical system:

 The concepts of mathematics can be derived from logical concepts through

explicit definitions.

 The theorems of mathematics can be derived from logical axioms through

purely logical deduction.

These suggest that a mathematical axiomatic system must start with finitely many

statements (namely axioms/postulates) that are assumed to be true, and that the judgment

of validity of other statements in that axiomatic system must be based on deduction from

them. With a deeper and more abstract conception of the deductive procedure, mathematicians

attempted to convert mathematics into a symbolic system using Set Theory (Johnson, 1972)

by restricting what procedure could be considered as logically valid (i.e. what deduction

is) and what concept is usable in deduction (i.e. what a set is). David Hilbert, arguably the

most prominent mathematician of the formalist genre, set the groundwork for

operationalizing this view.

… he (Hilbert) adopted an instrumentalist stance with respect to higher

mathematics. He thought that higher mathematics is no more than a formal game.

The statements of higher-order mathematics are uninterpreted strings of symbols.

Proving such statements is no more than a game in which symbols are

manipulated according to fixed rules (Weir, 2011).

Formalism agrees with Platonism by envisioning an ideal and static system within

which truth and falseness are indisputable. As such, formalism has a strong tendency to

eliminate the impact of human perception since the criteria allowed in mathematical

deduction are impersonal. However, formalism only concerns the validity of statements

within the established system without articulating how truth within the system relates to
truth enclosed in nature (Weir, 2011). For instance, a + b = b + a holds in a commutative

group, but a formalist is not at all interested in how “a”, “b” and “+” in the equation relate

to quantities and operations in real life. In other words, “truth” as viewed by a formalist is

a relative “truth,” which is “attached” to the validity of the axioms (assumptions). Hence

it is different from the absolute “truth” pursued by Platonists. Moreover, “truth” in

formalism lies in a very restrictive and artificial setup of a mathematical system that is

designed by mathematicians, which also seems to be opposite to Platonists’ belief that

mathematics should be discovered (Ernest, 1996).

Gödel’s incompleteness theorems (Gödel, 1931) directly rejected the possibility of

establishing a “perfect” system, i.e. a single, recursively axiomatizable formal system

which is consistent, purely deductive and complete (Baker, 2009), as envisioned by

formalism. In particular, the first incompleteness theorem showed that if the finitely many

axioms in the system are not contradictory to each other, then there must be a statement

within the system whose validity cannot be determined by the axioms and deductive

results built upon them. The second incompleteness theorem strengthened the first by

showing that no such consistent system can prove its own consistency; a system that does

prove its own consistency must in fact be inconsistent.
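For reference, the two theorems can be stated in a standard modern form (this is a conventional textbook formulation, not the dissertation’s own notation):

```latex
% First incompleteness theorem: for any consistent, recursively
% axiomatizable system F that interprets elementary arithmetic,
% there is a sentence G_F that F can neither prove nor refute:
F \nvdash G_F \qquad\text{and}\qquad F \nvdash \neg G_F .

% Second incompleteness theorem: for such an F, if F is consistent,
% then F cannot prove the arithmetized statement of its own consistency:
F \nvdash \mathrm{Con}(F).
```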

The impact of Gödel’s incompleteness theorems on formalism can be viewed in

two manners. First, the perfection of the axiomatic system envisioned by formalists is

totally denied. No matter how deliberately the axioms may be set up, the deductive system

will not be able to solve all the problems within the system. In this sense, the methodology

designed by formalists by itself is “incomplete.” Second, the incompleteness theorems


do not deny the value of proof where proof is possible. In other words, the incompleteness

theorems do not demolish the value of the deductive system, in the sense that the system is

never perfect but functional and powerful in a very broad context. In fact, formalism

largely advanced the comprehension of many mathematical subjects. For instance,

because of the establishment of measure theory, mathematicians gained insights into

distance, area and volume. To date, the discovery and proof of new theorems is widely

regarded as the highest level of mathematical research (Tall et al., 2012).

Constructivism

Constructivism, radical or social, has largely been adopted in the social sciences,

and the study of mathematics education is not an exception (von Glasersfeld, 1994;

Vygotsky, 1978). However, the term mathematical constructivism usually refers to a

perspective that may not be familiar to educational researchers.

Mathematical constructivism refers to a group of studies about the structure of

mathematical systems, such as intuitionism introduced by Brouwer (1905) and finitism

developed by Hilbert and Bernays (1934/1939). Generally speaking, a starting point of

mathematical constructivism is that a mathematical object must be explicitly found

(constructed) to prove its existence. Despite recognizing that the foundations of

mathematics lie in human intuition (Troelstra, 1977), the theory of mathematical

constructivism does not take any particular individual’s subjective opinion into

consideration. Instead, the theory makes assumptions about what types of deductions

people intuitively and commonly accept and redefines a new type of logic such that it is

no longer based on the classical Set Theory. For example, intuitionistic logic is

a revision of the classical logic underlying Set Theory that removes the law of the excluded

middle (i.e. the principle that either a statement is true or its negation is).
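As an illustration of what the excluded middle licenses (a standard textbook example, not one used in this study), classical logic permits the following existence proof of irrationals $a, b$ with $a^b$ rational, even though it never determines which case actually holds:

```latex
\text{Either } \sqrt{2}^{\sqrt{2}} \in \mathbb{Q}
  \text{ (take } a = b = \sqrt{2} \text{), or }
\sqrt{2}^{\sqrt{2}} \notin \mathbb{Q}
  \text{ (take } a = \sqrt{2}^{\sqrt{2}},\ b = \sqrt{2} \text{), since }
\bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}} = \sqrt{2}^{\,2} = 2 .
% The case split invokes P \lor \neg P; intuitionistic logic rejects
% this argument because no definite witness is constructed.
```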

Despite the importance of mathematical constructivism in the development of

mathematical theories, another interpretation of constructivism, in which mathematical

concepts and axiomatic systems are built upon human intuition, representation

and communication, is of greater concern in education studies.1

In such a perspective, mathematics is a product created by human intelligence

instead of a pre-existing, ideal and perfect form that people attempt to discover.

Constructivism suggests that mathematical concepts do not exist beyond human

understanding (opposing Platonists’ view). Rather, “concepts and structures are the result

of a cognitive/historical knowledge process. These originate from our action in space

(and time) and are further extended, by language and logic” (Longo, 2009, p. 22).

Therefore, mathematical concepts are not static. They may change as the result of the

discovery of new cases or when new needs emerge in the community. Different from

formalism, constructivism doesn’t admit or pursue the “best” form of mathematics. It is

believed that mathematical theories and methodologies can always be improved to

function in a broader context or in an innovative way (Ernest, 1996).

1
This interpretation is consistent with the concept of constructivism that is widely used in social science research. Since

ultimately this study concerns how an individual can develop deductive reasoning skills, human experience and

activities in conceptualizing and establishing mathematical ideas and structures offer valuable references. Hence for

convenience, constructivism in later text of this dissertation will all refer to the second interpretation unless specially

explained.

Lakatos (1976) offered remarkable examples to illustrate such a perspective. In

Proofs and Refutations, Lakatos recreated an imaginary classroom scenario where the

students were exploring the validity of Euler’s polyhedron formula (F + V – E = 2).

During the process, students found that their conception about what a face, an edge and a

vertex might be was substantially less rigorous than they had realized. Consequently the

students engaged in a discussion about the precise definition of those concepts (e.g.

vertex, edge, face, simple polyhedron, etc.). Students challenged each other’s definitions

using specific counterexamples to demonstrate the incompleteness of the defined terms

and propositions attached to them. As a consequence, their understanding of the object

(polyhedron) was refined and deepened gradually in such a “proof and refutation”

process. Lakatos suggested that concepts, as the foundation of mathematical systems, are

constructed instead of discovered by human intelligence. It was not predetermined what

should be called an edge of a polyhedron. Instead, the concept was constructed by

humans and refined to fit into a more useful theory. Neither was it predetermined that

there must be Euler’s polyhedron formula in the theory. Indeed this property rests upon

existing definitions. However, there was no guarantee that it must serve as an important

theorem or proposition. The treatment of “local” and “global” counterexamples (Lakatos,

1976) may lead the development of a theory in another direction (as with the

treatment of the Parallel Postulate in different geometry systems). Furthermore, it was

not predetermined what types of deduction should be allowed in mathematical reasoning.

For instance, visual aids are used to prove the Pythagorean Theorem in Euclidean Geometry;

however, such an approach is not considered reliable in Real Analysis. Intuitionism even

rules out the method of proof by contradiction. Therefore, constructivism suggests that
concepts, axioms, propositions and proving methodologies in mathematics are all

inventions of mathematicians. The scenarios Lakatos presented in his imaginary

classroom depicted the journey which mathematicians took to establish the current

enterprise of mathematics, advocating the need for absolute respect for the natural

development of knowledge and implementation of a heuristic instructional methodology.
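The formula at the center of Lakatos’s dialogue can be checked computationally; the sketch below (illustrative only, not part of the dissertation’s methods) verifies F + V − E = 2 for three simple polyhedra and shows how a “picture frame” solid, one of Lakatos’s global counterexamples, violates it.

```python
def euler_characteristic(vertices, edges, faces):
    """Return F + V - E, which equals 2 for a simple (sphere-like) polyhedron."""
    return faces + vertices - edges

# Simple polyhedra satisfy Euler's formula.
solids = {
    "tetrahedron": (4, 6, 4),   # (V, E, F)
    "cube":        (8, 12, 6),
    "octahedron":  (6, 12, 8),
}
for name, (v, e, f) in solids.items():
    print(name, euler_characteristic(v, e, f))  # each prints 2

# The "picture frame" (a torus-like solid discussed by Lakatos) has
# V = 16, E = 32, F = 16, so F + V - E = 0, refuting the naive conjecture.
print("picture frame", euler_characteristic(16, 32, 16))  # prints 0
```

In Lakatos’s account the counterexample does not overthrow the formula; it forces a sharper definition of “polyhedron,” which is precisely the concept-refinement process described above.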

Notice that constructivism does not deny the reliability of an already constructed

system, which is consistent with a formalist’s view. In fact, constructivism may suggest

that many of the axiomatic systems built by formalists are the best models that

mathematicians have produced to date. However, constructivism admits arguments for

understanding mathematical objects and propositions that would not be

accepted by a formalist in many cases. Taking the proof of Jensen’s inequality (the simple

version) as an example, from the standpoint of constructivism, a proof using visual

implication is entirely acceptable; but to a formalist, a rigorous algebraic approach is

definitely more complete and accurate. Constructivism approves the reliability of visual

implication since it is valid and efficient in many situations (e.g. elementary Euclidean

geometry, graphs of low degree polynomials, etc.), therefore judgment with visual aid is

reliable when dealing with cases within a certain scope, even though this method may not

apply to a broader context. However, formalism may downgrade the reliability of adopting

visual implication in this specific proof since such an approach cannot handle the Dirichlet function. In

other words, formalism regards one method as superior to another when it applies in a

broader context, a judgment that is not influenced by any particular problem solver’s own

experience and judgment of the scope of discussion.
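For reference, the simple (two-point) version of Jensen’s inequality reads as follows; the visual argument amounts to observing where the chord lies relative to the graph.

```latex
% For a convex function f and \lambda \in [0, 1]:
f\bigl(\lambda x + (1 - \lambda) y\bigr)
  \;\le\; \lambda f(x) + (1 - \lambda) f(y).
% Visual reading: the chord joining (x, f(x)) and (y, f(y)) lies on or
% above the graph of f between x and y.
```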

Summary

There is consensus within the mathematics community that proof must start with

facts that are known or assumed to be true, use if-then logic (regardless of whether the

logic must be defined upon Set Theory), and establish the truth or falsity (in either a relative or

absolute sense) of the targeted statements (Harel & Sowder, 1998;

Krantz, 2007; Tall et al., 2012). Platonism suggests that proof discovers and verifies truth

based on known truth, while formalism and constructivism suggest that proof starts with

assumptions and follows a sequence of deductive steps to achieve a judgment of the

validity of a proposed statement. However, a closer look reveals four major differences

between the latter two perspectives.

 Ideal vs. Instrumental. Formalism pursues an ultimate form of proof that

satisfies certain criteria which guarantee the validity of proof regardless of the

individual’s perception. Constructivism stands with instrumentalism, in the

sense that concepts, axioms and proofs are all invented to solve problems and

upgraded to solve more complex problems in a wider range.

 Static vs. Dynamic. Formalism holds a narrower and more static view of

what can be viewed as assumptions and what kind of deductive steps should

be allowed in a proof. It has a more restrictive and fixed standard of what a

proof in mathematics should be like. Constructivism suggests that the

concepts, assumptions and proofs are constructed instead of discovered by

mathematicians. Hence they evolve with the rise of new questions and

discovery of new areas.

 Restrictive vs. Open. Formalism tends to deny or degrade the validity of

certain proof schemes once a “better” approach is found. Constructivism also

agrees that there could be proofs that are reliable in a broader context,

however the “best” proof mathematicians could come up with is not the only

acceptable form of proof and cannot replace the role and value of less rigorous

arguments.

 Global vs. Local. Formalism suggests that when proving the validity of a

statement, people should be adequately familiar with related theories so they

know the scope in which the discussion lies and use methods within that

scope. Constructivism suggests that when proving a certain statement, people

draw from their own experience and community to determine the scope of

discussion and reliable methods.

Because of these different standpoints, constructivism and formalism have distinct

implications for mathematics education, especially at the introductory levels (Hersh, 2009;

Lakatos, 1976). Formalism, with an emphasis on presenting mathematics in its most

complete and rigorous form, tends to introduce a theory from a careful layout of its basic

components, i.e. definitions and axioms, followed by deductive processes to build

knowledge upon the foundation. Constructivism suggests guiding the learners through the

journey that previous mathematicians took to establish theories, i.e. first offering

premature and informal perception of the subject and then refining and formalizing the

understanding through problem solving and critical reflections. These two instructional

methods are referred to by Lakatos (1976) as Deductivist and Heuristic approaches,

respectively. Currently, textbooks written in a manner of formalism dominate the


24
advanced field (college and beyond), while constructivism has gained much support at

elementary, secondary and early college level, particularly following the decline of New

Math era (Hanna, 2000a; Tall, 1991).

Note that in reality an individual’s philosophical perspective may lie somewhere in

between and shift depending on specific situations. In addition, there are different

terminologies used by scholars (e.g. Absolutist vs. Fallibilist (Ernest, 1996)) to describe

philosophical views of mathematics, and these different classifications maintain their own

criteria in organizing their respective perspectives. After all, Platonism, formalism,

constructivism and various other terms are perceptual concepts instead of defined

concepts (Bruner, 1987). Nevertheless, the essential purpose of the comparison is not to

determine how the three philosophical perspectives differ, but to help identify and

describe the criteria and features that a mathematical proof may possess. From the

standpoint of a mathematics educator, the subject of study is learners with evolving views

of mathematical proofs, and the most relevant philosophical perspective is a school of

thought that respects an individual’s development of knowledge. Therefore,

constructivism serves as the basis for theoretical models in the learning of proof as well

as other fields in mathematics education research (Balacheff, 1991; de Villiers, 2012; Tall,

2009; van Hiele, 1986). This study is no exception.

Constructivism recognizes proof as a human-involved activity rather than a

mechanical procedure. It shifts the attention from the content to learners’ thinking and

behavior. This implies that the content itself no longer solely determines how it should be

taught; rather, learners' "natural" behavior in the learning process must also be considered

as a fundamental component to guide instruction. As such, the purpose of teaching proof


is not to acclimate learners to a certain type of argument structure whose format must be

strictly followed; rather, the instruction should guide students to develop a personal

meaning of proof, in particular why proof is needed and what features it should possess to

meet the need (Hersh, 2009). Such a focus calls for investigations into two critical

questions:

• What do mathematical proofs mean to a learner?

• What is the nature of a learner's thinking when developing an understanding

of and skills in constructing mathematical proofs?

The following two sections are devoted to reviewing and summarizing the

research studies responding to these questions.

The Functions of Proof in the Study of Mathematics

Verifying the correctness of mathematical statements has been a primary

function of proof ever since proof started to be used in mathematics (Krantz, 2007;

Tall et al., 2012). Without understanding the concept of proof, it is impossible to perceive

what mathematical theory and practice might mean (Hanna, 2000b). With the prevailing

adoption of formalism in mathematics, proof in the deductive form came to be viewed as

the only acceptable way to establish arguments in mathematics, granting it a supreme

importance to the subject. More recently, with the rising attention to the perspective of

constructivism, additional functions of proof in mathematics have been studied and

synthesized (Tall, 1999).

Bell (1976) described the functions of proof as verification, illumination and

systematization. Balacheff (1991) suggested that “... a mathematical proof is a tool for
mathematicians for both establishing the validity of some statement, as well as a tool for

communication with other mathematicians” (p. 178). Schoenfeld (1994) claimed that “it

(proof) is an essential component of doing, communicating, and recording mathematics”

(p.76). Reflecting on existing literature and his own experiences with teaching and

learning of mathematics, de Villiers (1990, 2003) outlined six major functions of proof in

the discipline (p. 18):

• Verification (concerned with the truth of a statement),

• Explanation (providing insight into why a statement is true),

• Systematization² (the organization of various results into an organized system

of concepts, axioms, theorems and propositions),

• Discovery (the discovery or invention of new results),

• Communication (the transmission of mathematical knowledge), and

• Intellectual challenge (the self-realization/fulfillment derived from

constructing a proof).

Each of these functions is described in greater detail below.

Verification

Verification is notably the most recognized function of proof. If a statement is

proved to be true without errors, then its correctness is settled and there is

no room for counterexamples. In the mathematics community, a statement's validity

remains unclear until it is proved. Until then, the statement can only be regarded as a

² "Systemization" is the original spelling used by de Villiers.

hypothesis even if it seems true to mathematical authorities and no counterexample is

found. Although de Villiers also pointed out that “proof is not necessarily a prerequisite

for conviction – to the contrary, conviction is probably far more frequently a prerequisite

for the finding of a proof" (p. 18), the change in the level of conviction that mathematicians

experience before and after obtaining a proof can never be denied. There is a clear difference

between “pretty sure” and “absolutely sure.” In fact, there were many occasions in the

history of mathematics where widely-conjectured-to-be-true statements were later

proved to be incorrect. One famous example is the Kakeya needle problem, which

was proposed by Kakeya (1917), asking for the minimal area of a region in the 2-dimensional

Euclidean plane within which a unit line segment can be rotated continuously through

180 degrees. Many mathematicians (including Kakeya himself) seemed to believe that

the deltoid would be the solution, since the deltoid is composed of such elegant curves that

seem to satisfy conditions crucial to obtaining the "minimum." Much effort was

devoted to proving that the deltoid is the correct solution, until the Besicovitch set, a much

more complex and "artificial" construction, showed that the area can in fact be made arbitrarily small

(Besicovitch 1919; Pal, 1920). Another famous example is Leonhard Euler’s conjecture

that there are no positive integers x, y, z, and w such that x⁴ + y⁴ + z⁴ = w⁴, a counterexample to

which was found after almost 200 years (cited by IAS/PCMI, 2007). Since there is no

guarantee that a statement that seems true to people's or even experts' intuition

will hold true in mathematics, it is proofs that distinguish true results from seemingly

plausible, but not generally true, statements (Grabiner, 2012).
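The quartic case illustrates how decisively a single counterexample settles a long-standing conjecture. The smallest known counterexample, found by Roger Frye following Noam Elkies' 1988 construction, can be checked directly with exact integer arithmetic (an illustrative verification sketch, not part of the dissertation's study):

```python
# Verify Frye's counterexample to Euler's conjecture that
# x^4 + y^4 + z^4 = w^4 has no solution in positive integers.
# Python integers are arbitrary-precision, so the check is exact.
x, y, z, w = 95800, 217519, 414560, 422481
print(x**4 + y**4 + z**4 == w**4)  # True
```

A claim that stood for almost two centuries is overturned by a single line of exact arithmetic, which is precisely the asymmetry between verification by examples and refutation by counterexample.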

Explanation

Imagine the following scenario.

A teacher asked a student to write a 4-digit whole number on the blackboard, then

the teacher immediately said “it is divisible by 3.” The student checked with a

calculator and found out the teacher was correct. Then the student challenged the

teacher again with even larger whole numbers and the teacher could make a

correct and prompt judgment every time.

If you were the student, what would you want to know? Most likely, assuming

some curiosity, you would want to know "why."

As Polya (1954) stated, “ … having verified the theorem in several particular

cases, we gathered strong inductive evidence for it … When you have satisfied yourself

that the theorem is true, you start proving it” (p. 83-84). This leads to the second function

of proof in mathematics, explanation.
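The teacher's "trick" rests on the classic digit-sum rule: because 10 ≡ 1 (mod 3), every power of 10 leaves remainder 1 when divided by 3, so a number is divisible by 3 exactly when its digit sum is. A brief empirical check (an illustrative sketch, not part of the dissertation's instruments) confirms the rule:

```python
# Divisibility-by-3 test via digit sums: since 10 ≡ 1 (mod 3),
# n = sum(d_i * 10^i) ≡ sum(d_i) (mod 3).
def digit_sum(n):
    return sum(int(d) for d in str(n))

# Check the rule for every whole number below 100000.
assert all((n % 3 == 0) == (digit_sum(n) % 3 == 0) for n in range(100000))
print("digit-sum rule for divisibility by 3 holds for all n < 100000")
```

Note that the empirical check, like the student's calculator, only verifies the rule case by case; the congruence argument in the lead-in is what explains why it can never fail.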

As illustrated in the imaginary scenario, knowing the teacher was correct didn't

satisfy the student's curiosity (in fact, curiosity was more likely to be aroused), and didn't

advance the student's understanding of the subject (Hanna, 2000b). De Villiers suggested

(1990) that mere verification "gives no psychologically satisfactory sense of illumination

– no insight or understanding into how the conjecture is the consequence of other

familiar results” (p. 19).

"Proof helps us understand and explain mathematics" (Reid, 2011, p. 16). Proof

connects phenomena with more basic rules (theorems and axioms) that seem obvious and

unchallengeable. It unpacks the relationship between an unfamiliar proposition and

familiar results. Therefore, each true statement is no longer an isolated piece of


knowledge, but is supported by more fundamental understanding of the subject. Realizing

and intentionally pursuing the structure of knowledge built by proof leads to a higher

order function of proof in the study of mathematics.

Systemization

Each proof uncovers a segment of an axiomatic system, allowing people to see a

branch of the structure, ultimately leading to an understanding of the whole. Proof is an

indispensable tool for systematizing known results into a deductive axiomatic system (de

Villiers, 1990).

Systemization requires higher levels of thinking than merely producing a proof.

Even those who can generate a proof for some complex propositions (e.g. the Nine-Point

Circle) in Euclidean Geometry may not be conscious of the five (or ten, if counting

the assumptions about algebra) basic assumptions or be aware of how the axioms and

theorems work together as a system. This is because: 1) proving a particular statement

may only involve understanding of a small part of the system; and 2) accepting if-then

logic, as needed in generating a proof, does not necessarily require a global

understanding of if-then logic’s role in a deductive system. Therefore, there is a

difference between knowing a statement is true and knowing a proof is performed

correctly within a system. Hence, a major function of proof is to lead to the

understanding of mathematics as an organized and logically consistent network.

Nevertheless, it is impossible to understand the system as a whole without

perceiving proof as a local illustration of how the system works. It is commonly believed

that an overarching understanding comes after adequate local experiences (e.g. the van
Hiele model, 1986). Additionally, proof and investigation of a single case may also

inspire or directly cause a more insightful or even revolutionary view about the

knowledge structure (Lakatos, 1976). For example, the proof of the Chinese Remainder

Theorem inspires the understanding of ideals in ring theory; Russell’s paradox (1903)

caused a reconsideration and reconstruction of the logic system. This leads to the

discussion of the next function of proof.

Discovery

Many discoveries in mathematics might be initially obtained or inspired by

empirical investigation (e.g. the law of large numbers) and trial and error (e.g. the four

color map conjecture/theorem). There are also new results that "were discovered or

invented in a purely deductive manner” (de Villiers, 1990). Reid (2011) illustrates this

point by suggesting that a proof of the statement that “the sum of two consecutive odd

numbers is even” leads to the discovery of a new fact that the sum must be a multiple of 4.

Another example could be Euler's polyhedron formula, which sharply narrows down the

possible cases of regular polyhedra and directly implies the discovery of all the

possible cases.
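Both examples can be made concrete. Writing the consecutive odd numbers as 2k+1 and 2k+3, the proof of Reid's statement gives (2k+1) + (2k+3) = 4k + 4 = 4(k+1), revealing the stronger fact that the sum is a multiple of 4. For the polyhedron case, combining Euler's formula V − E + F = 2 with pF = 2E = qV (for p-gon faces, q meeting at each vertex) forces 1/p + 1/q > 1/2, and enumerating the possibilities recovers exactly five regular polyhedra. A short sketch of that enumeration (an illustration, not part of the original study):

```python
# Enumerate pairs (p, q) with 1/p + 1/q > 1/2 and p, q >= 3 -- the constraint
# that Euler's formula V - E + F = 2 imposes on a regular polyhedron whose
# faces are p-gons with q faces meeting at each vertex. Exact rational
# arithmetic avoids floating-point comparisons.
from fractions import Fraction

solids = [(p, q)
          for p in range(3, 7)      # 1/p + 1/q <= 1/2 whenever p, q >= 6
          for q in range(3, 7)
          if Fraction(1, p) + Fraction(1, q) > Fraction(1, 2)]
print(solids)  # [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)] -- the five Platonic solids
```

In both cases the deduction itself delivers the "discovery": the inequality was not checked against a list of solids but derived, and the five solutions fall out of it.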

Perhaps the best demonstrations of proof's discovery function lie in the natural

science studies, where many phenomena were “found” in the theory before being

discovered in reality. This is among the most important reasons for mathematics to be

such a popular tool in those disciplines. A well-known example is the discovery of the

gravitational lens (bending of light by mass), which was deduced from Einstein’s general

theory of relativity before it was confirmed by observation. Another famous example is


the C60 molecule, whose geometric structure was conjectured before its physical

appearance was found. Discovery in natural science by proof has a strong implication for

Platonism, in the sense that there are perceivable and predictable orders pre-existing in

the world before being “discovered” by human intelligence.

Discovery by proof also seems to be a natural outcome of systematization. When

talking about the verification and explanation function of proof, we consider the process

of tracing the statement “down” to the axioms and theorems. However, neither an

individual nor a community can examine or discover all true statements within a system,

especially when the system is newly established. Hence, when building “up” the theory

upon the axioms, theorems and other known results, it is quite possible to encounter

statements that have never been studied before.

Communication

While the discovery function of proof is somewhat compatible with the

perspective of Platonism, the communication function is more likely to share the basis of

constructivism. In the view of constructivism, mathematics is regarded as a product of

social construction (Balacheff, 1991). It is described as “careful reasoning leading to

definite, reliable conclusions” (Hersh, 2009). The communication delivered by

mathematical proof has two major features: clear definition and a rigorous layout of causal

relationships. In particular, mathematical concepts, while initially extracted from reality,

need to be understandable and communicable while carrying minimal intuitive

confusion (e.g. quantity is a mathematical concept but "prettiness" probably is not). In

addition, communication by proof also serves to minimize intuitive

confusion in the reasoning process. However, it is impossible to radically remove

intuition from proof. Description of the concepts involves external or even

non-mathematical language (Krantz, 2007). When performing deduction to perhaps the most

rigorous standard, as formalized in set theory, intuition still plays a part when

visualizing the inclusion and exclusion relationships. Nevertheless, there are intuitions

that seem to be accepted by all human beings, and hence they are used to form a common

ground for critical debates (Davis, 1976). Mathematical activities ultimately pursue

commonly accepted facts and perform commonly accepted reasoning. Proof, loyal to

both aspects, serves as the most explicit and reliable tool in communicating the substance

of mathematical thinking.

Intellectual challenge

According to de Villiers (1990), proof also serves the function of self-realization

and fulfillment. The motivation for doing proofs may come from the desire to construct a

more elegant proof or the satisfaction of conquering difficult challenges. Although

mathematical proof is set to pursue a common ground and a generally accepted way to

present causal relationships, those who can actually understand, appreciate and utilize the

idea and structure of mathematical proof comprise only a small portion of the population.

The pursuit of certainty and accuracy in mathematical proof as a discipline is appealing to

scholars and learners of mathematics, science, philosophy and other logic-intensive

disciplines.

Implications of de Villiers' model for the learning of proof

De Villiers’ description (1990, 2003) of the functions of proofs in the study of

mathematics and its educational implications are of great importance to the mathematics

education community. De Villiers suggested that proof shouldn’t be taught only as a

method to verify mathematical statements, but also as a way to explain, organize,

investigate and communicate about mathematical facts. Students’ motivation in learning

proof can be stimulated by the curiosity of knowing “if something is true,” as well as the

willingness to know “why something is true,” “how things relate to each other,” “what

else may be true,” and “how to let other people know my ideas.” The functions proposed

by de Villiers are well aligned with learners’ interests brought into the context (see Table

1).

De Villiers' classification       Learners' purpose in conducting proof aligned with
of the functions of proof         the function of proof

Verification                      To know if a statement is true
Explanation                       To know why a statement is true/false
Systemization                     To know how concepts and properties are related to
                                  each other
Discovery                         To know what else is true/false
Communication                     To know how to communicate mathematical ideas with
                                  other people
Intellectual Challenge            To know how good s/he is in mathematical reasoning

Table 1. The alignment between functions of proof and learners' purpose in conducting
proof

Healy & Hoyles (2000) categorized students' views of proof and its purposes in a

large-scale empirical study of children aged 14-15. They found that 28% of the students

didn’t show any understanding of the purpose of proof. In addition, only 1% of the

students acknowledged that proof might help discover new theories or systemize ideas.

The most recognized functions of proof were verification and explanation³. Furthermore,

Healy & Hoyles posited that students’ understanding of the purposes of proof had a

significant influence on their ability to identify and construct a proof. From the

perspective of constructivism, only when students consider a proof convincing and

explanatory do they become more likely to assimilate it into their own reasoning method.

Without an understanding of students’ intention in producing and judging mathematical

arguments, it is impossible to understand why certain decisions are made. Therefore, the

power of validating and explaining is adopted in this study to depict participants’

evaluation of mathematical arguments in order to understand what features of the

arguments they value.

Certainly, identifying the possible intentions only serves as a starting point toward

understanding learners’ behaviors when engaged in proof related activities. Even if a

student clearly demonstrates an intention to verify a mathematical statement, researchers

might still not know why a certain strategy (algebraic vs. geometric, empirical checking

vs. deductive reasoning, etc.) was valued by the student. They might not know why the

student failed or succeeded in achieving his/her goal either. Hence, in order to obtain a

deeper understanding of how students’ knowledge of proof is constructed, investigations

³ In Healy & Hoyles' (2000) classification of students' view of the purposes of proof, the category named "explanation"
included both explanation and communication as identified in de Villiers' (1990, 2003) model.
are needed to explore not only how students perform on tasks that demand proof, but also

their thinking, which can be probed through carefully designed questions. Studies that

addressed these issues are discussed in the following section.

Existing Theories of Proof Learning

Stylianides and Stylianides (2008a) distinguished, within the genre of educational

research on proof, three major categories. The first category includes studies that

investigate students’ ability to perform proof related activity (Ball & Bass, 2003; Lampert,

1992; Marrades, & Gutiérrez, 2000; Reid, 2002; Sekiguchi, 1991; Zack, 1997). This body

of work suggests that students naturally possess the ability (Piaget, 1928, 1987) to reason

even at early elementary grades (Zack, 1997). They call for the design of interventions

that encourage students to reason coherently instead of assuming they are not ready and

providing them cognitively soft tasks to do (Bloom, 1984; Usiskin, 1987). The second

category of studies describes students' common difficulties and mistakes when producing

proofs across different grades and content areas (Balacheff, 1988; Chazan, 1993;

Schoenfeld, 1988; Senk, 1985). The third category elaborates on the pedagogical factors

that could facilitate students’ learning about proofs (Hoyles, 1997). These three

categories of studies, including both reports of empirical investigations as well as

theoretical essays, offer insights into students’ ability along with challenges they

experience when learning proofs (Pirie, 1988). However, collectively, they fail to provide

a systematic and panoramic framework to capture features of students’ thinking when

performing proof related tasks. This gap inspired a body of studies on learners’ proof

schemes.
Proof schemes

The study of learners' proof schemes has a long history and is currently a

mainstream topic in the didactics of mathematics. For instance, Bell (1976) identified "Empirical" and

“Deductive” as two major modes of justifications that students used when working on

problems that demanded proving. Empirical justification, according to his description,

relies on the use of examples whereas deductive justification relies on deduction to

connect data with conclusions.

Balacheff (1988) coined “pragmatic” and “conceptual” as two prominent modes

of justification used by students. Pragmatic justifications are based on the use of

examples (or on actions), and conceptual justifications are based on abstract formulations

of properties and of relationships among properties. He further identified three types of

pragmatic justifications to include: “naive empiricism,” in which a statement to be proved

is checked in a few (somewhat randomly chosen) examples; “crucial experiment,” in

which a statement is checked in a carefully selected example; “generic example,” in

which the justification is based on operations or transformations on an example which is

selected as a characteristic representative of a class. “Thought experiment” is identified as

conceptual justification, in which actions are internalized and dissociated from the

specific examples and the justification is based on the use of and the transformation of

formalized symbolic expressions (see Figure 1). Balacheff (1988) concluded that while

students experience difficulty producing proofs, they do however show awareness of the

necessity to prove and to use logical reasoning.

[Tree diagram: Pragmatic Justification branches into naive empiricism, crucial
experiment, and generic example; Conceptual Justification branches into thought
experiment.]

Figure 1. Balacheff's (1988) classification of students' proving schemes

Extending the research of Bell (1976) and Balacheff (1991) and drawing from a

considerable collection of empirical data, Harel & Sowder (1998) proposed a taxonomy

of proof schemes consisting of three main categories, i.e. “external,” “empirical,” and

“analytical,” each of which encompasses several subcategories (see Figure 2). In

particular, external conviction proof schemes include instances where students determine

the validity of an argument by referring to external sources, such as the appearance of the

argument instead of its content (e.g. they tend to judge upon the kind of symbols used in

the argument instead of the embedded concepts and connection of those symbols), or

words in a textbook or statements made by a teacher. Empirical proof schemes, inductive or

perceptual, include instances when a student relies on examples or mental images to

verify the validity of an argument; the former draws heavily on examination of cases for

convincing oneself, while the latter is grounded in more intuitively coordinated mental

procedures without realizing the impact of specific transformations. Lastly, analytical

proof schemes rely on either transformational structures (operations on objects) or

axiomatic modes of reasoning which include resting upon defined and undefined terms,

postulates or previously proven conjectures.

Figure 2. Proof schemes and sub-schemes (Sowder & Harel, 1998)

Although the existing frameworks of proof schemes provide a powerful vehicle for

classifying the types of proofs produced, they do not trace the cognitive stages that

learners might go through as they develop a more complete understanding of

mathematical proof. Attempts have been made to address this gap by studies focused on the

cognitive development of proof and reasoning.

Frameworks to depict the stages in proof learning

A great deal of research has been undertaken that explores and describes the

developmental stages that a learner goes through in their comprehension of mathematical

proof, from the early stages in which s/he only possesses a primitive understanding of

mathematical objects and actions to more advanced levels where s/he is capable of

axiomatic reasoning (Tall et al, 2012). Since the ability to generate logical arguments is

among the most essential goals of any area of mathematics, progress in understanding the

subject is inseparable from the development of proof skills. Therefore, theories

concerning the learning progression often include a description of the maturation of

mathematical reasoning. The well-known van Hiele model (1986) for geometric thinking

is one such theory.

The van Hiele Model

The van Hiele model was originally proposed by two Dutch teachers, Pierre van

Hiele and Dina van Hiele-Geldof. They designed a framework which could depict the

development of geometric reasoning and hence explain how people grow in their

geometry knowledge. Five different levels of understanding through which an individual

passes when learning geometry were identified, including “visual,”

“descriptive/analytical,” “informal deductive,” “formal deductive,” and “rigor” (van

Hiele, 1986, see Figure 3). A brief description of each level is presented below.

Level 5: Rigor

Level 4: Formal Deductive

Level 3: Informal Deductive

Level 2: Descriptive/Analytic

Level 1: Visual

Figure 3. The van Hiele Model (van Hiele, 1986)

At the visual level (Level 1), learners could identify, name, and compare

geometric figures, such as triangles, rectangles, angles, parallel lines, etc., according to

how they look. For example, at this level, students may see the difference between

triangles and quadrilaterals by counting the number of their sides, but they may not be

able to tell that a square is a rectangle since they "look different."

At the descriptive/analytical level (Level 2), learners can recognize components

and properties of a figure; however, they cannot reason upon those properties. They are

able to describe figures in terms of their parts and relationships among these parts, to

summarize the properties of a class of figures, and use properties to solve basic

identification problems, but they cannot yet conduct deduction. For example, learners

know a right triangle is a triangle that has a right angle, but they cannot explain whether it

is possible for a triangle to have two right angles.

At the informal deductive level (Level 3), learners are able to connect figures with

their properties. They can justify figures by their properties as well as articulate the

properties of a given figure. The learners can understand and use precise definitions.

They are capable of using “if-then” thinking, but they cannot consciously use

mathematically correct language, nor can they realize the deductive property of their

reasoning. Their reasoning is based on intuition (Fischbein, 1982) instead of a

mathematical foundation. For example, the learners are able to claim that it is impossible

for a triangle to have two right angles because if so there would be two sides that

cannot “meet.”

At the formal deductive level (Level 4), learners can reason about geometric objects

using their defined properties in a deductive manner. They could consciously construct

the types of proofs that one would find in a typical high school geometry course. They are

aware of what counts as a legitimate proof in mathematics.

At the highest level, rigor (Level 5), learners can compare different axiomatic

systems. Learners fully understand the structure of a system as well as its applications

and limitations. They can analyze and compare these systems.

The van Hiele model has been modified and extended by scholars to meet

particular research interests. For example, Clements and Battista (1992) added a level 0,

“pre-recognition,” where children were not able to visually identify the difference
between shapes, to this model to depict their cognition in geometry at the very beginning

stage. Pegg and Davey (1998) integrated the van Hiele model with another learning

theory, the SOLO taxonomy (Biggs & Collis, 1982), to describe how learning develops

within and through the levels.

The van Hiele model doesn’t trace the “in between levels of reasoning” (Burger &

Shaughnessy, 1986), nor does it offer enough details to depict how proof is perceived by

learners. This point became quite obvious when applying the model to study students’

development of proof skills. After all, the van Hiele model was not specially designed to

capture the development of proof ability in geometry, but to document levels of

geometric reasoning. The first two levels concern sense making and concept building,

while the ability to produce justification mostly develops at the higher three levels. It is

not suggested that the development of reasoning ability could be separated from sense

making and concept building; however, an elaboration on children's development of

proving capacity is needed.

Waring’s proof levels

Waring (2000) proposed the proof levels for elementary and secondary students to

“describe the development of proof concepts beginning with an appreciation of the need

for proof, then an understanding of the nature of proof, and finally pupils’ competence in

constructing proofs” (p. 10). Six levels are identified in the framework.

At Level 0, students are ignorant of the need to provide a mathematical

justification to confirm a statement. In particular, they may either think it is unnecessary

to offer a reason, or just refer to external (Harel & Sowder, 1998) sources, such as

teachers’ words and statements in the textbooks, to support their opinions.

At Level 1, students start to become aware of the need to provide a justification;

however, they are not cognizant that a claim should be verified in all possible cases.

Instead, they just check a few cases and suggest the results are sufficient to support the

claim.

Moving up to Level 2, students still rely on empirical checking. While they are

more careful in choosing examples to verify, and may notice certain patterns in the

process, they still cannot produce a proof that accounts for all cases. This may be due to

their inability to realize the entire scope of discussion, absence of knowledge about the

need to clarify every case, or lack of language tools to describe the patterns they detect.

At Level 3, students become aware of the need to offer justification for general

cases; however, they lack proof skills or basic understanding of the subject. Therefore,

they cannot produce a valid proof.

At Level 4, students are both aware and capable of producing generalized proofs.

However, they can only do so in limited and familiar contexts.

At Level 5, students understand the rationale of proof as a reliable reasoning

method and can intentionally apply it to justify claims in unfamiliar contexts.

Compared to the van Hiele model, Waring’s proof levels offer an account of how

the shift from informal to formal understanding of proofs may occur. However, both

frameworks provide a linear account of development (i.e. changes happen one after

another) with no space in the structure to describe processes that might occur randomly or

in parallel to each other. For instance, lower levels in the van Hiele model emphasize sense
making and concept building while reasoning is only emphasized in higher levels; while

in Waring’s model, awareness simply develops before understanding. Scholars have

proposed that the development of mathematical cognition follows a more complex and

non-linear format (Lakatos, 1976; Kieren & Pirie, 1991; Martin, 2008; Pirie & Kieren,

1992).

The broad maturation of proof structures

Tall et al. (2012) proposed a two-dimensional model to depict the development of

factors that are involved in the maturation of one’s proof ability (see Figure 4). This

framework captures six key components (i.e. perceptual recognition, verbal description

and pictorial or symbolic representation, definition and deduction, equivalence,

crystalline concepts, and deductive knowledge structure) and their relationships in the

broad maturation of proof structure. Unlike the van Hiele model, Tall et al. (2012) suggest that perceptual understanding does not develop only at earlier stages. Instead it

continues to be refined when the understanding of the concept and deductive process is

advanced. This idea is consistent with the perspective of constructivism, in the sense that

the mathematical system possesses a dynamic structure so that a shift in understanding of

a factor may impact other components (Lakatos, 1976; Tall, 2005). Nevertheless, Tall et

al. (2012) do not suggest that all the components in the structure develop simultaneously.

Instead, certain types of understanding serve as a prerequisite for others to occur. This

feature is denoted by the initial “height” of each component.

Figure 4. The broad maturation of proof structure (Tall et al., 2012)

Crystalline concept introduced in this framework has a crucial role in the

development of proof structure. It was described as “a concept that has an internal

structure of constrained relationships that cause it to have necessary properties as part

of its context” (p. 19). In other words, it is a concept with a body of associated knowledge attached to it. In order to construct deductive reasoning, the concepts involved must not be perceived as isolated objects. Only when the roads are built can a path be drawn.

Theoretical Framework

The pilot study (Liu & Manouchehri, 2012) adopted Harel & Sowder’s (1998)

model to differentiate and classify different types of arguments, attempting to understand

if the proof scheme is a common indicator of whether an argument was found convincing

or appealing by an individual. The results of the pilot study suggested that students prefer

different proof schemes and may have distinct judgment of the same scheme in different

contexts. This result was consistent with Harel & Sowder’s (1998) finding that an

individual could simultaneously hold different proof schemes. Since the proof scheme

alone cannot determine whether an argument is preferred or accepted as reliable by a

student, more factors need to be considered to explain the phenomenon. De Villiers’

(1990, 2003) model offers one dimension (i.e. the intention of the learner in creating the argument) to explain why a certain approach may be preferred by learners. Waring’s (2000) model provides another dimension to explain a learner’s conception of proof by referring to the stages the learners may have achieved at the time of assessment. The broad

maturation model proposed by Tall et al. (2012) adds to the conversation by considering

the impact of the learner’s understanding of related mathematical topics on their

perception of proofs. Healy & Hoyles (2000) suggested gender can also be a factor. In

addition, a substantial number of studies have tended to explain students’ understanding

of argumentation methodology in terms of what they experienced in school interventions (Dreyfus, 1999; Hoyles, 1997; Herbst & Branch, 2006; Schoenfeld, 1988; etc.). Indeed, intention and school experience are factors that impact learners’ preference for and use

of proof modes. However, what ultimately impacts students’ judgment is their

understanding of the argument (see Figure 5). In order to distinguish the types of
arguments students found convincing and appealing, we must understand what features or factors of an argument impact their evaluation.

The statement of the argument → Personal understanding of the argument → Personal evaluation of the argument

Figure 5. Evaluation of argument is based on understanding

According to the survey conducted by Mejia-Ramos and Inglis (2009), a majority

of reported studies on mathematical proof are concerned with students’ estimation,

exploration and justification of a mathematical conjecture, but few studies pay attention

to how students comprehend and evaluate a given proof. In addition, instruments that

assess students’ comprehension of proof are also underdeveloped (Mejia-Ramos et al.,

2012). Research on students’ comprehension of given arguments is rare and greatly needed, for several reasons. First, in school practice, reading and understanding the proofs offered by the teacher or course materials serve as a main venue for students to

develop their conception and skills of proofs (Weber, 2004). Second, the evidence and

logic students use to construct a proof are usually familiar to them; however, when evaluating a proof, they may encounter unknown resources and unfamiliar reasoning methodologies, and their judgment of a reasoning method in such an unfamiliar setting

would reveal some basic features of their conviction system. Lastly, the ability to judge a

mathematical argument is an indispensable skill for one to consciously construct proofs.

Without an internal understanding of what kind of reasoning process is reliable, it is

impossible for one to monitor and inform his/her own construction of mathematical

proofs. Understanding how students evaluate and assimilate (or exclude) different ideas is

critical in understanding their learning process, and the learning of proof is not an

exception. Therefore, more studies that focus on students’ thinking in reading and judging proofs are needed (Healy & Hoyles, 2000; Mejia-Ramos & Inglis, 2009; Selden &

Selden, 2003; Yang & Lin, 2008).

Yang & Lin (2008) took the initiative to address the gap by proposing the Reading

Comprehension of Geometry Proof (RCGP) model to describe the stages that learners go

through in understanding a given geometric proof (see Figure 6). They suggested that

students first identify isolated conceptual and procedural knowledge in the statement at the surface level. They then start to recognize that some knowledge and statements are premises, some are conclusions, and some are descriptions of properties. Moving up to the

chaining elements level, students are able to identify and understand the connection

between premises, conditions, properties and conclusions. At the highest level,

encapsulation, students gain a systematic and organized view of the elements in the proof;

they are well aware of what the premises and conclusions are and fully understand the

causal relationships among them.

Figure 6. Reading Comprehension of Geometry Proof (RCGP) Model (Yang & Lin, 2008)

Following the identification of the four levels of understanding, Yang and Lin

(2008) further examined conditions under which a learner’s understanding could move

toward higher levels. In particular, they suggested that Basic Knowledge (i.e.

understanding of the terms and sentences), Logical Status (i.e. realization of the logical

relationship), Summarization (i.e. capturing the core of a logical relationship), Generality (i.e. understanding to what extent the argument is valid), and Application (i.e. knowing

how to apply the proposition) make up the critical understanding a learner needs to

develop in order to achieve higher levels of proof reading comprehension, as illustrated in

Figure 6.

The RCGP model aids in understanding the development of students’ comprehension of a particular proof. The model suggests that only when certain understanding is in place can a learner comprehend a proof as a logically coherent

argument. Therefore in order to evaluate the reasoning process of a mathematical

argument in an informed manner, students must at least reach the understanding at the

chaining elements level since only at this level can students start to see the relations

within the statement. In other words, judgment can only be made upon certain levels of

understanding, and judgment about the reasoning method can only be made when learners see

the connection.

According to the RCGP model, there are three kinds of understanding that

students need to develop in order to reach the chaining elements level. Generally

speaking, students need to be able to understand the concepts used in the argument,

students need to be able to identify the evidence on which the argument is based,

and students need to be able to see the connection between the premises and results.

When shifting the attention from students to the arguments, it is noticeable that

these three types of understanding in fact point out three key aspects about the argument

that could, to a large degree, influence students’ comprehension and evaluation of a

mathematical argument, i.e. the representation (which describes the concepts and other

terms), the source of conviction (which states what is taken for granted), and the link

between the evidence and conclusion (which represents the reasoning process). Three

similar aspects were addressed by Stylianides and Stylianides (2008a) as the modes of

arguments, the set of accepted statements, and the modes of argumentation. It is assumed

that only when students understand the presentation, agree with the source, and recognize

the link would they consider an argument as reliable. With the three identified aspects,

the investigation becomes one of examining what kind of presentation, what kind of
source, and what kind of link contributes to students’ conviction of an argument. So the

next step of model building is to identify the different genres in each aspect.

Bruner (1966) synthesized three kinds of representations when communicating

ideas, i.e. enactive (which involves the use of gesture and physical actions), iconic (which

involves the use of pictures, graphs and visual tools) and symbolic (which includes the

use of natural language, numbers, and logic). However, in the current study, where the

communication is carried out through written mathematical arguments, the enactive mode is not utilized. In mathematics, algebraic representation constitutes a different mode of communication from casual language. Therefore we see the need to distinguish the two

forms of arguments. In the theoretical framework of the study, four different

representations of a mathematical argument are conceptualized. They are: narrative,

numerical, symbolic, and visual. Narrative arguments refer to those using casual

language. A typical example could be “Because the car is slower, it takes a longer time to

get to the destination.” Numerical arguments refer to those using numbers and elementary

mathematics symbols (such as “+,” “-,” and “<”). For example, “since 12 = 3 * 4, then 12

is a multiple of 3” falls in this category. Symbolic arguments refer to those using letter

symbols to represent mathematical concepts and communicate ideas. At the secondary

levels, it is not expected that students will use formal language as do those in an

advanced algebra class. Therefore, the symbolic arguments may contain a large amount

of casual language as well. However, what distinguishes a symbolic argument from a

narrative one is that it is impossible to understand a symbolic argument without knowing

the role played by the letter symbols. For instance, the argument “Since x^2 - 2x + 1 = (x-1)^2,

then it must be non-negative” falls in this category. The last type is visual arguments,
where visual aids are provided to present concepts and to communicate ideas. This

category is the same as the iconic way of communication conceptualized by Bruner

(1966). A typical geometry proof that uses figures falls in this category.
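To illustrate the symbolic category more fully, the symbolic example above can be completed as a short derivation (a sketch added here for illustration; it is not part of the survey instruments):

```latex
\[
x^2 - 2x + 1 = (x-1)^2 \geq 0 \quad \text{for every real number } x,
\]
% since the square of any real number is non-negative.
```

Understanding this argument requires knowing the role played by the letter x as an arbitrary real number, which is exactly what places it in the symbolic rather than the narrative category.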

The classification of sources of conviction as well as the link between source and

conclusion is informed by Harel & Sowder’s (1998) model. However, unlike Harel &

Sowder’s model which categorizes arguments as a whole, this framework classifies the

source of conviction and the link between source and conclusion separately. This

alternative approach was impacted by a reflection on the application of Harel & Sowder’s

model on several concrete cases. For example, a step-by-step correct and complete proof

initiated from the Pythagoras Theorem would most likely be classified as a deductive

proof using Harel & Sowder’s model. However, students who write down the same proof

may actually have different comprehension of it. For instance, some of them may view

the Pythagoras Theorem as an external authority without understanding why it holds,

some of them may view it as an assumption, and some of them may view it as part of

their concept of a right triangle. It was suggested that students’ inconsistency in

preference and evaluation of proofs in different contexts as observed in the pilot study

(Liu & Manouchehri, 2012) could partly be due to the lack of preciseness when using

Harel & Sowder’s model to depict students’ way of reasoning. Therefore by looking at

source and link separately, we expect to get a more detailed picture of students’

comprehension of proof. In particular, students could view the source of an argument as

authority (i.e. what’s stated by a respectful knowledge carrier, e.g. teachers,

mathematicians, books, agreement of a community, etc.), example (i.e. result from an

immediate test), imaginary (i.e. mental image created upon or recalled from previous
experience), fact (i.e. well known existing mathematical results), an assumption (i.e. an

assumed truth for the argument to be based on), and opinion (conviction without an

explicit reason). The types of link between source and conclusion include direct

indication, perceptual connection, induction, transformation (Simon, 1996), ritual

operation (Healy & Hoyles, 2000), and deduction. In direct indication, the conclusion is

the required condition of the source without any additional understanding (e.g. “Since the

squares of a positive number, a negative number and 0 are all non-negative, then the

square of a real number is non-negative”). Perceptual connection refers to linking source

and conclusion based on visualization or intuition. The argument “Since f(x) is a much

longer term than g(x), then f(x) must be larger” is an illustration of the use of perceptual

connection in an argument. The use of metaphor falls into this category as well. Induction

and transformation both refer to a conclusion informed by several pieces of empirical evidence; however, the latter involves a further investigation and noticing of properties that

connect the empirical cases. The use of generic examples (Balacheff, 1988) falls in the

category of transformation. Ritual operation and deduction both refer to a valid reasoning

procedure; however, one using the former does not know why the procedure works (e.g. using an algorithm without knowing why it works) while one using the latter is well aware that each step in the process connects a piece of evidence to its required condition. The

structure of the framework is illustrated in Figure 7.

[Figure 7 depicts a three-dimensional cube with the following axes:
Presentation: symbolic, numerical, narrative, visual;
Link: deduction, ritual operation, transformation, induction, perceptual connection, direct indication;
Source: authority, example, imaginary, fact, assumption, opinion.]

Figure 7. Framework to classify students’ comprehension of a mathematical argument

To exemplify how the framework is used in the current study, let’s consider the

following argument: “Since 2+2=4, 2+4=6, 2+6=8, 4+6=10, then the sum of two even

numbers must also be even.”

The representation of the argument is narrative. The source of conviction is

example, since it is based on the results of several trials. The link would most likely be

classified as induction. There shouldn’t be much ambiguity about the representation and

the source of conviction in this case. However, when judging the link between source and

conclusion, we cannot be certain about whether the nature of reasoning was purely

inductive. For instance, it is possible that when one reads the proof s/he may have noticed

some patterns from the trial results but hasn’t explicitly expressed the discoveries. In

order to clarify such concerns, additional information needs to be elicited so as to confirm

conjectures or assumptions regarding choices made. Questions such as “why do you think

checking on a few cases is sufficient for a conclusion about every case?” can potentially

provide a venue to the individuals’ thinking.
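For contrast, a deductive version of the same even-sum claim (a standard textbook argument, offered here only for comparison) would occupy a different cell of the framework:

```latex
\[
2m + 2n = 2(m+n) \quad \text{for any integers } m \text{ and } n,
\]
% so the sum of any two even numbers is itself even.
```

Here the source would most likely be classified as fact (the definition of an even number as twice an integer) and the link as deduction rather than induction.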

In order to clarify ambiguities associated with sources contributing to individuals’

choices, it is important to acknowledge that none of the representation, the source of conviction, or the link between source and conclusion can be identified merely by

looking at the argument itself. Instead, they reside in one’s comprehension of the

argument, even though the expression of the argument can certainly influence one’s

understanding. For convenience, one’s interpretation of an argument is

called an “internalized argument” for the rest of the work. Accordingly, this framework,

which classifies different internalized arguments, is called the Classification Cube of

Internalized Arguments (CCIA) hereafter.
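To make the three dimensions of the framework concrete, the CCIA can be sketched as a simple data structure (a hypothetical illustration only; in the study the framework was used as a hand-coding scheme, not as software, and all names below are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class Presentation(Enum):
    NARRATIVE = "narrative"
    NUMERICAL = "numerical"
    SYMBOLIC = "symbolic"
    VISUAL = "visual"

class Source(Enum):
    AUTHORITY = "authority"
    EXAMPLE = "example"
    IMAGINARY = "imaginary"
    FACT = "fact"
    ASSUMPTION = "assumption"
    OPINION = "opinion"

class Link(Enum):
    DIRECT_INDICATION = "direct indication"
    PERCEPTUAL_CONNECTION = "perceptual connection"
    INDUCTION = "induction"
    TRANSFORMATION = "transformation"
    RITUAL_OPERATION = "ritual operation"
    DEDUCTION = "deduction"

@dataclass(frozen=True)
class InternalizedArgument:
    """One reader's comprehension of a single argument, located at a
    point in the Classification Cube of Internalized Arguments."""
    presentation: Presentation
    source: Source
    link: Link

# e.g. the even-sum argument discussed above, read purely inductively:
even_sum = InternalizedArgument(
    presentation=Presentation.NARRATIVE,
    source=Source.EXAMPLE,
    link=Link.INDUCTION,
)
```

Each coded interview response then corresponds to one such triple, which is why the framework is a "cube": the three dimensions vary independently across readers and contexts.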

Using CCIA to categorize different types of comprehensions of mathematical

arguments, the current study investigated the types of arguments students considered

convincing, explanatory and appealing. In addition, the study examined whether there

were common types of representation, source and link that contributed to students’

choices. Furthermore, similarities and differences among individuals and among the

contexts were studied in order to identify personal factors that influenced their judgment.

Detailed research methods and procedures are provided in the next chapter.

CHAPTER 3. METHODOLOGY

This study sought to investigate what kind of mathematical arguments students

considered as convincing, explanatory and appealing, and what factors influenced their

evaluations. In order to do so, both quantitative and qualitative data were collected and a

mixed method design was utilized.

Mixed Method Designs

Conducting mixed-methods research involves the collection, analysis, and

interpretation of both quantitative and qualitative data in a single study or in a series of

studies that investigate the same phenomenon (Onwuegbuzie & Leech, 2006; Creswell &

Plano Clark, 2011). Quantitative research emphasizes deductive logic, and utilizes

numerical data; whereas qualitative research emphasizes inductive logic, and often

utilizes textual and pictorial data (Teddlie & Tashakkori, 2009). Quantitative research

tends to eliminate researchers' biases, so that they can remain emotionally detached and

uninvolved with the objects of the study and test or empirically justify their stated

hypotheses; whereas qualitative research “contend(s) that multiple-constructed realities

abound, that time-and context-free generalizations are neither desirable nor possible, that

research is value-bound, that it is impossible to differentiate fully causes and effects, that

logic flows from specific to general and that knower and known cannot be separated

because the subjective knower is the only source of reality” (Johnson & Onwuegbuzie,

2004, p. 14). A combination of quantitative and qualitative research designs serves to

fulfill five purposes, i.e. triangulation, complementarity, development, initiation, and

expansion (Greene, Caracelli, & Graham, 1989). More specifically, triangulation seeks

common results from different methods to reduce the inherent method bias of any

particular method, including the inquirer bias, theory bias, and context bias.

Complementarity increases the meaningfulness of inquiry results by elaborating,

enhancing, illustrating and clarifying the results from one method with the results from

other methods. Development informs the design of one method using the results from other methods. Initiation deepens and broadens the inquiry by seeking new perspectives or frameworks, or by discovering paradox and contradiction between results from different

methods. Lastly, expansion extends the scope of inquiry by using methods most

appropriate for certain inquiries.

Johnson & Onwuegbuzie (2004) suggested that the logic of inquiry in mixed

methods research includes “the use of induction (or discovery of patterns), deduction

(testing of theories and hypotheses), and abduction (uncovering and relying on the best of

a set of explanations for understanding one’s results)” (p. 17). The quantitative and

qualitative methods can be mixed by ordering them sequentially, merging them, or

embedding one strand within the other.

This study aimed to explore the kind of mathematical arguments students

considered as convincing, explanatory and appealing, and sought explanations to such

evaluations. The quantitative method could help with collecting data from a large sample

and hence facilitate the discovery of patterns and testing of hypotheses. It served to

enhance the scope of inquiry and the generality of the findings (Cohen, 1988). Findings
based upon statistical analysis could highlight connections between students’ thinking

and characteristics of the content. However, since participants were only asked to

complete multiple-choice items in the survey that were predefined by the researchers, the

quantitative study fell short of providing opportunities to explore learners’ own

explanations. Therefore, the qualitative methods were critical in offering further

interpretations of emergent patterns, and enhancing the analysis of the study by providing

insights into specific cases (McConaughy & Achenbach, 2001; Yin, 2009). Both methods

were needed in order to provide a comprehensive and meaningful explanation for the

objective of this study, i.e. students’ evaluation of mathematical arguments.

Procedure of the Study

Adopting a mixed methods design, the study consisted of the development,

administration and analysis of a survey and follow up interviews (see Table 2). The

survey and interview protocol were designed and refined in 2012. The survey was

administered in January - February 2013, and the follow up interviews were conducted in

April 2013. The survey was administered in the participants’ schools and took 30-60

minutes to complete. Individual interviews lasted approximately an hour each.

Participants of the study, development of survey instrument, procedures of the interview,

as well as the data analysis process are described in the following sections of this chapter.

Timeline: 2012
Task: Instrument Development
Summary: The instrument for the survey and the interview protocol was designed based upon existing literature and findings from pilot studies.

Timeline: January - February, 2013
Task: Survey Administration
Summary: The survey was administered online using the instrument called Survey of Mathematical Reasoning. The survey took 30-60 minutes to complete.

Timeline: February - April, 2013
Task: Survey Analysis
Summary: Students' evaluations of mathematical arguments were quantitatively analyzed. Survey results were used to determine the participants of the interview.

Timeline: April, 2013
Task: Interviews Conducted
Summary: Follow-up one-on-one interviews were conducted with individuals selected from those who had taken the SMR to further investigate why they made certain choices in the survey. Each interview lasted about an hour.

Timeline: April - June, 2013
Task: Interview Analysis
Summary: Students' responses in the interview were qualitatively analyzed. Factors that influenced students' decisions were conceptualized and synthesized.

Table 2. Outline of the procedure of the study

Sample

The population of interest in this study was 8th grade students. Two reasons

contributed to this choice. First, according to Piaget’s (1985) Intellectual Development

Stages, middle school students are at a critical cognitive phase where they can engage in

abstract and logical thinking. Therefore, how they learn to value different arguments at

this stage could potentially impact their reasoning skills and thinking habits in the later

years. Second, the grade band serves as a bridge between middle and high school

mathematics and the link between informal and more formal and abstract mathematical

reasoning (Knuth, Choppin, & Bieda, 2009). According to the curriculum standards

(CCSSO, 2010), most 8th grade students should have obtained a basic understanding of

numbers, shapes, chance, and algebraic expressions, know some simple propositions and

properties, and should be able to see the connection between concepts and ideas.

However, they may not have yet adopted abstract thinking or deductive ways of

mathematical reasoning using conventional proving techniques and forms. Therefore, the

features of arguments they consider as convincing, explanatory, and appealing can offer

valuable references for the development of resources and instructional explanations that

can facilitate students’ internalization and adoption of more mathematically sound

argumentation.

Survey Participants

Over 500 8th grade students from 5 different public schools in Ohio took the

survey in January and February of 2013. According to the 2012 spring Ohio state

standardized 7th grade mathematics test results, two of the schools had performed below

state average (at least 10% below as measured by percentage of proficiency), one

school’s performance was at the state average, while the other two schools’ performance

was above the state average (about 10% above as measured by percentage of proficiency).

The survey was given to the students in their respective school setting during a regular

class period.
Data trimming was conducted to exclude unreliable information. We excluded

data from those who had not completed the survey and those who had chosen the same option for almost all questions. In particular, the survey contained 48 questions that required students to select one of three options: “agree,” “disagree,” and “not sure.” If a participant chose the same option for all but at most 5 questions (about 10% of the total), we considered his/her responses not to be based on careful analysis. Hence, the actual data used in

analysis in this study consisted of responses from 476 respondents. 48.1% of the

participants were “male,” and 49.8% were “female.” The remaining 2.1% chose not to

disclose their gender. In responses to the question about ethnicity, 78.6% selected “White,

not of Hispanic origin”, 7.1% selected “Black, not of Hispanic origin”, 1.7% selected

“Hispanic”, 2.1% selected “American Indian or Alaskan Native”, 0.4% selected “Asian

or Pacific Islander.” 10.1% of the respondents chose not to disclose their ethnicity. In

response to the question about the math courses completed, 88.5% of the students

indicated that they had taken or were taking Algebra I or an equivalent Integrated 8th

Grade Mathematics course, 10.3% indicated that they had taken or were taking Geometry,

and 2.5% indicated that they had taken or were taking Algebra II. Based on the

demographics of the sample, we believe our data to be fairly representative of the 8th

grade student population.
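The trimming rule described above can be expressed as a short script (a hypothetical sketch for illustration only; the study does not state that the screening was automated, and the function and variable names are invented here):

```python
def is_reliable(responses, total_questions=48, tolerance=5):
    """Return False for survey records excluded by the trimming rule:
    incomplete surveys, or respondents who chose the same option for
    all but at most `tolerance` of the questions (about 10% of 48)."""
    if len(responses) < total_questions:
        return False  # incomplete survey
    # How many times was the most frequently chosen option selected?
    most_common = max(responses.count(option) for option in set(responses))
    # Reliable only if more than `tolerance` responses differ from that option
    return total_questions - most_common > tolerance

# A respondent who picked "agree" on 45 of 48 items would be excluded,
# while one whose answers are spread across the options would be kept:
flat = ["agree"] * 45 + ["disagree"] * 3
varied = ["agree", "disagree", "not sure"] * 16
```

Under this sketch, `is_reliable(flat)` is False and `is_reliable(varied)` is True, matching the exclusion criterion described in the text.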

Survey Instrument: Survey of Mathematical Reasoning

The Survey of Mathematical Reasoning (SMR) (see Figure 8) was designed to

capture students’ judgment of different arguments and their personal comprehension of

them. SMR was published online and participants took the survey on the website during

one of their class periods. All the items on the SMR were multiple-choice.

BACKGROUND INFORMATION

Thank you for agreeing to participate in this study. Your responses to the survey are
confidential and will not be shared with your teachers in school.
The survey contains 4 mathematics problems. For each problem you will need to evaluate
4 mathematical arguments and answer related questions. There is no right answer to those
questions. We just want to know your opinion.
Please read the questions carefully and pick the options that best match your opinion.
Please plan on using 30 to 45 minutes to complete the survey.
Now let's start!

1. The name of your school: _________________________

2. The grade you are in: 6 7 8 9 10 11 12

3. Your student ID as assigned by your school (or your math coach): ________________

4. Your gender:
Male Female I choose not to answer this question

5. Please describe your race/ethnicity.


American Indian or Alaskan Native
Asian or Pacific Islander
Black, not of Hispanic origin
White, not of Hispanic origin
Hispanic
Other (or I choose not to answer this question)

6. Mathematics courses you have taken (including the course you are taking):
Pre-algebra Algebra I Algebra II Geometry
Integrated 7th grade math Integrated 8th grade math
Other (please specify) _____________________

On the next page, we will start to work on some math problems. Ready to go?

Continued

Figure 8. Survey of Mathematical Reasoning [4]

[4] The SMR used in the study was a web-based survey. Therefore, although sharing the same content, the survey on the internet had a different layout from what is shown here.
Figure 8 continued

PROBLEM A

Shaina claimed that:

“A multiple of 6 must also be a multiple of 3.”

Arguments A1 - A4 are offered by different people to justify Shaina’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 7 - 11.

************************************************************************

Argument A1: I’ve tried plenty of multiples of 6 (like 12, 60, 606, etc.) and found they
are multiples of 3 as well. So I am sure that Shaina’s statement must be true.

7. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument A2: Any multiple of 6 can be written as 6n. We know that 6n = 3•2n, which is
a multiple of 3. Therefore a multiple of 6 must also be a multiple of 3.

8. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

continued

Figure 8 continued

Argument A3: If the total number of cookies is a multiple of 6, then we can put them
into several boxes where each box contains 6 cookies. We can further divide each
box into 2 packages, where each package contains 3 cookies. Now all the cookies
are put into packages of 3. Therefore, the total amount of cookies must also be a
multiple of 3.

9. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument A4: The total number of square cards below is a multiple of 6:

[figure not reproduced]

We can rearrange the squares in this way:

[figure not reproduced]

Now we can see that a multiple of 6 must also be a multiple of 3.

10. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

continued

Figure 8 continued

11. After evaluating each argument, which of them is closest to what you will use in
arguing about Shaina's claim? [5]
Argument A1 Argument A2 Argument A3 Argument A4
None of the arguments is close to what I will use. This is how I will argue:

continued

[5] A1 - A4 were relisted below this question in the online version of SMR to allow students to see all the arguments that needed to be compared. The same layout was adopted for the other three problems used in the survey.
Figure 8 continued
PROBLEM B

Ryan claimed that:

“The diagonal of a rectangle must be longer than each of its sides.”

Arguments B1 - B4 are offered by different people to justify Ryan’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 12 - 16.

************************************************************************

Argument B1: I’ve drawn several rectangles and measured the length of their sides and
diagonals. I found that the diagonal of any of those rectangles is longer than any
side of the same rectangle. So Ryan’s statement must be true for all rectangles.

12. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument B2: Imagine that you are standing on the corner of a football field. Then the
diagonal of the field is definitely longer than any of its sides. So Ryan’s claim
must be right.

13. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument B3: As shown in the figure below, ABCD is a


rectangle. Since ∠A = 90°, then by the Pythagorean
Theorem,
BD^2 = AB^2 + AD^2.
So BD^2 > AB^2 and BD^2 > AD^2
(The notation X^2 means the square of X. For example, BD^2 means the square
of BD). Therefore, BD is longer than AB and longer than AD.

14. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument B4: Suppose ABCD is a rectangle.


Draw a circle using B as the center and BD
as the radius. From the figure shown, we
can see that BD = BQ = BP. Since BC <
BP and BA < BQ, then both BA and BC
are shorter than BD. Therefore, the
diagonal of a rectangle must be longer than
any of its sides.

15. What do you think of the argument above?


Please pick the option that best matches your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


16. After evaluating each argument, which of them is closest to what you will use in
arguing about Ryan's claim?
Argument B1 Argument B2 Argument B3 Argument B4
None of the arguments is close to what I will use. This is how I will argue:


PROBLEM C

There are two triangles. The lengths of the three sides of Triangle I are A, B, and C and
the lengths of the three sides of Triangle II are a, b, and c. Jennifer claims that:

“If A > a, B > b and C > c, then the area of Triangle I must also be larger than
Triangle II.”

Arguments C1 - C4 are offered by different people to justify Jennifer’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 17 - 21.

************************************************************************

Argument C1: If A = B = C = 2, a = b = c =1, then Triangle I is obviously larger than


Triangle II. I also tried many other cases (as shown in the figures below) and
found Triangle I always has an area larger than that of Triangle II. So I am sure
Jennifer's claim must be correct.

17. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument C2: We all know that the area of a triangle equals 1/2 of the product of its
base and height. As shown in the figures below, the area of Triangle I = BH/2, and
the area of Triangle II = bh/2. We know that B > b. In addition, since A > a and
C > c, then it must be true that H > h. So BH/2 must be larger than bh/2. Therefore
the area of Triangle I must be larger than the area of Triangle II.

18. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument C3: As shown in the figures below, since each side of Triangle II is shorter
than the corresponding side of Triangle I, we can cut each side of Triangle I
shorter and then compose Triangle II using the shortened sides. Therefore, the
area of Triangle II must be smaller than the area of Triangle I.

19. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument C4: Since each side of Triangle I is longer than the corresponding side of
Triangle II, then the perimeter of Triangle I must also be longer than the perimeter
of Triangle II. If we make the two triangles using wires, then it needs a longer
wire to make Triangle I than Triangle II. Using a longer wire we can make a larger
triangle. Therefore the area of Triangle I is definitely larger than the area of
Triangle II.

20. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


21. After evaluating each argument, which of them is closest to what you will use in
arguing about Jennifer's claim?
Argument C1 Argument C2 Argument C3 Argument C4
None of the arguments is close to what I will use. This is how I will argue:


PROBLEM D

The sales tax rate of the state where Ravi lives is 5%. Ravi is buying a new bike in a local
bike store and has a $20 coupon. Ravi claims that:

“I can always save $1 if the $20 coupon is applied before tax rather than after tax,
regardless of the actual price of the bike.”

Arguments D1 - D4 are offered by different people to justify Ravi’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 22 - 26.

************************************************************************

Argument D1: Suppose the original price of the bike is $100.


If the coupon is applied before tax, then Ravi needs to pay
(100 – 20) × (1 + 5%) = 84 dollars.
If the coupon is applied after tax, then Ravi needs to pay
100 × (1 + 5%) – 20 = 85 dollars, which is $1 more than what he needs to pay if
the coupon is applied before tax.
I tried some other possible prices of the bike, such as $200, $500, etc., and found
he always pays $1 less if the coupon is applied before tax. Therefore, I am sure
Ravi’s claim is always right.

22. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument D2: Suppose the original price of the bike is x dollars.


If the coupon is applied before tax, then Ravi needs to pay
(x – 20) × (1 + 5%) = 1.05x – 21 dollars.
If the coupon is applied after tax, then Ravi needs to pay
x × (1 + 5%) – 20 = 1.05x – 20 dollars.
Notice that (1.05x – 20) – (1.05x – 21) = 1. Therefore, Ravi always saves one
more dollar if the coupon is applied before tax rather than after tax.

23. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument D3: If the coupon is applied before tax, then Ravi doesn’t need to pay the tax
for the $20 discount. If the coupon is applied after tax, then he needs to pay the
tax of the original price of the bike. Notice that $20 × 5% = $1. Therefore Ravi
always saves one more dollar if the coupon is applied before tax rather than after
tax.

24. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument D4: Let x be the original price of the bike and y be how much Ravi actually
needs to pay (after applying the coupon and tax). Based on calculation, the graph
below is generated by a graphing calculator to illustrate the two situations: the
solid line represents how much Ravi needs to pay if the coupon is applied after
tax; the dashed line represents how much he needs to pay if the coupon is applied
before tax. From the graph, we can
see that the solid line is parallel to the
dashed line and is always 1 unit
above it. Therefore, Ravi can always
save one more dollar if the coupon is
applied before tax rather than after
tax.

25. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

26. After evaluating each argument, which of them is closest to what you will use in
arguing about Ravi's claim?
Argument D1 Argument D2 Argument D3 Argument D4
None of the arguments is close to what I will use. This is how I will argue:

The design of SMR was informed by Healy & Hoyles’s (2000) student-proof

questionnaire. The student-proof questionnaire was created to capture the respondents’

views about proofs. The questionnaire consisted of three sections. First, students were

asked to offer a written description of their general understanding of the purpose of

proving. Then several mathematical conjectures and different arguments to justify the

conjectures were provided, and students were asked to pick arguments they would adopt

for themselves and those they considered would receive the best mark from their teachers.

Lastly, students were asked to offer an evaluation of the arguments based on how

convincing and explanatory they found each one. Based on the specific focus of this

study, several modifications were made to Healy & Hoyles’ questionnaire to

accommodate the research goals, as described below.

First, the current study didn’t concern students’ perception of “mathematical

proof,” rather, the kind of arguments they find convincing, explanatory and appealing.

Therefore, participants were not asked to offer a written description of their

understanding of “proof” on the survey, nor was there a need to identify arguments that

they believed would receive “the best mark” (Healy & Hoyles, 2000). However,

participants were still asked to identify an argument in each of the problem contexts that

they were likely to adopt for themselves. Moreover, they judged whether each argument

was convincing and explanatory to them.

Second, the design of problem contexts (mathematical items) as well as the choice

of arguments included in each case was informed by the following considerations:

• The concepts involved in the conjectures and arguments must be understandable by the participants.

• The reasoning process utilized in each argument should not include a complex combination of reasoning modes.

• A variety of techniques to justify the conjecture needed to be present.

• The differences between the contents of the conjectures should be apparent.

As shown by various cognitive development models (Tall et al., 2012; van Hiele, 1986; Yang & Lin, 2008), understanding of the relevant concepts is a prerequisite for recognizing connections among them. Since this study focused on reasoning rather than

representation and concept building, it was necessary to make sure that the participants

understood the concepts so that the differences in their judgment could be attributed to

their evaluation of the reasoning methods. The reason to choose relatively “simple”

arguments was for the feasibility of analysis. If an argument involved multiple modes of

reasoning, it would be difficult to identify its features according to the CCIA model,

which would make the coding less usable. Various arguments were desirable to verify

each conjecture since we needed a considerable number of different arguments to inform

the comparison and to examine the framework. Lastly, different contexts were expected

to magnify the impact of subject strand on participants’ judgment.

To meet these goals, three conjectures (in Problems A, B, and D, see Figure 8)

were chosen, each representing a topic from a different branch of school mathematics: number theory, geometry, and algebra (Problem A was also used by Stylianides & Stylianides, 2008a). However, it was not assumed that students' reasoning would be

identical within a branch. Instead, the purpose of choosing conjectures from three

different areas was to provide distinct contexts to detect differentiated judgment and

preference of argument types. The three conjectures were all true statements 6, which

provided structural consistency across the problems. Nevertheless, we also included a

false conjecture (in Problem C) in the survey with the intent to seek contrasting data. The

cases that would falsify the conjecture in Problem C were not familiar to the students and

were not easy to detect. By including the contrasting problem, we aimed to detect

any patterns in students’ judgment that would continue to persist when evaluating false

arguments.
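The kind of falsifying case alluded to above can be made concrete with a short computation: a triangle whose sides are all longer than those of another triangle can still have a smaller area if it is sufficiently "thin." The sketch below uses Heron's formula; the side lengths are our own illustrative choices and did not appear on the survey.

```python
import math

def heron_area(a, b, c):
    # Area of a triangle with side lengths a, b, c, via Heron's formula
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

# Triangle I: every side longer than the corresponding side of Triangle II,
# but the triangle is extremely "thin" (10 + 10 barely exceeds 19.9).
area_1 = heron_area(10, 10, 19.9)
# Triangle II: shorter sides, but a compact shape.
area_2 = heron_area(5, 5, 6)

print(round(area_1, 2), round(area_2, 2))  # → 9.94 12.0
```

Since area_1 < area_2, this pair of triangles falsifies the conjecture in Problem C even though each side of Triangle I exceeds the corresponding side of Triangle II.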

Figure 9. The structure of SMR

6 In Problem D, it was assumed that a bike costs more than $20. This condition was not articulated in the statement of this problem in order to see if any student would point out this issue.
Figure 9 demonstrates the structure of SMR. Four arguments (e.g. Arguments A1 -

A4 in Problem A) were provided in validating each conjecture. All four arguments

supported the validity of the conjecture (even the validity of the false conjecture in the

contrasting problem). We did so because proving and disproving a conjecture engage different reasoning processes. Finding a counterexample is adequate to disprove a conjecture; however, finding some (but not all) examples that satisfy a conjecture is not

adequate to prove its validity. Since fostering realization of the latter point is one of the

major goals of proof instruction (Stylianides & Stylianides, 2008b; Waring, 2000), this

study leaned towards exploring student thinking when evaluating “proof” instead of

“refutation.” The four arguments developed in proving each conjecture were classified as

inductive, algebraic, visual, and perceptual. The inductive argument showed proving

attempts by offering a few examples that supported the validity of the proposed

conjecture. The algebraic argument engaged symbolic representation of the context and

then reinterpreted symbolic results to support the conjecture. The visual argument relied

on graphs and figures to provide proof evidence. The perceptual argument related the

problem to a more familiar context and supported the conjecture via such a connection.

Among all the arguments, four (A1, B1, C1 and D1) were inductive; four (A2, B3, C2

and D2) were algebraic; four (A3, B2, C4, D3) were perceptual; and four (A4, B4, C3,

and D4) were visual (see Table 3).

Inductive A1, B1, C1, D1
Algebraic A2, B3, C2, D2
Perceptual A3, B2, C4, D3
Visual A4, B4, C3, D4

Table 3. Type of the arguments used in SMR

Participants needed to respond to several questions that were related to each

argument. They were asked to determine whether they understood the concepts used in

each of the arguments (for the purpose of confirmation), whether they believed the

argument showed the conjecture was always true, and if the argument helped them

understand why the conjecture was true. We designed these questions since verification and explanation were regarded as two major functions of proofs that are

recognized by students (de Villiers, 1990, 2003; Hanna, 2000b; Healy & Hoyles, 2000).

After reading all 4 arguments for each conjecture, participants were asked to determine

which of them was the closest to what they would use in the same context. We were

particularly interested in studying students’ preferred type of arguments since

understanding students’ preference, common or diverse, would be helpful in explaining

why students might experience difficulty when learning about proofs. This knowledge

can also inform the design of tasks that encourage conceptual understanding of proof as a

reliable way of reasoning.

Interview Participants

Participants for the follow up interviews were selected from those whose SMR

responses were used in the survey analysis. Based on the survey results, the participants

were divided into two groups, the consistent group and the inconsistent group. The

consistent group was composed of those who had preferred the same type of arguments in

at least 3 of the 4 problems (i.e. they chose the same type of argument at least 3 times in

Questions 11, 16, 21 and 26. See Figure 8). 141 of the 476 participants belonged to this

group. The inconsistent group was composed of the remaining participants (a total of

335). No member of this group had preferred any particular type of arguments in more

than 2 of the 4 problems.

To select the representatives to participate in the interviews, we divided the consistent group into 4 subgroups, each of which consisted of members who demonstrated a tendency to prefer a particular argument type in their responses to SMR (inductive, algebraic, visual or perceptual). To select a representative from each subgroup, we obtained a random number, say n, using a random number generator, and then chose the nth student

from the top of the list as an interview participant. By doing so, we randomly picked 4

representatives, Allen, Abby, Alice, and Amy, from the consistent group. The names are

pseudonyms and are gender appropriate. Using a similar strategy, we randomly picked

4 representatives (by running the random number generator 4 times), Beth, Betty, Blake,

and Brenda, from the inconsistent group. These names are pseudonyms and are gender

appropriate as well. These 8 students participated in the interviews.
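The selection procedure described above can be sketched in code; the subgroup rosters and the seed below are hypothetical stand-ins for the actual class lists, which are not reproduced here.

```python
import random

def pick_representative(roster, rng):
    # Choose the nth student from the top of the list, where n is random
    n = rng.randrange(len(roster))
    return roster[n]

rng = random.Random(2013)  # seeded here only to make the sketch reproducible
subgroups = {
    "inductive":  ["student_01", "student_02", "student_03"],
    "algebraic":  ["student_04", "student_05", "student_06"],
    "visual":     ["student_07", "student_08"],
    "perceptual": ["student_09", "student_10", "student_11"],
}
representatives = {name: pick_representative(roster, rng)
                   for name, roster in subgroups.items()}
print(representatives)
```

One representative is drawn per subgroup, so every preference type identified in the consistent group is covered by exactly one interviewee.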

All the subjects were taking Algebra I or an equivalent Integrated 8th Grade

Mathematics class at the time when they were interviewed. Two of the subjects were
taking Honors Algebra I, among which one was from the consistent group while the other

was from the inconsistent group. Since the interviews were recorded at the end of the

spring semester, the subjects were close to finishing their coursework for the school year.

Each group included 1 male and 3 females. Seven of the subjects were Caucasian

and only one interviewee (Betty) was African-American. All subjects were enrolled in

rural or suburban school districts. All subjects were native English speakers. The subjects’

background information is summarized in Table 4.

                                                Number of Preferred Arguments by Scheme
Subject   Gender   Math Course Taking           Inductive   Visual   Perceptual   Algebraic
Allen M Algebra I (Honors) 1 3 0 0
Abby F Algebra I 3 0 0 1
Alice F Integrated Math 0 1 3 0
Amy F Algebra I 1 0 0 3
Beth F Algebra I 0 2 1 1
Betty F Algebra I (Honors) 1 1 1 1
Blake M Integrated Math 1 0 2 1
Brenda F Algebra I 2 2 0 0

Table 4. Background information of the subjects

Interview Procedure

The survey results suggested that students' preferences for arguments were highly

diverse across the problems and between individuals. Therefore, we were not able to

make conclusive assertions regarding the types of arguments that students found more

appealing, nor were we able to distinguish the features of the arguments, as pre-identified
by the researcher, that could have significantly impacted the students’ evaluation of the

arguments by comparing those that had received high and low ratings. Since students' judgment was based upon their personal standards for each argument, we believed there

were hidden factors that had influenced their judgment. In order to further investigate

those factors, we relied on follow up interviews to understand the rationale behind

students’ judgment as indicated in the survey results. In particular, we examined the

sources that students drew from to make evaluations, which were triggered by particular

contexts and arguments.

Each subject was interviewed separately and each interview lasted approximately

an hour. Each interview consisted of three parts (see Table 5). During the first part, the

subjects were provided with the same problems that were used on the survey, but in a different format. The subjects had the conjecture as well as each argument of the

problem on a separate piece of paper. Different problems were printed on paper with

different colors. The subjects were asked to read the conjecture again and then to rank the arguments according to how convincing they found them. The subjects were

allowed to change their ranking of arguments at any time during the interview. We did so

to make sure the subjects’ list was not offered randomly but after a careful consideration.

First part
Subject: Reexamine each problem and rank the arguments based on how convincing they were to them. Explain the rationale of the arrangement by explicit comparison between arguments in the same context.
Interviewer: Ask the subject why he/she believed one argument was more convincing than another.

Second part
Subject: Compare the rankings across the problems. Confirm or revise the arrangement. Explain the differences between arguments in different contexts in justifying the rankings.
Interviewer: Identify any inconsistency in the subject's rankings across the contexts and explicitly point it out. Ask the subject to explain how he/she viewed the same type of argument differently in different contexts.

Third part
Subject: Rank the arguments for the new problem (see Figure 10) according to how convincing they found the arguments. Explain the rationale of the arrangement again.
Interviewer: Ask the subject why he/she believed one argument was more convincing than another. Compare the subject's responses for the new problem to his/her previous answers and probe an explanation from the subject.

Table 5. Overview of the interview process

It has been suggested that people usually find it difficult to reflect on their own thoughts (Tarricone, 2011). However, by asking the subjects to justify their selections,

their explanation could reveal factors that had impacted their preference. In addition to

what the subjects offered, we selected the following items as backup questions in case they remained quiet or didn't provide explanations that were understandable to us.

• Do you think that one of the arguments is wrong?

• Do you think that one of the arguments can only prove the conjecture is true for some cases instead of for all cases?

• Do you think that one argument helps you understand the problem better?

• Do you think that one argument offers better evidence?

• Do you think that one argument's evidence cannot support its conclusion?

Throughout the interviews, the subjects were encouraged to explain their thoughts

as they felt inclined to do so. Furthermore, if their answer to a question was yes without

comment, we asked them to elaborate on their responses. Students’ responses to these

questions allowed us to identify their conception of the argument according to the CCIA

framework.

During the first part of the interview the subjects were asked to compare

arguments in each context. During the second part, we asked students to compare the

arguments across the contexts. We were interested in whether the subjects would modify

the order after such a comparison. We were also interested in learning whether the

subjects from the consistent group would act differently from representatives from the

inconsistent group. Most importantly, we wanted to know how the subjects justified their

preference when diversity existed in the types of arguments they preferred (e.g. a subject preferred an inductive argument in one problem while ranking it as the least convincing in

another problem). Their explanations again revealed factors and features they considered important when making judgments of mathematical arguments and how such factors

and features might vary across the contexts.

During the last part of the interview, a new problem similar to those on the survey was given to each subject (see Figure 10). The new problem required a basic

understanding of elementary probability and proportional reasoning. The four arguments

used (E1 – E4) were inductive, visual, perceptual and algebraic, respectively. The

subjects were again asked to rank the arguments according to how convincing they were

in justifying the conjecture. After comparing their ranking for Problem E to their responses to the previous four problems, the subjects were asked one last time to offer

rationales for their decisions.

PROBLEM E

There are some white and orange ping-pong balls in a box. You cannot see what’s inside
the box but you will get a reward if you pick out an orange ping-pong ball from the box.
Jenna claims that:

“If the number of white ping-pong balls and the number of orange ping-pong balls
are both doubled, the chance for you to get a reward still stays the same.”

Arguments E1 - E4 are offered by different people to justify Jenna’s claim.

************************************************************************

Argument E1: Suppose there are 2 orange ping-pong balls and 3 white ping-pong balls
in the box, then the chance for you to get a reward is 2 out of 2+3, which is 40%.
If the numbers of ping-pong balls of each color are both doubled, then there will
be 4 orange ping-pong balls and 6 white ping-pong balls. Hence the chance for
you to get a reward is 4 out of 4 + 6, which is also 40%. Therefore, the chance of
winning the reward won’t change.

Argument E2: As shown in the figure below, if the numbers of orange and white ping-
pong balls are both doubled, the ratio between the ping-pong balls of the two
colors will still be the same. Therefore, the chance of winning won’t change.

… …

… …

Argument E3: When the number of orange ping-pong balls is doubled, the cases for
winning the reward are also doubled. However, when the number of white ping-
pong balls is doubled, the cases for not winning the reward are also doubled. As a
result, the ratio of the cases of winning to the cases of not winning stays the same.
Therefore, the chance of winning won’t change.

Argument E4: Suppose there are n orange ping-pong balls and m white ping-pong balls
in the box, then the chance for you to get a reward is n / (n + m). If the numbers of
ping-pong balls of each color are both doubled, then the chance for you to get a
reward becomes 2n / (2n + 2m), which is equal to n / (n + m). Therefore, the
chance of winning the reward won’t change.

Figure 10. The additional problem used in the interview
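Jenna's claim, whose general form is Argument E4, can be checked with exact rational arithmetic; the ball counts below are arbitrary illustrative values.

```python
from fractions import Fraction

def win_chance(orange, white):
    # Probability of drawing an orange ball from the box
    return Fraction(orange, orange + white)

# Doubling both counts leaves the chance unchanged, e.g. 2/5 == 4/10:
assert win_chance(2, 3) == win_chance(4, 6) == Fraction(2, 5)

# The identity n / (n + m) == 2n / (2n + 2m) holds for any positive counts:
for n in range(1, 20):
    for m in range(1, 20):
        assert win_chance(n, m) == win_chance(2 * n, 2 * m)
```

Using Fraction rather than floating-point division makes the equality checks exact, mirroring the algebraic cancellation in Argument E4.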

Data Analysis

Data analysis followed two phases, consisting of quantitative analysis of the

survey results and qualitative analysis of the interviews. An outline of the data analysis process is included in Table 6.

Survey Data Analysis:
• Cumulative data were used to identify the types of arguments that were understandable, convincing, explanatory or appealing to the entire group of participants.
• Between-subgroup comparisons were conducted to investigate between-subgroup differences and possible causes.

Interview Data Analysis:
• Each subject's responses in the interview were coded and factors that impacted the individual's decision were identified.
• Common factors that impacted each individual's evaluation were summarized and individual differences were investigated through between-subject contrasts.
• The subjects' responses were revisited and summarized by problem. The context's impact on students' decisions was explored.
• Survey results were revisited. Explanations of unexpected findings and proposed hypotheses about the survey data were provided based on the interview analysis.

Table 6. Outline of data analysis process

Survey Data Analysis

Quantitative analysis of the survey results focused on answering the following

questions:

• Which argument in each problem was indicated as understandable by the most participants?

• Which argument in each problem was indicated by the most participants as being sufficient to show the general validity of the conjecture?

• Which argument in each problem was indicated by the most participants as being clear in explaining the validity of the conjecture?

• Which argument in each problem was indicated by the most participants as being closest to what they preferred to use when encountering the same conjecture?

• Were the answers to the four questions above consistent for each problem?

• Were the participants' ratings consistent across the problems when judging the same type of argument?

The participants’ responses to each of the SMR’s items were summarized to

answer these questions. For example, in order to determine what argument in Problem A

was indicated as understandable by the most participants, we calculated and compared the

percentages of those who answered “agree” and “disagree” to the first question under

Arguments A1 – A4 (i.e. “You understand the concepts and notations used in the

argument”). The more participants answered “agree” and the less answered “disagree” to

this question under A1, the more understandable we considered A1 was. Since the

questions listed above were directly related to the SMR items, a cumulative summary of

the participants’ responses was enough to provide an answer to each of them.

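The cumulative tally just described can be sketched in a few lines of code. In this sketch the argument labels and the “agree”/“disagree” options come from the SMR, while the response data and the helper function `tally` are invented for illustration:

```python
# Sketch of the cumulative tally described above: for each argument,
# compute the percentage of participants answering "agree" and
# "disagree" to the understandability item. (Responses are invented.)

def tally(responses):
    """responses: list of 'agree' / 'neutral' / 'disagree' strings."""
    n = len(responses)
    return {
        "agree": 100.0 * responses.count("agree") / n,
        "disagree": 100.0 * responses.count("disagree") / n,
    }

# Hypothetical answers to "You understand the concepts and notations
# used in the argument" for two of the four arguments in Problem A.
survey = {
    "A1": ["agree", "agree", "disagree", "agree", "neutral"],
    "A2": ["agree", "disagree", "disagree", "neutral", "neutral"],
}

summary = {arg: tally(r) for arg, r in survey.items()}

# The argument with the highest "agree" share and lowest "disagree"
# share is taken to be the most understandable.
most_understandable = max(
    summary, key=lambda a: summary[a]["agree"] - summary[a]["disagree"]
)
```

With these invented responses, A1 (60% agree, 20% disagree) would be judged the more understandable of the two arguments.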
We recognized that relying solely on the cumulative data was not sufficient to determine whether the difference between the evaluations of two arguments was significant. For example, more participants might have found A1 understandable than answered “agree” to the same question under A2; however, without establishing whether that margin was statistically significant, it would be premature to claim that A1 was considered more understandable than A2. Therefore, tests of significance of the differences between the cumulative percentages were used to qualify any claims regarding the survey data. In particular, we adopted within-group ANOVA tests to examine the significance of the differences in students’ responses to different arguments.

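The role of the significance test can be illustrated with a simplified sketch. The study applied within-group (repeated-measures) ANOVA; for brevity, the code below computes only the ordinary one-way ANOVA F statistic, on invented ratings (e.g. agree = 1, neutral = 0, disagree = −1), to show how differences in mean ratings across arguments are quantified against within-argument variability:

```python
# Simplified illustration only: the study applied within-group
# (repeated-measures) ANOVA; this sketch computes the plain one-way
# ANOVA F statistic for ratings of three arguments. Larger F values
# indicate that mean ratings differ more, relative to the spread of
# ratings within each argument. (Ratings are invented.)

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)        # between-group sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)         # within-group sum of squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))

ratings = [
    [1, 1, 0, 1],     # hypothetical ratings of argument A1
    [0, -1, 0, 1],    # hypothetical ratings of argument A2
    [-1, -1, 0, -1],  # hypothetical ratings of argument A3
]
f_stat = one_way_f(ratings)
```

The resulting F statistic would then be compared with the critical value of the F distribution (here with 2 and 9 degrees of freedom) to decide whether the differences are significant.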
In addition to the analysis of the results from the entire group, we also examined the data according to specific subgroups of the participants. We suspected that comparing responses across particular subgroups of students might enable us to associate response patterns with the inherent differences between the subgroups and to identify factors that had influenced the participants’ evaluation of mathematical arguments. We examined whether students who achieved higher scores on state standardized mathematics tests demonstrated greater maturity in mathematical reasoning as measured by the SMR. In addition, data were also analyzed according to gender. The rationale was that since the male and female students were enrolled in the same schools and in the same classrooms, taught by the same teachers using the same teaching materials and techniques, differences in their responses might be attributed to non-mathematical experiences. The techniques used in the between-subgroup comparisons were similar to those used when analyzing the entire group’s responses. We examined and compared each subgroup’s responses to each question in the SMR and adopted between-group ANOVA to evaluate the significance of differences. During the analysis of survey results from the entire group and from various subgroups, conjectures were also made, based on the researchers’ understanding of the context, curriculum, and sample population, to explain why certain arguments received higher ratings from the participants. Evidence to support or refute those conjectures was sought during the interview analysis.

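The subgroup splitting described above can be sketched as follows. The participant records (gender attribute and ratings) are invented; in the study, a between-group ANOVA would then test whether the subgroup means differ significantly:

```python
# Sketch of the subgroup comparison described above: responses are
# grouped by a participant attribute (here, gender) and the mean
# rating of an argument is computed per subgroup. A between-group
# ANOVA would then test the difference for significance.
# (All records are invented.)

participants = [
    {"gender": "F", "rating": 1},
    {"gender": "F", "rating": 0},
    {"gender": "F", "rating": 1},
    {"gender": "M", "rating": -1},
    {"gender": "M", "rating": 1},
    {"gender": "M", "rating": 0},
]

groups = {}
for p in participants:
    groups.setdefault(p["gender"], []).append(p["rating"])

# Mean rating of the argument within each subgroup.
means = {g: sum(r) / len(r) for g, r in groups.items()}
```

The same split could be made on any recorded attribute, such as a band of state standardized test scores, before applying the significance test.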
Interview Data Analysis

The summary of the survey responses revealed the participants’ judgment of and preference for mathematical arguments at the macro level; however, it was not adequate to explain why an individual had made certain decisions when completing the survey. The latter was the focus of the interview analysis. In examining the interview responses, we first identified both positive and negative comments made by each individual subject in explaining why an argument was convincing to him/her. These comments were summarized in a table as raw data. The comments were then coded using the CCIA framework (see Figure 7). Specifically, we identified whether the comments referred to the representation, evidence, or the link between evidence and conclusion. We then calculated the frequency of occurrence of comments on representation, evidence and link to determine which of these had the largest impact on each subject’s decisions. Furthermore, we traced the types of representation, evidence or link that contributed to students’ evaluation in a positive or negative way by counting how many times they were referenced by the subject in the explanation. The frequency of references to these elements served to identify factors that substantially impacted the subject’s judgment. In addition to the elements conceptualized in the CCIA framework, it was expected that other factors that had impacted the subjects’ decisions would be detected during the coding process. Those factors were considered personal standards and were studied separately. Below we offer an example to illustrate the process of analysis for the interview of a single subject.

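The frequency count described in this section can be sketched as follows. The R/E/L prefixes (representation, evidence, link) come from the CCIA framework, while the coded comments themselves are invented for illustration:

```python
# Sketch of the frequency count described above: each interview
# comment carries one or more CCIA codes (R* = representation,
# E* = evidence, L* = link); the dimension referenced most often is
# taken to have had the largest impact on the subject's judgment.
# (The coded comments are invented.)

coded_comments = [["E4", "R4"], ["E2"], ["E2", "L4", "R1"], ["E6-"]]

counts = {"R": 0, "E": 0, "L": 0}
for comment in coded_comments:
    for code in comment:
        counts[code[0]] += 1  # first letter gives the dimension

dominant = max(counts, key=counts.get)
```

With these invented codes, evidence would be identified as the dimension with the largest impact on the subject's evaluation.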
The case of Allen

Allen’s case serves as an illustrative example of the analysis technique used when examining the interviews. The same analytical model was utilized in the other seven cases. The following discussion details how Allen’s responses during the interview were transformed into an analyzable form, how this form was coded, and how the coding was interpreted to understand the rationale of his evaluation of mathematical arguments.

Allen was an 8th grade student enrolled in an Honors Algebra I class at the time of

data collection. In his responses to SMR, the visual arguments were indicated to be

closest to how he would argue in all but Problem C, where he selected C1, the inductive

argument. Based on these results, we believed that Allen had exhibited a preference for visual arguments. Therefore, he was considered a representative of the consistent group. The discussion of Allen’s performance in the interview includes both

data report and data analysis. In data report, we describe the interview process in detail,

including how he ranked the arguments according to how convincing they were to him

and how he justified his rankings. In data analysis, we identify factors that seemingly

influenced Allen’s evaluation of the arguments based on his comments on and rankings of

the arguments.

Ranking arguments

In the first part of the interview, Allen was asked to work on Problems A through

D one by one. He (like the other subjects) decided which problem to work on first and which next; however, the decision was based not on the content of the problems but on the color of the paper on which each was printed. Table 7 illustrates the rankings provided by Allen for

each problem. Column One of the table represents the order of problems that he tackled.

Most convincing -------------------------> Least convincing


Problem C C2 (algebraic) C4 (perceptual) C3 (visual) C1 (inductive)
Problem B B3 (algebraic) B4 (visual) B2 (perceptual) B1 (inductive)
Problem D D4 (visual) D1 (inductive) D2 (algebraic) D3 (perceptual)
Problem A A4 (visual) A2 (algebraic) A3 (perceptual) A1 (inductive)
Problem E E2 (visual) E4 (algebraic) E1 (inductive) E3 (perceptual)

Table 7. Rankings of arguments provided by Allen

Allen chose to start with Problem C. From the most to the least convincing, Allen

ranked the arguments as C2 (algebraic) – C4 (perceptual) – C3 (visual) – C1 (inductive). In

explaining why he put C2 at the top of the list, Allen suggested that “it uses formulas

which I know are fact, and I like seeing fact.” Later, he repeated a similar comment “this

one has a formula, which I love.” Additionally, he called C2 the “simplest, quickest,

most effective way” and “very straightforward.” In explaining why he considered C4 less

convincing than C2 but more convincing than the other two arguments, Allen suggested

that C4 was convincing because “it still uses sides and areas.” However, what made him consider it less convincing was that “it’s less straightforward.” In addition, he

suggested that he had never done something similar to what was described in C4 but he

could imagine the scene. In particular, he suggested that the argument “clearly states, if

you’re using a wired outline which is the figure, I can picture that in my mind, I know

exactly what they’re talking about.” In explaining why he put C1 at the bottom of the list,

Allen suggested that he viewed the “wording" of the problem problematic. For example,

he suggested that “it’s trying to relate too many things: a = b = c,” and “it trip me up for

the first few seconds.” Although the argument contained “a formula,” which he did like, it

was “not straightforward enough.” So this was the only argument in the problem that he

actually didn’t like. In explaining why he put C3 low on the list, Allen claimed that he

liked C3 since “it uses the length of the sides, which is what I would use any day of the

week.” He also liked the fact that “it has diagrams… which explains what they’re talking

about there.” However, when comparing C3 to C2 and C4, Allen only repeated the reasons for which he had ranked C2 and C4 high on the list but didn’t specify why he considered C3 less convincing.

After justifying his order in Problem C, Allen continued to work on Problem B.

From the most to the least convincing, he ranked the arguments as B3 (algebraic) – B4

(visual) – B2 (perceptual) – B1 (inductive). He chose to first explain why he didn’t think

B1 was convincing. In particular, he suggested that he was a visual learner but he wasn’t

“seeing any visual representation.” He considered the argument as a mere “opinion” and

suggested “there’s no other facts.” So “there’s not enough support for me, compared to

the other ones.” In evaluating B2, Allen suggested that “it does give a clear example”

which he did believe. So it was better than argument B1. However “it is still not my
favorite.” In evaluating B4, Allen suggested that “it shows a circle there, I do like these, I

can clearly relate to these.” He further explained his understanding of the details of the

argument, “I can see with, I guess you can call them formulas, bc here is the length of the

side of the rectangle, and it is smaller than bp, just by a little bit there, so I can believe it.”

He indicated that he liked B4 and B3 “very closely.” Lastly, Allen articulated that B3 was

his favorite since “I like the fact of using the Pythagorean theorem. It’s more

straightforward than using all the angle and the side relations.” He also suggested that by

using the Pythagorean theorem, he could “figure out the problem in a minute or so.”

The next problem Allen worked on was Problem D, where he ranked the

arguments, from the most to the least convincing, as D4 (visual) – D1 (inductive) – D2

(algebraic) – D3 (perceptual). He suggested that D4 was his favorite since “I like the

visual aid again.” When asked to explain his understanding of the graph, Allen pointed at

the graph and stated, “seeing that after and before are always 1 separated there, 1

separated there, and it never changes, since they’re parallel, so using those, I do think that

that is the best saying that you can always save an extra dollar.” When commenting on

D1, Allen expressed that it was “a little wordy, and that’s why I put it the second.”

However, he expressed that he liked “the fact that they used other, that they can also plug

in a price here, as well as using a formula and plugging a price in.” He claimed that he

liked the argument since he was “a formula kinda guy.” Interestingly, D1 didn’t contain any formulas; instead, there were equations used to calculate the results. However,

Allen was able to conclude that replacing a particular value in the equation wouldn’t

change the outcome. Therefore he considered the equations as formulas. While

evaluating D2, which had the actual formula, Allen said that he didn’t think “there was
too much difference.” However, he found D2 to have “more formulas” and “less

explanations.” He found D1 to be more “straightforward” and “it offered an example.”

Therefore, he considered D1 to be more convincing than D2. In evaluating D3, Allen

suggested that “that is more business, it’s not applying directly to math.” He further

explained that “just stating that and not giving that much evidence, it’s not very

convincing to me.” Therefore, he didn’t consider D3 convincing.

When assessing Problem A, Allen provided the rank as A4 (visual) - A2 (algebraic)

- A3 (perceptual) - A1 (inductive). He started justifying his ranking with A1, stating that

he didn’t see “very many supporting arguments.” He suggested that “there’s almost no

mathematical evidence here, except the opinions and personal work of other people doing

math, and not showing what they did.” Therefore, he considered A1 the least convincing.

Allen considered A3 to be more convincing than A1 since “it says what you can do to

figure it out.” However, since there was “no formula, or visual representation,” he didn’t

consider it as convincing as A2 and A4. To clarify, Allen claimed that in A3, “there is

proof; it’s just not solid, like always a formula.” In evaluating A2, Allen first substituted

a number, 3, to verify if the formula was correct. He then suggested that “it doesn’t

matter” which number he tried; however, he needed to check “3 or 4 different ones” to be

sure about the result. In the evaluation of A4, Allen argued that it had “very simple visual

representation,” so it justified the conjecture “clearly, and in my mind effectively.”

In the second part of the interview, Allen was asked to revisit Problems A through

D and compare the ranking he had provided. He was first asked why he considered A1,

B1 and C1 to be the least convincing arguments in each respective problem. He

explained that those arguments did provide some examples but were more like “opinions
and people doing things that I have not personally seen.” He didn’t think such examples

were as convincing as those backed up by theorems and graphs. Allen was asked to

justify why he considered D1 convincing since it also showed just a few cases. He

responded that “there’s always the showing, they’re working it out,” which was better

than “plainly stating what they had tried.” Furthermore, Allen was asked if A1 would be

more convincing to him if it had provided more details of the checking procedure. He

replied yes to this question and offered that “giving concrete numbers and facts and

stating their observations of what they did the experiment on” would make it a better

argument.

In addition to the evaluation of inductive arguments, Allen was asked to explain

why he considered the algebraic and visual arguments convincing in all problems. He

stated that formulas and diagrams made arguments clearer and “if there’s a

combination of visual diagrams and formulas, that would be fabulous, that would be

perfect.” Furthermore, in justifying why he considered the perceptual arguments less

convincing, Allen reasoned that “simply saying to imagine it, then stating that it’s

definitely longer, you’re not giving any example.”

During the third part of the interview, Allen was asked to examine Problem E and

rank E1 – E4 according to how convincing they were. His rank was: E2 (visual) - E4

(algebraic) - E1 (inductive) - E3 (perceptual). This rank was consistent with the rank he

provided in Problems A – D, where visual and algebraic arguments were generally

considered more convincing than inductive and perceptual arguments. In evaluating E2,

Allen proposed that “immediately when I noticed the graph I know it will be high

ranking.” However the interviewer soon realized that Allen didn’t actually understand the
graph. Allen was allowed to reexamine the argument but he still couldn’t explain how the

graph supported the conjecture. Therefore, the interviewer explained what the graph

meant, in particular how it represented the “doubling” procedure stated in the problem.

This episode suggested that Allen’s preference for visual illustration might not be based

on a careful analysis of the argument. Instead, he might have been attracted to it due to its

appearance. It was unclear if this was an isolated case. In fact, his understanding of the

graphs in Problems A through D was also checked by the interviewer and no

misunderstanding was revealed at those times.

Another interesting finding was that Allen actually found it difficult to explain his

understanding of the graph in E2. So he chose to refer to E4 (algebraic) and used the

symbols to describe his ideas. This case demonstrated that Allen was comfortable

using letters to represent variables in mathematical contexts. Allen further suggested that

E2 and E4 “are in principle the same,” but that he still preferred E2 to E4 since E2 “is

still stating that clearly, while giving me the visual.” Allen added that he in fact liked all

the arguments. However, he considered E1 and E3 less convincing since E1 “provides an

example, not an opportunity to provide your own examples,” while the description

offered in E3, although “in principle the same” as what was offered in E4, was less

appealing to him than visual and algebraic representations. This explanation revealed that

the representation of an argument had a clear influence on Allen’s evaluation of it.

Note that in both Problems E and D, Allen claimed that the inductive and

algebraic arguments were similar; however, he considered the inductive argument E1 to

be less convincing in Problem E, while he found the inductive argument D1 to be more

convincing in Problem D. In explaining this inconsistent selection, Allen expressed his


preference for formulas, “I love formulas, which are always in my mind second to visual

representations.” He claimed there were formulas in D1, D2, E1 and E4 even though the

text in D1 and E1 didn’t contain any formula (there were numerical equations instead).

This again demonstrated that Allen seemed to be able to conceptualize the formula by

looking at the equations. It was interesting that when evaluating D1, Allen expressed that

he “can see the formula,” while for E1, he claimed that “this is not straightforward

because it only gives one example” and “there would be a formula here, but it’s not

stated.” When evaluating D2, Allen suggested that “this is not straightforward, because it

is a longer and more complicated and not straightforward enough formula.” When

evaluating E4, he characterized it as “straightforward giving you the formula there,

instead of providing two examples.” That is, he used double standards when evaluating

arguments containing algebraic formulas. In his view, the arguments needed to be

“straightforward” to be convincing, and whether he preferred arguments with numerical

equations or algebraic formulas depended on whether the formulas he saw in those

numerical equations were more “straightforward” than the given algebraic formulas. It

was unclear what the standard of being “straightforward” meant; however, we suspected that the

complexity of the formula and his familiarity with it might have been two important

factors.

Allen’s comments on the arguments used in the interview were summarized in

Table 8. Each comment was then characterized using a coding strategy in line with the

theoretical constructs of CCIA framework (see Figure 7).

Problem C

Positive: It uses formulas which I know are fact, and I like seeing fact. (E4, R4)
Negative: It’s not straightforward enough. (P)

Positive: [Formula is the] Simplest, quickest, most effective way. (P)
Negative: It’s trying to relate too many things: a = b = c. (P)

Positive: Very straightforward. (P)
Negative: If I’m trying to figure this out for the first time, I wouldn’t think that a, it doesn’t equal that, and that it doesn’t equal that, and that would like, trip me up for the first few seconds. (E2)

Positive: I would like to see a diagram… I’m a visual learner. (E2, R1)
Negative: It is not clearly outlined. (P)

Positive: It is a formula, which I do like. (E4, R4)
Negative: It’s trying to relate too many things. (P)

Positive: It uses the length of the sides. (E4)

Positive: Any area or angle problem, I would always use corresponding sides and angles. (E4)

Positive: This one has formulas, which I love. (E4, R4)

Positive: It has diagrams… which explains what they’re talking about there. (E2, R1)

Positive: It clearly states, if you’re using a wired outline which is the figure, I can picture that in my mind, I know exactly what they’re talking about. (E3, L2)

Problem B

Positive: It shows an example. (E2)
Negative: I’m not seeing any visual representation. (R1)

Positive: It does give a clear example. (E2)
Negative: It’s really opinion; there’s no other facts. (E6-)

Positive: It shows a circle there, I do like these, I can clearly relate to these. (E2)
Negative: There’s not enough support for me, compared to the other ones. (E3-, L2-)

Positive: I can see with, I guess you can call them formulas, bc here is the length of the side of the rectangle, and it is smaller than bp, just by a little bit there, so I can believe it. (E4, L4, R1)
Continued

Table 8. Summary of comments made by Allen

Table 8 continued
Positive: I like the fact of using the Pythagorean Theorem. It’s more straightforward than using all the angle and the side relations. (E4, P)

Positive: It also has a formula that I can work out by myself and see the process of doing it. (E4, L4)

Problem D

Positive: I like the visual aid again. (R1)
Negative: It is a little wordy. (R2-)

Positive: Seeing that after and before are always 1 separated there, 1 separated there, and it never changes, since they’re parallel, so using those, I do think that that is the best saying that you can always save an extra dollar. (R1)
Negative: I like straightforward ones. (P)

Positive: I like the fact that they used other, that they can also plug in a price here, as well as using a formula and plugging a price in. (E2, L5)
Negative: [There’s] less explaining. (P)

Positive: It’s a formula, I’m a formula kinda guy. (E4, R4)
Negative: Just stating that and not giving that much evidence, it’s not very convincing to me. (E6-)

Positive: I like the fact that it’s always constant, you can plug any value in. (E2, L4)

Positive: More explaining, as well as they use examples here. (E2, R2)

Problem A

Positive: It says what you can do to figure it out. (E3, L2)
Negative: I’m not seeing very many supporting arguments. (E4)

Positive: Very simple visual representation. (R1)
Negative: There’s almost no mathematical evidence here, except the opinions and personal work of other people doing math, and not showing what they did. (E4, E6-)

Negative: No formula… or visual representation… (R1, R4)

Negative: There is proof, it’s just not solid, like always a formula. (R4)
continued

Table 8 continued
Comparing Problems A-D

Positive: When someone is trying to convince me of something, I would like facts. (E4)
Negative: Opinions and people doing things that I have not personally seen. (E6-)

Positive: Formulas and diagrams. (E4, R1, R4)
Negative: There’s no solid, stated proof right there is not as convincing as a line graph, or the Pythagorean theorem, or those. (E4, R1, R4)

Positive: There’s always the showing, they’re working it out. (P)
Negative: Just plainly stating. (E6-)

Positive: Giving concrete numbers and facts and stating their observations of what they did the experiment on. (E2, R3)
Negative: Simply saying to imagine it, then stating that it’s definitely longer, you’re not giving any example. (E2, E6-)

Positive: If there’s a combination of visual diagrams and formulas, that would be fabulous, that would be perfect. (E2, E4, R1, R4)
Negative: This argument is based on shapes and common sense, but not explanatory sense. (E6-)

Problem E

Positive: This is still stating that clearly, while giving me the visual. (R1)
Negative: Just given that information and no outside knowledge that that would work for all cases. (E6-)

Positive: It does explain it clearly. (P)
Negative: It provides an example, not an opportunity to provide your own examples. (E2)

Positive: An opportunity to provide your own examples. (E2)

Positive: Work it out on my own, and find out more. (E2)

Additional Comments

Positive: I love formulas, which are always in my mind second to visual representations. (E4, R1, R4)
Negative: It’s not as clear. (P)

Positive: This is more straightforward. (P)
Negative: It only gives one example. (E2)

Positive: OK, because it gives examples that worked. (E2)
Negative: This is not straightforward… because it is a longer and more complicated and not straightforward enough formula. (P)

Positive: I would still have the opportunity, because it’s a formula, to provide my own examples. (E2, R4)
Negative: There would be a formula here, but it’s not stated. (R4)

Positive: Straightforward giving you the formula there. (E4, R4)

Positive: I can see the formula. (E4, R4)

Data Analysis

As shown in Table 7, Allen demonstrated a clear preference towards arguments

backed by visual illustration and formulas during the interview. He chose visual

illustrations as the most convincing arguments in Problems A & D, and selected algebraic

arguments as the most convincing arguments in Problems B & C. In addition, he

considered the inductive arguments as the least convincing arguments in Problems A, B

& C, believing that those arguments didn’t offer enough support to prove the conjecture.

Allen’s general preference toward visual arguments was consistent with his responses in

the SMR.

In order to identify the factors and features of the arguments that had impacted

Allen’s evaluation, his comments during the interview were coded according to the CCIA

framework (see Table 9). We did so to identify whether each of his comments referred to

the representation, evidence, or the link between evidence and conclusion⁷ (which was denoted by a capital letter, R, E, or L, respectively) and the kind of representation,

evidence and link (which was denoted by a single digit following the letter) that had

positively/negatively impacted the subjects’ evaluation of the argument. The coding was

included in Table 8 at the end of each comment. For example, the comment that “it uses

formulas which I know are fact, and I like seeing fact” was coded “E4” and “R4” (see

Table 8, third line from the top), since it was based on a mathematical fact as evidence,

which was expressed in a symbolic form. According to the CCIA framework, this type of

⁷ “Link between evidence and conclusion” of an argument is referred to as the “link” of the argument for convenience in the discussion.

evidence is considered “fact,” which is coded “E4” as listed in Table 9. The

representation is considered “symbolic,” which is coded “R4.”

Representation Evidence Link


Visual: R1 Authority: E1 Direct: L1
Narrative: R2 Example: E2 Perceptual: L2
Numerical: R3 Imaginary: E3 Inductive: L3
Symbolic: R4 Fact: E4 Transformational: L4
Assumption: E5 Ritual: L5
Opinion: E6 Deductive: L6
Note:
i). “P” denotes comments that didn’t refer to the representation, evidence, or link of
arguments. These comments were analyzed separately from others.
ii). “NA” denotes cases where the subject claimed that he/she didn’t understand the
argument and didn’t offer any explanation.
iii). A notation “-” was added after the code to indicate that this factor made the
argument less convincing to the subject.

Table 9. Table of codes

The following points are important in understanding the coding procedure.

1) Not all comments could be coded according to the CCIA framework. In cases

where factors that contributed to the judgment were not identifiable or were not about the

type of representation, evidence, or link of an argument, we coded them as “P,” denoting

that there were personal standards that needed to be further examined. We called these personal standards since those reasons might not be associated with any particular type of

argument. For example, the comment “it’s not straightforward enough” was coded “P”

since it could apply to many different types of arguments. There were also cases when the

subject indicated that he/she was not able to understand an argument. We use “NA” to

denote such comments, suggesting that the subject was unable to provide an evaluation

for the argument.

2) A certain factor could make an argument more/less convincing to the subject.

To distinguish the two effects, a “-” was added to the end of a code if the identified factor made the argument less convincing. If there was no such mark after a code, then

the corresponding factor made the argument more convincing to the subject.

3) A comment could refer to different features or factors of an argument. In this

case, it was classified using multiple codes. For example, the comment that “it’s a

formula, I’m a formula kinda guy” referred to the formula as the mathematical evidence

as well as a symbolic representation. Hence it was coded both “E4” and “R4.”

4) There were cases where it was difficult to judge what exactly a comment meant based on its text alone. In such cases, we reexamined the dialogue

in the recorded video to determine the contextually embedded meaning of a comment.

For example, from the comment “I’m not seeing very many supporting arguments” alone, we were not able to understand what exactly the “supporting arguments” meant. However, by placing this comment in the conversation, we identified that Allen was referring to formulas and mathematical facts as what he called “supporting arguments.”

Therefore, we assumed that this comment referred to a mathematical fact as evidence,

which was coded as “E4.”

5) Similar comments might have been mentioned by Allen in multiple places during the interview. Such comments were counted multiple times. The assumption behind doing

so was that if a point was addressed multiple times, it should be viewed as being more

significant than other opinions to the subject.

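The five conventions above can be applied mechanically, as in the following sketch. The codes follow Table 9 (with “-” marking a negative reference and “P”/“NA” set aside), while the sample list of codes is invented:

```python
# Sketch of tallying coded comments into a Table-10-style summary.
# A trailing "-" marks a factor that made the argument LESS convincing
# (point 2); "P" and "NA" codes are set aside for separate analysis
# (point 1). (The sample codes are invented.)
from collections import Counter

codes = ["E4", "R4", "E2", "E6-", "P", "E2", "R1", "L4", "NA", "E6-"]

positive, negative, personal = Counter(), Counter(), Counter()
for code in codes:
    if code in ("P", "NA"):
        personal[code] += 1        # personal standards, examined separately
    elif code.endswith("-"):
        negative[code[:-1]] += 1   # factor made the argument less convincing
    else:
        positive[code] += 1        # factor made the argument more convincing
```

Counting every repetition of a code, as the loop does, reflects convention 5: a point raised several times is weighted accordingly.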
The codes for Allen’s comments were then summarized to examine his evaluation

criteria. Table 10 illustrates the result.

Total number of references to representation: 27


Visual Narrative Numerical Symbolic
Positive 12 1 1 12
Negative 0 1 0 0

Total number of references to evidence: 47


Authority Example Imaginary Fact Assumption Opinion
Positive 0 18 2 17 0 0
Negative 0 1 1 0 0 8

Total number of references to link: 7


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 2 0 3 1 0
Negative 0 1 0 0 0 0

Table 10. Categories of comments made by Allen

As shown in Table 10, the numbers of comments that focused on the representation, evidence and link of the arguments were 27, 47, and 7, respectively,

indicating that the evidence of arguments had the largest impact on Allen’s judgment.

Among all types of evidence, Allen regarded facts (i.e. known mathematical results) and examples (i.e. results from an immediate test) as reliable sources for establishing an argument; they were referred to 17 and 18 times, respectively. His explanation was heavily rooted in the

discussion of specific mathematical concepts (e.g. specific numbers’ properties, specific

geometric properties, meaning of graphs, etc.) instead of personal assumptions or

opinions. This was highlighted by his claims that “when someone is trying to convince

me of something, I would like facts” and “giving concrete numbers and facts and stating

their observations of what they did the experiment on” would make an argument

convincing. In addition, he clearly emphasized that “opinions and people doing things

that I have not personally seen” didn’t make an argument convincing to him. Similar

statements were mentioned 8 times during the interview. Overall, Allen’s comments

demonstrated his need to see specific and concrete evidence in order to be convinced.

The representation of arguments also influenced Allen’s judgment. In particular,

he indicated that visual and symbolic representations contributed to his conviction, each of which was noted 12 times during the interview. Allen claimed that he loved “formulas, which

are always in my mind second to visual representations.” He also suggested that if

“there’s a combination of visual diagrams and formulas, that would be fabulous, that

would be perfect.” This tendency was backed up by his capability to represent variables

with symbols and manipulate the symbols fluently, as well as his capability to connect

graphs to the content of the problem.

However, Allen’s algebraic skills didn’t enable him to evaluate the logic used to

connect evidence and conclusions. Among all the comments he made, 7 referred to a
certain type of link between evidence and the conclusion of an argument. In 2, 3, and 1

cases, respectively, Allen found a perceptual, transformational, and ritual link convincing.

We found Allen was not able to recognize that showing a few examples couldn’t prove that a

conjecture is always true. He considered an argument convincing “because it gives

examples that worked.” Nevertheless, this didn’t suggest that Allen’s mathematical

reasoning ability was underdeveloped. In fact, we argue that the ability to examine a

single case carefully was a required step for a further conceptualization of generic

examples (Balacheff, 1988). We had noticed that Allen was capable of extracting

properties he saw in one example and applying them to other cases. This was demonstrated

when he claimed “I like the fact that it’s always constant, you can plug any value in” after

examining a few cases.

In addition to the features/factors identified by CCIA, Allen had personal

standards for deciding whether an argument was convincing or not. There were 14

comments that were coded as “P” (see Table 8). In particular, 9 of these comments

concerned the simplicity of an argument, using terms such as “straightforward,” “simple,”

and “quick” to explain why he was or was not convinced, while the other 6 comments

referred to the clarity of the arguments (e.g. “There’s always the showing, they’re

working it out.”). In fact, we found that the pursuit of simplicity and clarity overrode his

preference for the type of representation and evidence. This was demonstrated by his

claim that “this is not straightforward… because it is a longer and more complicated and

not straightforward enough formula.” That is, in order for formulas, one of his

favorite types of evidence and representations, to be convincing in an argument, they

needed to be “straightforward” and not too “complicated.”


Combining Allen’s personal scheme and those characterized by the CCIA

framework, we obtained a clearer picture of Allen’s rationale for evaluating mathematical

arguments (see Figure 11). Allen viewed arguments that utilized precise description and

involved simple reasoning procedure as convincing. To him, known mathematical facts

and concrete examples were the most straightforward source of evidence, while the visual

and symbolic representations were the clearest ways to describe and relate those

examples. However, since Allen was not yet able to reflect on the rigor of the logic

embedded in an argument, the type of link between the evidence and conclusion was not

among his major focuses. Arguments that used transformational, perceptual and ritual links

were all perceived as convincing by him. An argument was convincing to him as long as

the reasoning looked “straightforward” to him, regardless of its logical rigor.

[Figure 11 diagram: transformational, perceptual, and ritual links, examples and facts, simple procedure, and precise description (visual, symbolic) feeding into convincing arguments.]

Figure 11. Illustration of Allen’s rationale for evaluating mathematical arguments

With this platform, Allen’s rankings of the arguments (see Table 7) became more

sensible. In Problem C, the clarity of the evidence provided in each argument determined

their ranking. The evidence provided in C2, C4, C3, and C1 was the triangle area formula,

the imagined triangle made of wire, the drawn triangle within a transformation process,

and a collection of triangles, respectively. Among all these, the formula was the most

simple and clear; the imagined triangle made of wire was less clear but also very simple;

the triangle within a transformation process looked more complex; while the collection of

triangles offered a mixed pool of information and would “trip me (Allen) up for the first few

seconds.” This explained Allen’s rankings of them.

Arguments in Problem B were also ranked based on the simplicity and clarity of

the evidence provided by them. Compared to his ranking for Problem C, the only

difference was that the rankings of the visual and perceptual arguments were switched.

Allen’s explanation was that the image of the triangle made of wire was clearer than the

image of a football field. Therefore, the argument based on the football field scene was

less convincing to him.

In the other three problems, Allen found the visual arguments to be the most

convincing option while the algebraic arguments were ranked lower. A possible

explanation was that in Problems C and B, both algebraic arguments contained well-

known mathematical facts (triangle area formula and the Pythagoras Theorem); however,

in Problems D, A and E, the algebraic expressions were not known results but were used

to represent the variables’ relationship in the problem. Therefore, the absence of clear and

simple evidence made them less convincing to him.

The different rankings of the inductive arguments across the problems could also

be explained. Notice that in Problems A, B and C, the inductive arguments were

considered the least convincing. This was because there was no concrete example given

in A1 and B1, while in C1, the examples might seem confusing to him. However, since
D1 and E1 discussed more details about the examples, they were considered more

convincing.

Overall, we found that the pursuit of simple and clear statements, the need to see

mathematical facts and concrete examples, and preference towards visual and symbolic

representation could help explain Allen’s evaluation of mathematical arguments.

Cross comparison

Seven other subjects’ interview data were analyzed using the same process as

illustrated above. Details of these analyses are included in the next chapter. Following

each individual analysis, a cross comparison of data for all subjects was performed in

order to document the similarities and differences among their responses. In seeking the

similarities, we considered whether there were factors that consistently impacted all (or

the majority of) subjects’ decisions. We calculated the frequency of occurrence of factors

for all subjects and identified those most prominently referenced. We also contrasted the

coded summary of each subject’s comments to identify between-individual differences in

terms of the elements identified in the CCIA framework and any additional personal

schemes detected during the coding process.
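The frequency tally described in this cross comparison can be sketched as follows; the factor codes and counts below are hypothetical illustrations, not the study’s actual coding data:

```python
# Tally how often each coded factor appears across subjects' comments
# and flag the most prominently referenced ones. The subject names and
# factor codes are hypothetical stand-ins for the CCIA-based categories.

from collections import Counter

def prominent_factors(coded_comments, top_n=2):
    """coded_comments: {subject: list of factor codes}.
    Returns the top_n (factor, count) pairs across all subjects."""
    counts = Counter(code for codes in coded_comments.values()
                     for code in codes)
    return counts.most_common(top_n)

# Hypothetical coded comments for three subjects.
coded_comments = {
    "Allen": ["fact", "example", "visual", "fact", "symbolic"],
    "Beth": ["example", "visual", "example"],
    "Carl": ["fact", "example", "perceptual"],
}
top = prominent_factors(coded_comments)
```

The same tally, restricted to one subject’s comments, would reproduce per-subject counts such as those reported for Allen above.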

The final stage of analysis focused on exploring context specific factors that

influenced the subjects’ decisions. In particular, we examined causes for the inconsistent

rankings that were provided by the subjects to the same type of arguments. Those factors

were identified to explain how context influences the evaluation of mathematical

arguments. Details of the survey and interview results as well as findings from

quantitative and qualitative analysis are shared in the following chapter.

CHAPTER 4. RESULTS

This chapter is composed of two sections. The first section is dedicated to the

analysis of the survey data. The second section offers a discussion of the results of the

interviews.

Findings from SMR

We analyzed SMR responses from 476 eighth grade students. The survey results

suggested that students’ judgments of the same type of arguments were highly diverse

across the problems and between individuals.

Arguments understandable to students

The first step in the analysis process considered arguments that the participants

claimed to understand. In order to do so, we calculated the percentage of students who

responded “agree” to the question “You understand the concepts and notations used in the

argument” under each argument (see Figure 12). If a participant answered “agree” to the

question when judging an argument, we considered the argument understandable to the

participant.

Figure 12. The percentage of participants who considered each argument understandable

Whether the questions were understandable to students was critical in verifying if

the items were appropriate for this age group. As shown in Figure 12, most arguments

were reported to be understood by more than 58% of the participants.

We calculated the number of arguments that each respondent claimed to

understand. Figure 13 illustrates a summary of the participants’ responses. As shown, the

majority of the participants (80.5%) claimed that they understood more than half of the

arguments. We also generated a similar graph to illustrate the distribution of the number

of arguments that were identified as not understandable (not including “not sure”
responses, see Figure 14). As shown, those who claimed not to have understood more

than 4 arguments accounted for less than 15% of the participants. This data suggested that

the arguments / conjectures included in SMR were considered understandable by most of

the respondents. This also suggested that the SMR problems didn’t exceed the

participants’ self-assessed mathematical ability, so that their evaluations of the arguments

were less likely to have been made randomly.


Figure 13. Distribution of the number of arguments indicated understandable by each


participant


Figure 14. Distribution of the number of arguments indicated not understandable by each
participant

The highest percentages of participants indicated A1, B1, C3 and D1 as understandable

in each of the respective problems (see Figure 12). Among them, A1, B1 and D1 analyzed

the conjecture by examining a few examples; while C3 was a visual illustration of why

the conjecture was true. Among the second most understood arguments in each problem

(i.e. A4, B4, C4, and D3), A4 and B4 were visual arguments, while C4 and D3 argued in

a perceptual way. Among the least understood arguments in each problem (i.e. A2, B3,

C1&C2 (tie), and D2), all utilized symbolic representations and argued algebraically

except Argument C1, which was an inductive argument but involved plenty of symbols

and labeled figures. Argument C1 required a more careful realization of the connection

between figures, symbols and narratives, hence it may have been harder to understand

than inductive and visual arguments in other problems.

In order to clarify whether the differences in ratings were statistically significant,

we applied the within group ANOVA tests (i.e. repeated measures ANOVA) in the

analysis. To classify students’ responses to the corresponding questions, we assigned

numerical value “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. We call each of

these numbers a participant’s rating on whether an argument was understandable. We

then adopted the within group ANOVA to test whether the students’ ratings (which were

considered within-subjects variables) on the arguments in a problem were significantly

different from each other. The results are included in Appendix A, Table 39.
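The encoding and within-group test described above can be sketched in Python. The responses below are hypothetical, and a hand-rolled one-way repeated-measures F statistic stands in for whatever statistics package produced Table 39:

```python
# Encode each response and compute a one-way repeated-measures ANOVA
# F statistic for one problem's four arguments. The responses below are
# hypothetical; the real analysis used the 476 survey responses.

ENCODING = {"agree": 1, "not sure": 0, "disagree": -1}

def rm_anova_f(ratings):
    """ratings: list of per-subject lists, one rating per argument.
    Returns the F statistic for the within-subject (argument) effect."""
    n = len(ratings)        # subjects
    k = len(ratings[0])     # arguments (within-subject levels)
    grand = sum(sum(row) for row in ratings) / (n * k)
    arg_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    subj_means = [sum(row) / k for row in ratings]
    ss_args = n * sum((m - grand) ** 2 for m in arg_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_args - ss_subj
    df_args, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_args / df_args) / (ss_error / df_error)

# Hypothetical responses from four subjects to arguments A1-A4.
responses = [
    ["agree", "disagree", "not sure", "agree"],
    ["agree", "disagree", "disagree", "agree"],
    ["agree", "not sure", "disagree", "agree"],
    ["not sure", "disagree", "disagree", "agree"],
]
ratings = [[ENCODING[r] for r in row] for row in responses]
f_stat = rm_anova_f(ratings)
```

With the actual data set the same routine applies unchanged; a statistics package would additionally report the p-values used for the pairwise comparisons.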

Figure 15 summarizes the results of the analysis presented in Table 39. In

particular, the arguments in each problem were listed sequentially from the most

understandable (top) to the least understandable (bottom). In addition, two arguments

were connected using a curve if the ratings they received were not significantly different

from each other (p > .05). In an intuitive sense, if two arguments were connected, then

they were “close” to (not significantly different from) each other; if not connected, then

the two arguments were separated from (significantly different from) each other. For

example, in Problem D, from top to bottom, D1 received the highest rating; D3 received

the second highest; D4 came after D3; and D2 received the lowest rating. The differences

in ratings between D1 & D3, and D4 & D2 were insignificant so each pair was connected

using a curve. The differences in ratings between D1 & D4, D1 & D2, D3 & D4, and D3

& D2 were significant so each pair was not connected using a curve. As shown in Figure

15, D1 and D3 were considered significantly more understandable than D4 and D2.
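The “connected by a curve” convention amounts to grouping arguments by their pairwise p-values. A minimal Python sketch, with hypothetical means and p-values (not the study’s actual Table 39 values):

```python
# Sort arguments by mean rating and list the pairs whose pairwise
# p-value exceeds .05 -- the pairs that would be "connected by a curve".
# The means and p-values below are hypothetical illustrations.

ALPHA = 0.05

def connected_pairs(means, p_values):
    """means: {argument: mean rating}; p_values: {(a, b): p} with a < b.
    Returns (ranking, pairs): arguments sorted from highest to lowest
    mean, and the pairs not significantly different from each other."""
    ranking = sorted(means, key=means.get, reverse=True)
    pairs = sorted(pair for pair, p in p_values.items() if p > ALPHA)
    return ranking, pairs

means = {"D1": 0.62, "D3": 0.58, "D4": 0.31, "D2": 0.27}
p_values = {
    ("D1", "D2"): 0.001, ("D1", "D3"): 0.41, ("D1", "D4"): 0.002,
    ("D2", "D3"): 0.003, ("D2", "D4"): 0.38, ("D3", "D4"): 0.004,
}
ranking, pairs = connected_pairs(means, p_values)
```

Under these illustrative values, the ranking lists D1 above D3 above D4 above D2, with the D1–D3 and D2–D4 pairs connected, matching the pattern described for Problem D.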

Figure 15. Illustration of how understandable the arguments were to the participants

In Problem A, A1 (inductive) was significantly more understandable than A3

(perceptual) and A2 (algebraic); in Problem B, B1 (inductive) was significantly more

understandable than B3 (algebraic) and B4 (visual); and in Problem D, D1 (inductive) was

significantly more understandable than D2 (algebraic) and D4 (visual). These results again demonstrated

that students were more likely to understand an argument when it showed examples. In

addition, it was also detected that the differences between A1 (inductive) and A4 (visual),

and D1 (inductive) and D3 (perceptual) were not significant. This signaled that students

may have been able to understand other types of arguments as well as inductive ones. The

participants’ view of the inductive argument in Problem C was different from the other

three problems: there, the inductive argument was considered insignificantly less

understandable than two of the other types (perceptual and visual). Therefore, the results tended

to suggest that although in general inductive arguments seemed to be easier to understand

by the participants, other types of arguments were well perceived by them when

satisfying certain conditions (e.g. the arguments may connect the problem with familiar

experience or offer visually appealing illustration). Exploring those conditions was

among the goals of the follow up interviews.

Arguments convincing to students

In addition to the ratings on how understandable an argument was, the

participants’ evaluation of whether an argument showed that the conjecture was always

true was also analyzed. Students’ judgments of the second claim under each argument (i.e.

“the argument shows that the statement is always true”, see Figure 8) were assessed using

numerical values: “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. We call each

of these numbers a participant’s rating on whether an argument was convincing. Note

that if a participant didn’t indicate that he/she understood an argument, then we

considered him/her to be not sure if the argument was convincing. We then adopted the

within group ANOVA to test whether the students’ ratings (which were considered

within-subjects variables) on the arguments in a problem were significantly different

from each other. The results of the statistical analysis of this data set are included in Appendix

A, Table 40.

Figure 16 summarizes the results presented in Table 40. Adopting a procedure

similar to that described in the previous section, the arguments in Figure 16 were listed

sequentially from the most convincing (top) to the least convincing (bottom) by problem.
In addition, two arguments were connected using a curve if the ratings they received were

not significantly different from each other (p > .05). In an intuitive sense, if two

arguments were connected, then they were “close” to (not significantly different from)

each other; if not connected, then the two arguments were separated from (significantly

different from) each other. For example, in Problem D, all the arguments were connected

using curves, indicating the difference between any pair of arguments was insignificant.

Figure 16. Illustration of how convincing the arguments were to the participants

Figure 16 shows that in Problem A, A4 (visual) was considered the most

convincing argument, while A1 (inductive) received the lowest rating. Note that the

differences between A1 and any of the other 3 arguments were significant; and the

differences between A4 and any of the other 3 arguments were also significant. This
suggested that the participants found the visual demonstration more convincing than any

other type of argument in the number theory problem, while they suggested that checking

a few numbers couldn’t convince them that the conjecture was always true. We suspected

that the figures used in A4, where manipulatives commonly used in mathematics

instruction were represented, might have contributed to the high rating on A4.

In Problem B, Argument B3, which adopted the Pythagoras Theorem, received a

rating that was significantly higher than any other argument; while B1, again

the inductive argument, received a significantly lower rating than B3 (algebraic) and B4

(visual). As was detected in Problem A, the inductive argument received the

lowest rating and the perceptual argument’s rating was not significantly higher. This

finding suggested that neither an imaginary example (i.e. the football field in B2) nor a few concrete

examples as cited in B1 made these arguments as well perceived as the visual and algebraic

arguments in this problem. In particular the participants found the algebraic argument

more convincing than all other arguments in this geometry problem. We suspected two

factors might have contributed to the high rating of B3. First we realized that the Pythagoras

Theorem is one of the most well-known results in school geometry and therefore strongly

recognizable by the students. Second we perceived that 8th grade students may have just

learnt the theorem and the topic was still fresh in their mind.

In Problem C, C2 (algebraic) received the highest rating, and it was rated

significantly higher than C3 (visual) and C1 (inductive) but was not rated significantly

higher than C4 (perceptual). Compared to Problems A and B, the differences between

ratings were smaller in the sense that the between argument differences were not

significant for many pairs. In particular, the ANOVA test suggested insignificant
differences among the ratings on C1, C3, and C4. C2 stood out as being significantly more

convincing than two of the other three arguments. We suspected the well-known triangle

area formula might have contributed to its high rating. Similar to what was detected in

Problems A and B, this finding could be viewed as the mathematics curriculum’s impact

on the participants.

In Problem D, although D4 (visual) received the highest rating, it was not rated

significantly higher than any of the other arguments used in this problem. Therefore, all four

arguments seemed to have been equally as convincing to the participants. This was a

good illustration that in some cases there might not be a unique most convincing

argument. Each argument might be convincing to a certain group of students so that

approaching a problem using multiple strategies might be the only plausible way to help

all students to understand why a conjecture was true.

Data from the survey suggested that the participants were not completely satisfied

with empirical checking and verifying. Among all the lowest rated arguments in each

problem, two were inductive. However it was premature to claim that the participants

were able to realize that checking a few examples was inadequate to prove the general

validity of a conjecture. We made this claim since among all the 476 respondents, only

10 could identify, in all four contexts, that the inductive argument was not sufficient to

establish the validity of the conjecture. At the same time, there were some indicators

implying that information other than pure empirical checking could have contributed to

students’ conviction in the process. However, it was not clear what type of information

was most helpful. As shown in the Figure 16, visual illustration (A4 & D4), theorem (B3),

formula (C2), mental image of real life experience (C4), closer examination of examples
(D1) could all contribute to a higher rating. Further investigation of how these various

types of arguments contributed to the participants’ conviction of the validity of the

conjectures was carried out during the interview phase of the study.

Arguments explanatory to students

The following discussion concerns whether each argument helped the

participants to better understand why a statement was true. In doing so students’

judgments to the third claim under each argument (i.e. “the argument helps you better

understand why the statement is true”, see Figure 8) were assessed by assigning the

numerical values: “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. Each of these

numbers was called a participant’s rating on whether an argument was explanatory. Note

that if a participant didn’t indicate that he/she understood an argument, we considered

him/her to be not sure if the argument was explanatory. We then adopted the within group

ANOVA to test whether the students’ ratings (which were considered within-subjects

variables) on the arguments in a problem were significantly different from each other.

The statistical results of the analysis are included in Appendix A, Table 41.

Figure 17 summarizes the results presented in Table 41. In particular, the

arguments in each problem were listed sequentially from the most explanatory (top) to

the least explanatory (bottom). In addition, two arguments were connected using a curve

if the ratings they received were not significantly different from each other (p > .05). In

an intuitive sense, if two arguments were connected, then they were “close” to (not

significantly different from) each other; if not connected, then the two arguments were

separated from (significantly different from) each other. For example, in Problem A, A4
was not connected with any other argument while the other three arguments were

connected to each other. This suggests that the rating on A4 (visual) was significantly

higher than the other three arguments, whose ratings were not significantly different from

each other. The results show that the participants found the visual demonstration more

explanatory than any other type of argument in the number theory problem.

Figure 17. Illustration of how explanatory the arguments were to the participants

The situation in Problem B was similar to that of Problem A, where B3 (algebraic)

received a rating that was significantly higher than the other three arguments, whose

ratings were not significantly different from each other. Therefore, B3 was not only the

most convincing argument in this problem, but was also the most explanatory one to the

participants. Again, it was suspected that the use of Pythagoras Theorem as evidence

contributed to its high ratings.

The ratings on all arguments in Problem C were not significantly different from

each other. This implies that the participants could extract information from each of the

arguments, which helped them understand better why the conjecture was valid. If true,

this could demonstrate the benefit and need of explaining mathematical results from

multiple aspects.

In Problem D, D4 (visual) received the highest rating, however the differences

between D4 and D1 (inductive) and D3 (perceptual) were not significant. D2 (algebraic)

received a rating that was significantly lower than any of the other three arguments.

Different from the cases in the other three problems, there was a single argument in

Problem D that received a significantly lower rating. We suspected that the way in which

the variable was used in D2 might be unfamiliar to the students. In classrooms, students

are usually asked to solve for the variable when it is given in equation form. However

in D2, the presence of the variable didn’t require solving for a value. Rather, the variable was

used to represent general cases and was eventually cancelled out in the calculation to

show the conjecture was always true regardless of its value.

Visual arguments appeared twice on top of the lists in the contexts of number

theory and algebra. This demonstrated the power of visual illustration in helping

students understand the problem better. The algebraic argument was considered most

explanatory in the geometry problem but was considered the least explanatory in the

algebraic problem. This demonstrated that students possessed the ability to understand

arguments in abstract form. However, they might still have had difficulties if the form was

unfamiliar or too complex for them. Overall, no single type of argument

was considered more explanatory than the others in all the problem contexts. In addition, as

demonstrated in Figure 17, the “explanatory” ratings were not significantly different in 14

of the total 24 pairwise comparisons. This suggested the need for and benefit of

promoting student understanding about the validity of a mathematical statement from

multiple aspects. A closer look at the data revealed that even the least explanatory

argument (D2) was considered explanatory by close to 60% of the participants who

claimed to understand the argument, which further supported the suggestion.

Arguments appealing to students

After evaluating all the arguments in a problem, the participants were asked to

choose one which they believed was closest to how they themselves would have argued

(e.g. see Question 11, Figure 8). A participant’s choice in answering this question was

considered as “appealing” to the participant. Unlike the previous three ratings where each

argument was evaluated separately and there could be multiple arguments in a problem to

be considered understandable, convincing, and/or explanatory, in choosing the appealing

argument, the participants needed to compare all arguments in a problem and then select

only one as the appealing option. Figure 18 illustrates the percentage of the participants

who found each argument appealing to them.8

8 Since the participants were allowed to choose none of the arguments, the percentages of the participants
choosing each argument in one problem did not add up to 100%.
Figure 18. The percentage of participants who considered each argument appealing

The within group ANOVA was applied to determine whether the participants’

argument preferences were significantly different. Specifically, we decomposed the

participants’ choices of the appealing option into 4 columns, assigning the numerical

value “1” to an argument if it was indicated by the student as the appealing option and “0”

to the other three arguments (see Figure 19 for an illustration). Treating the 4 columns as

the 4 levels of within-subject variables, the within group ANOVA was applied to test the

between argument differences in each problem. The statistical results of the ANOVA tests

are included in Appendix A, Table 42.

Students   Appealing Option   A1   A2   A3   A4
S1         A2                 0    1    0    0
S2         A1                 1    0    0    0
S3         A3                 0    0    1    0
S4         A4                 0    0    0    1
S5         A3                 0    0    1    0
S6         A2                 0    1    0    0
S7         None               0    0    0    0

Figure 19. An example of the data transformation for within group ANOVA test
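The transformation illustrated in Figure 19 amounts to one-hot encoding each participant’s choice; a minimal Python sketch (the function and variable names are ours, and the choices mirror the figure):

```python
# One-hot encode each participant's appealing-option choice into four
# indicator columns (A1-A4), mirroring the transformation in Figure 19.
# A participant who chose none of the arguments gets all zeros.

ARGUMENTS = ["A1", "A2", "A3", "A4"]

def encode_choices(choices):
    """choices: list of argument labels (or "None") per participant.
    Returns a list of 0/1 indicator rows, one per participant."""
    return [[1 if choice == arg else 0 for arg in ARGUMENTS]
            for choice in choices]

# Choices for participants S1-S7, as in Figure 19.
choices = ["A2", "A1", "A3", "A4", "A3", "A2", "None"]
indicator_rows = encode_choices(choices)
```

Each indicator column can then be treated as one level of the within-subject variable in the ANOVA described above.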

Figure 20 illustrates the results presented in Table 42. In particular, the arguments

in each problem were listed sequentially from the most appealing (top) to the least

appealing (bottom). In addition, two arguments were connected using a curve if the

ratings they received were not significantly different (p > .05). In an intuitive sense, if

two arguments were connected, then they were “close” to (not significantly different from)

each other; if not connected, then the two arguments were separated from (significantly

different from) each other. For example, in Problem A, A4 was not connected to

any other arguments while the other three arguments were connected to each other. This

indicated that the difference between A4 and any other argument was significant, while

the differences between A1 & A2, A1 & A3, and A2 & A3 were not significant.

Figure 20. Illustration of how appealing the arguments were to the participants

As illustrated in Figures 18 and 20, 38.9% of the participants selected A4 (visual)

as the appealing argument in Problem A, which was significantly more than those who

picked any other option. The percentages of participants who chose the other three

arguments (21.2%, 21.0% and 16.6%, respectively) were not significantly different from

each other. The preference towards A4 was consistent with findings in previous sections

which explored the most understandable, convincing, and explanatory arguments. It

suggested that the participants preferred to adopt manipulatives as visual aid to facilitate

their reasoning in the number theory problem.

In Problem B, the largest percentage (28.6%) of participants chose B2 (perceptual)

as the appealing argument. This number was significantly larger than those who had

chosen B1 (inductive, 20.0%) and B4 (visual, 21.4%), however it was not significantly
larger than those who selected B3 (algebraic, 27.3%). B2 used “football field” as a

context to demonstrate why the conjecture was true, where the explanation relied on an

illustration from real life experience; while B3 was based on the Pythagoras Theorem,

which was a well-known result referenced in the school curriculum. This rating was

interesting since the sources of evidence used in the two arguments were different, yet

they were perceived as appealing by statistically similar numbers of students. This again

demonstrated the diversity of students’ preferred ways of reasoning. The visual argument

might have been less appealing to the participants due to the complexity of its structure.

Compared to the simple image of a football field, the geometric figure used in B4

involved many more components (such as rectangles, circles, and lines) and the relationships

among those components. It required more analytical thinking to be fully understood.

In Problem C, the arguments received close ratings. Among all the pairwise

comparisons only the difference between the most appealing option (C1, inductive,

preferred by 26.7% of the participants) and the least appealing option (C4, perceptual,

preferred by 19.7% of the participants) was statistically significant. Compared to the

participants’ responses in Problem B, where the perceptual argument was chosen as the

most appealing option, it was surprising to see C4 to have received the lowest rating in

Problem C. Two reasons might help explain this phenomenon. First, the scene created by

B2, i.e. the football field, might be more familiar to the participants than the scene

created by C4, i.e. using wires to make triangles. Second, the other options provided in

Problem C might be more appealing to the participants for various reasons. For example,

the visual illustration in Problem C, i.e. C3, required less analytical thinking to

understand than B4, the visual argument in Problem B.


In Problem D, 30.7% of the participants selected D1 (inductive). This number was

significantly higher than those who chose D4 (visual, 22.5%) and the least appealing

option, D2 (algebraic, 18.3%). As in Problem C, the inductive argument was

again appealing to the largest percentage of participants. Compared to the inductive

arguments in Problems A and B, C1 (inductive) offered visual images of the samples and

D1 (inductive) offered a detailed calculation procedure for one case. In A1 (inductive) and

B1 (inductive), such detail was not present. Therefore, we suspected that the extra details

provided by C1 and D1 contributed to their higher ratings.

The data revealed that no particular type of arguments was appealing in all 4

contexts. In fact, only 19 of the 476 participants considered the same types of arguments

to be appealing to them across the 4 problems. 122 participants chose one type of

arguments 3 times in the 4 problems. The rest of the participants (335 in total) didn’t

pick any type of arguments more than 2 times. This result suggested that for a majority of

the participants, the appealing reasoning methods were highly context based and didn’t

uniformly lean on any particular type. However, it also suggested that some participants

might have developed more uniform preference towards certain types of arguments.

Investigating the rationale for choice by the participants whose judgment seemed to have

followed a uniform base was a focus of questioning during the interview.
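The consistency tally described above can be sketched as follows. The participant choice lists below are hypothetical placeholders, not the study’s actual survey data; the sketch only illustrates how each participant’s most-repeated argument type could be counted.

```python
from collections import Counter

# Hypothetical choices of the "most appealing" argument type in Problems A-D
# for three illustrative participants (the real survey data is not shown here).
participants = [
    ["inductive", "inductive", "inductive", "inductive"],  # same type all 4 times
    ["visual", "visual", "visual", "algebraic"],           # one type 3 times
    ["inductive", "visual", "perceptual", "algebraic"],    # no type more than twice
]

def max_repeat(choices):
    """Number of times the participant's most frequently chosen type appears."""
    return max(Counter(choices).values())

tally = Counter(max_repeat(p) for p in participants)
# tally[4] -> participants consistent across all 4 problems (19 in the study)
# tally[3] -> participants choosing one type 3 times (122 in the study)
print(tally[4], tally[3])  # prints "1 1" for the toy data above
```

Applied to the full data set, this tally is what yields the 19 / 122 / 335 split reported above.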

Comparison across the ratings

In the previous sections the students’ responses were analyzed to determine what

argument was considered by most participants as understandable, convincing to show the

general validity of the conjecture, helpful to explain why the conjecture was true, and
closest to how they would argue in the same context. These arguments were referred to as

the most understandable, convincing, explanatory and appealing arguments as evaluated

by students, respectively. In this section, we offer an examination of whether students’

evaluation, using the four different ratings (i.e. understandable, convincing, explanatory

and appealing), was consistent in each problem, and explain what might have

contributed to such consistency or inconsistency in choice.

Table 11 summarizes the participants’ choices of the most understandable,

convincing, explanatory and appealing arguments based on Figures 13, 14, 15 and 18. In

particular, the highest rated argument as well as those whose ratings were not

significantly lower than it were included in the appropriate cell of the table. In each cell,

arguments to the left received higher ratings.

Problem A Problem B Problem C Problem D


Understandable A1, A4 B2 C3, C4, C1 D1, D3
Convincing A4 B3 C2, C4 D4, D1, D3, D2
Explanatory A4 B3 C2, C1, C4, C3 D4, D1, D3
Appealing A4 B2, B3 C1, C2, C3 D1, D3

Table 11. Summary of the most understandable, convincing, explanatory and appealing
arguments as evaluated by the participants in each problem

In contrast, Table 12 summarizes the least understandable, convincing,

explanatory and appealing arguments. In particular, the lowest rated argument as well as

those whose ratings were not significantly higher than it were included in the appropriate

cell of Table 12. In each cell, arguments to the left received lower ratings.

Problem A Problem B Problem C Problem D


Understandable A2, A3 B4, B3 C2, C1 D2, D4
Convincing A1 B1, B2 C3, C1, C4 D2, D3, D4, D1
Explanatory A1, A3, A2 B1, B2 C3, C4, C1, C2 D2
Appealing A3, A2, A1 B1, B4 C4, C3 D2, D4

Table 12. Summary of the least understandable, convincing, explanatory and appealing
arguments as evaluated by the participants in each problem

Note that in Problem A, the participants’ choices in all 4 rating standards were

quite consistent. A4 (visual) was considered as the most convincing, explanatory and

appealing option. It was considered the second most understandable option, which was

not significantly lower (p > .05) than the most understandable option A1 (inductive). We

suspect that the visual image provided by A4 was close to the graphic illustration

provided in their early mathematics classrooms when multiplication and division were introduced.

Students’ familiarity with such a representation might have contributed to the higher

ratings. This suggested that visual representation could be reliable and helpful for

students when making judgment. It also suggested that the classroom experience had an

impact on students’ conviction. Aside from A4, the participants’ ratings on other

arguments were close (see Table 12) except that A1 was considered significantly less

convincing than all other arguments. This suggested that although A1 was the most

understandable option, the participants didn’t consider it more convincing or

explanatory than other arguments when showing that the conjecture was always true.

In Problem B, B2 (perceptual) was considered the most understandable and the

most appealing option, while B3 (algebraic) was considered the most convincing and

explanatory option. B3’s appeal rating, moreover, was not significantly lower than B2’s. To

explain these inconsistent ratings, we conducted a closer inspection of the data. In

particular, we analyzed data from those who claimed to understand both B2 and B3. It

was found that in this subgroup, 33.9% found B3 more appealing and 29.7% found B2

more appealing. Therefore, B3 was considered the most convincing, explanatory and

appealing option among those who claimed to understand both B2 & B3. In fact, B2 was

considered significantly less convincing and explanatory than B3 and B4 (see Table 12).

Significantly more participants had found B2 understandable than B3, and it

was those participants who raised the overall appealing rating of B2. The

result was sensible since we

assumed almost every student knew what a football field looked like. Additionally, B2

used easier language while building a perceptual connection between the scenario and the

conjecture, which required less analysis. On the other hand, B3 was convincing,

explanatory and appealing to those who understood it because the Pythagorean Theorem is

one of the most well known and reputable results in school geometry. Therefore, if a

student understood B3, they would most likely give it a high rating. In addition, B1

(inductive) was considered the least convincing, explanatory and appealing option, which

again demonstrated that inductive arguments without further explanation were not

preferred by the participants.


Problem C represented a context that was the least familiar to the students. As the

contrasting item, the conjecture was not true and all the arguments presented in that

problem were false. In this context, only 2 participants indicated that none of the 4

arguments could show the conjecture was always true or could help them see why the

conjecture was true. After a closer examination of these 2 participants’ responses, we

found one of them had indicated that he didn’t understand any of the 4 arguments; while

the other suggested she didn’t think any of the arguments was close to how she would

reason. Furthermore, she claimed that “the tringle9 1 is clearly bigger than tringle 2 so if

you put bigger number in tringle 1 then it will be bigger than 2.” With the exception of

these 2 cases, all other participants selected “agree” for at least one statement suggesting

that one argument in Problem C showed them or helped them see the conjecture was true.

Although the participants might not have been sure that the conjecture was always true

even when they chose the “agree” option, the data suggested that no student clearly

pointed out the conjecture was false, hence there was no clear evidence to show that

when working on Problem C, any of the participants had assumed the conjecture false.

Tables 11 and 12 indicated the participants’ ratings on the arguments in Problem C were

close. Not a single argument received a rating that was significantly higher than the

others in any criteria. Still, significantly more students found C3 (visual) and C4

(perceptual) understandable than found C2 understandable. This was not surprising since C3 and C4 used

easier language, while C2’s explanation involved intensive use of abstract symbols.

C2 and C4 were considered by most students as convincing. It was surprising to see that

9 The participant misspelled “triangle” as “tringle” in her response to the survey.

C2 was considered convincing by more students than C3. A possible explanation is that

the triangle area formula included in Argument C2 was repeatedly addressed in

mathematics classrooms, and its appearance might have added credibility to the argument. All the

arguments were considered as almost equally explanatory by the participants, suggesting

that when encountering an unfamiliar context, explanations from various aspects could

help students better understand the problem. When choosing the appealing arguments,

Arguments C1, C2 and C3 received approximately an equal number of votes while

Argument C4 fell behind. This was also surprising since Argument C4 was rated as the

most understandable and second most convincing argument. A tentative explanation

could be that the scenario used in Argument C4 was not closely associated with the

mathematical content presented in the conjecture, so the students might not have seen a

natural connection there. Therefore fewer students thought they would adopt such a

strategy when encountering the problem.

In Problem D, D1 (inductive) was considered as the most understandable and

appealing option and was rated a close second to D4 in the other two criteria. This result

was particularly interesting compared to how inductive arguments were viewed in

Problems A & B. As mentioned in the previous analysis, we suspected that the extra

details offered in D1, i.e. the layout of the calculation procedure, contributed to its higher

ratings. Another interesting finding in Problem D was that D2 (algebraic), which shared

the same procedure as D1, was considered the least understandable,

convincing, explanatory and appealing option. Our conjecture is that the symbolic

representation appeared to be more complex than the numerical equations. Therefore,

students tended to choose the easier one when they saw both options without considering

the difference in logic that they might have offered.

Collectively we found that among the 32 cells in Tables 11 and 12, only 8

contained a single argument, while 10 contained 3 or 4 arguments. This suggested that

students rarely gave a significantly higher or lower rating to any particular argument, and

in many cases, the between-argument differences were small. Furthermore, as shown in

Figure 20, even the least appealing argument (i.e. A3), was chosen by about 1/6 of all

participants. While the differences between A3’s and other arguments’ ratings were

statistically significant, it didn’t mean that A3 was of no value to the students. In fact, it

was considered understandable by 76.7%, convincing by 55.5%, and explanatory by 67.7%

of the participants. Therefore, the use of A3 definitely could provide extra opportunities

for students to approach the conjecture in Problem A. The case of A3 illustrated that

although some arguments received significantly lower ratings than others (no matter in

what criteria), they were still chosen by some students and their preference shouldn’t be

ignored.

In order to determine whether one type of argument received higher

ratings than all others, we counted the number of times each type of arguments appeared

in Table 11 and Table 12 (see Table 13). As shown in Table 13, there was an almost

equal number of each type in both columns. The result indicated that no

particular type of argument received consistently higher or lower ratings than the others. This

finding again illustrated that, in general, “type” couldn’t determine students’ evaluation

and there might be other factors influencing their assessment.

Argument Type # of Appearances in Table 11 # of Appearances in Table 12
(high ratings) (low ratings)
Inductive 8 10
Perceptual 9 9
Visual 9 8
Algebraic 7 10

Table 13. Summary of high and low rated arguments by type

Lastly, it was detected that the ratings of the same type of arguments were highly

inconsistent across the problems. For example, the inductive argument was rated high in

Problem D while received low ratings in Problem B; the algebraic argument was rated

high in Problem B but received low ratings in Problem D; the visual argument was rated

high in Problem A but received low ratings in Problem B, etc. These results again

demonstrated the complexity of students’ evaluation of mathematical arguments. In

previous analysis, we identified a few factors that we suspected had influenced students’

choices, such as the amount of details provided, the fluidity of language, and the

familiarity of the scenario. The exploration and synthesis of these factors was the major goal

of the follow-up interviews.

Comparison between subgroups of students

As demonstrated in the above discussion, the type of argument that was rated as

most understandable, convincing, explanatory and appealing was distinct across the

contexts. While we were not able to make a grand conclusion about what type of

arguments were understandable, convincing, explanatory and appealing to students, the

high ratings that some arguments received made sense in their respective contexts.

Therefore, we suspected that there were factors other than the presentation and content of

the problems that had impacted the participants’ choices. While we were not able to

obtain an explanation for the choices merely based on the survey results, we probed for

factors by analyzing responses according to different subgroups of students. We assumed

some of these factors were rooted in both school-based mathematical experiences of

children as well as non-mathematical experiences gained from life outside the school

environment.

To investigate the influence of school mathematical experiences on the

participants’ responses, we compared data from participants who were enrolled in higher

performing schools to those enrolled in lower performing schools. The percentages of

mathematical proficiency of the two higher performing schools, as measured by the 2012

state standardized 7th grade mathematics tests, were at least 10% above state average; while the

percentages of the 8th grade mathematical proficiency of the two lower performing

schools were at least 10% below state average. Therefore, the difference in the students’

levels of mathematical proficiency between the higher and lower performing schools, as

measured by the standardized tests, was rather large. While this comparison couldn’t rule

out the influence exerted from non-mathematical experiences on students’ choice, it

would be valuable to see if students who achieved higher scores on state standardized

mathematics tests would demonstrate more maturity in mathematical reasoning ability as

measured by SMR.

The second comparison considered the potential impact of the participants’ gender

on their choices. The male and female students were enrolled in the same schools and

same classrooms, taught by the same teachers using the same teaching materials and

techniques. Although sitting in the same classroom didn’t mean the same classroom

experience for each individual learner, if the cumulative data suggested a large

difference between female and male students’ responses, it was unlikely that this

difference was caused by instruction. Therefore, it was assumed that different responses

from male and female participants could provide additional insight into learners’ choices

and reasoning. Details of the two comparisons were shared in the following discussion.

Between school comparison

117 of the participants were enrolled in higher performing schools and 311 of the

participants attended lower performing schools. For convenience, participants from the

higher and lower performing schools were referred to as Group H and Group L, respectively.

We adopted a between-group ANOVA to test the between-group difference in the

participants’ responses to each question in SMR. Using the same data quantifying strategy,

“1,” “0,” and “-1” were assigned to “agree,” “not sure,” and “disagree,” respectively.

Questions were also labeled in a format like “A1.2”, where A1 indicates the argument,

and 2 indicates the 2nd question under this argument, which assesses whether A1 is

convincing. In addition, four variables (e.g. A5.1 – A5.4) were used to quantify students’

choices of the most appealing argument in each problem. The quantifying strategy was

illustrated in Figure 19. Table 43 (in Appendix B) presents the results of the between

group comparisons of students’ evaluation of each argument by different ratings (i.e.

understandable, convincing, explanatory, and appealing).
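The quantification and between-group test can be sketched in a minimal form. The group responses below are hypothetical toy data (the study’s own analysis ran on the full data set with statistical software); for two groups, the one-way ANOVA F statistic reduces to the ratio of between-group to within-group mean squares.

```python
# Likert coding used in the analysis: agree / not sure / disagree -> 1 / 0 / -1.
CODES = {"agree": 1, "not sure": 0, "disagree": -1}

def quantify(responses):
    """Map Likert labels to the -1/0/1 coding."""
    return [CODES[r] for r in responses]

def anova_f(group_a, group_b):
    """One-way (between-group) ANOVA F statistic for two groups."""
    all_vals = group_a + group_b
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (group_a, group_b)
    )
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in (group_a, group_b) for x in g
    )
    df_between = 1                 # k - 1, with k = 2 groups
    df_within = len(all_vals) - 2  # N - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses to one survey question (e.g., A1.2), by school group.
group_h = quantify(["agree", "agree", "not sure", "agree", "disagree"])
group_l = quantify(["not sure", "disagree", "agree", "disagree", "not sure"])
print(round(anova_f(group_h, group_l), 3))  # -> 1.2 for this toy data
```

The resulting F statistic would then be compared against the F distribution with (1, N − 2) degrees of freedom to obtain the p-value for each of the 64 variables.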

As reflected in Table 43, the between-school differences were not significant for

any of the 64 variables (questions) except C5.3, D1.2, D4.2, and D5.4. Specifically,

significantly more participants from Group H considered C3 (visual) and D4 (visual)

appealing. In addition, significantly more participants from Group H considered D4

(visual) convincing, while significantly fewer participants from Group H considered D1

(inductive) convincing.

Compared to the between-group differences in standardized test performance

(Group H at least 10% above state average and Group L at least 10% below state average,

as measured by percentage of proficiency), the differences detected within their

performance in SMR were much smaller. In particular, the two groups’ evaluations were

not significantly different on 60 of the 64 variables. This result suggested that a higher

performance on the standardized test didn’t represent higher maturity in mathematical

reasoning.

A closer examination of the 4 cases where the group differences were significant

revealed that participants from Group H tended to prefer the visual arguments in both

Problems C and D (i.e. C3 and D4). 32% in Group H selected C3 as the appealing option

while only 22% in Group L did so. 30% in Group H selected D4 as the appealing option

while only 21% in Group L did so. It made sense that Group H exhibited a higher

preference towards D4 since the understanding of D4 requires knowledge of coordinate

plane and such knowledge could contribute to higher standardized test scores as well.

Additionally, compared to Group L, Group H also considered D4 more convincing. This


result might indicate Group H’s better comprehension of D4, which could have

contributed to their higher preference for this argument. However Group H’s higher

preference towards C3 was less sensible. A possible explanation might be that the

participants in Group H had developed the skill to utilize transformational thinking in

certain geometry contexts and they were more capable of visualizing the change of

geometric shapes by reading the description and static images that depict stages of the

transformation. However, this explanation didn’t align with the fact that C3 was not

considered more understandable, convincing, or explanatory by Group H. In addition, it

was detected that Group H considered D1 (inductive) less convincing than Group L did.

Our hypothesis for this result was that there could be more students from Group H that

had realized that D1, although it showed more details than the inductive arguments in other

problems, still couldn’t show the conjecture was always true.

As discussed above, there existed cases to illustrate some differences between

Group H and Group L; however, in general the two groups’ responses to the SMR were

comparable. Therefore, the survey data suggested that classroom experience that

enhanced higher proficiency in state standardized tests didn’t promote students’

mathematical reasoning capacity. In addition to the mathematics classroom experience,

we suspected there were other factors that might have impacted the participants’

responses. Hence we conducted a between-gender comparison to probe for more

explanations of the survey results.

Between gender comparison

Among the 476 participants of SMR, 229 were male and 237 were female. The

remaining 20 participants chose not to disclose their gender and were not included in this

comparison. The male and female students were enrolled in the same classrooms in the

same schools. They also had lived in the same communities. Therefore, we didn’t assume

large differences in the mathematical experience they obtained in school or at home.

Consequently, we suspected that the between-gender comparison might reveal some non-

mathematical factors that could have influenced their evaluation of mathematical

arguments. The same method (i.e. the between group ANOVA) was adopted to assess the

gender differences. Table 44 (in Appendix B) illustrates the statistical results of the

comparison.

As reflected in Table 44, the gender was an insignificant variable in all 64 cases

except for A2.1 and A5.2. That is, the female students considered Argument A2 (algebraic)

significantly less understandable and appealing than the male students did. This result

was surprising since A2 was stated using pure mathematical language and didn’t refer to

any life experience. Therefore it was difficult to perceive how the gender might have had

an impact on the evaluation of this argument. A possible explanation is that more male

students were comfortable using algebraic methods to work on the number theory problem;

however, the basis for this hypothesis was quite weak.

Analysis revealed that the gender differences were small. Therefore, the gender

difference test didn’t offer us insights into factors that impact students’ evaluation. Such

an inquiry was left to be accomplished during the interview analysis of the study.
Gender * School effect

Lastly, we studied the gender * school effect on the ratings provided by the

students to investigate whether the impact of gender on the ratings differed significantly

between the higher and lower performing schools. The results are included in

Appendix B, Table 45.

As shown in Table 45, the gender * school effect was significant (p < .05) for

A2.1, D2.1, D4.1, and D4.2. Note that A2.1, D2.1 and D4.1 measured whether A2

(algebraic), D2 (algebraic) and D4 (visual) were understandable to the participants. D4.2

measured if D4 (visual) could prove the conjecture in Problem D was always true. To

further investigate the gender * school effect on these four variables, we generated plots

using school as separate lines and gender as horizontal axis (see Figure 21).

Figure 21. Plots for variables on which the gender * school effect was significant

Figure 21 demonstrated that the gender differences of all variables were small in

the lower performing schools. However in the higher performing school, the differences

were large. In particular, the male students provided significantly higher ratings than

female students for all four variables. That is, the male students in the higher performing

schools found A2 (algebraic), D2 (algebraic), and D4 (visual) significantly more

understandable than the female students in the same schools did. In addition, the male

students in those schools also considered D4 (visual) significantly more convincing than

their female classmates did. This result suggested that male students from the higher

performing schools seemed to be more likely to understand algebraic arguments in non-

geometric contexts than their female counterparts. In addition, they were also more

likely to understand and be convinced by graphs in a coordinate plane. Since the gender

differences on the same questions were small in the lower performing schools, we

suspected that the enlarged gap in the higher performing schools was related to knowledge

gained from classroom instruction. However, it was unclear why the male students

seemed to grasp the algebraic representations better.

Nevertheless, the significant gender * school effect was only found in 4 of the 64

tested variables. Therefore, the cross effect was not significant for the participants’

responses to most of the questions used in SMR.

Summary of findings from SMR

The survey data demonstrated great diversity among the participants’ evaluation

of the arguments used in SMR. Among all 16 arguments used in the 4 problems, even the

least understandable argument rated by the participants (D2, algebraic) was indicated as

understandable by nearly 60% of them; the least convincing argument (B1) was indicated

as being able to show the corresponding conjecture was true by about half of those who

understood the argument; the least explanatory argument (D2, algebraic) was considered

as being helpful to show why the conjecture was true by close to 60% of the participants

who understood the argument; and the least appealing argument (A3, perceptual) was

selected as the closest way to how they would argue by about 1/6 of the participants.

Although many arguments used in SMR were incorrect or incomplete by a

higher standard of mathematical rigor, they may be most compatible with the ways in which

many students themselves argue. Data from the participants’ responses to SMR offered

much insight to these “natural” ways. To sum up, the study results indicated that:

 The participants’ evaluation of the same argument was highly diverse among

individuals.

 The participants were more likely to understand an argument when it showed

more details about concrete examples or provided visual support.

 The participants were unlikely to be completely convinced by checking and

verifying a few cases. Further support from multiple sources, such as visual

illustrations, past experience, theorems and formulas, contributed to their

conviction.

 All arguments were considered explanatory by the participants, suggesting the

need for multiple approaches to promote student understanding.

 The appealing reasoning mode varied across mathematical problems, and

the group’s favorite argument types also varied across the contexts.

 No particular type of argument was consistently and significantly more

appealing to students than others across the contexts.

 Between-school and between-gender comparisons revealed insignificant differences

between higher and lower performing schools and between male and female

students in their responses to most questions.


The survey results only enabled us to make conjectures about the factors that

might have contributed to students’ evaluation based on our understanding of the content

involved in the arguments. We suspected that concrete examples and visual illustrations

contributed to students’ conviction if they were understandable. We speculated that

perhaps examining one case in detail may have helped students see why the conjecture

was true. We conjectured that arguments that used easier language and offered shorter

description were more likely to be preferred by students. However, these conjectures

couldn’t be verified merely based on the survey data. The follow-up interviews aimed to

unpack students’ perception of the arguments and their rationale for decisions they had

made. The interview also allowed us to explore the mathematical and non-mathematical

factors that had impacted the students’ judgment. Results from the interviews are

presented in the section below.

Findings from the Interviews

The survey results suggested that the students’ preferences for arguments were

highly diverse across the problems and between individuals. The results however didn’t

allow us to infer what types of arguments were more appealing to students. Furthermore,

the data didn’t capture specific features of the arguments that had significantly impacted

students’ evaluation of the arguments. Since students’ judgments were made based upon

their understanding of each argument, we believed there were hidden factors that could

have impacted their choice. In order to further investigate those factors, we relied on

follow-up interviews in an attempt to understand the rationale behind students’ judgment

as indicated in the survey results.


Eight students participated in the interviews. The subjects’ background

information as well as the selection process were included in Chapter III. Each interview

lasted about an hour. Details regarding the interview procedure were also described in

Chapter III. This section offers analysis of the interview data. In particular, we first

provide a description of what happened during each interview as we investigated each

subject’s personal scheme when evaluating mathematical arguments. We then offer an

analysis of the participants’ responses to each problem so to examine the potential impact

of the context on students’ evaluation.

Report of interview data by individual

Survey results were insufficient to explain why the respondents had made certain

decisions. It was not assumed that an individual relied on the same factors and used the

same logic in every context; however, by comparing and analyzing his/her responses in

multiple problems, we assumed that we were more likely to detect factors that

consistently impacted his/her evaluation of mathematical arguments. In doing so, we

examined the subjects’ responses, including how they ranked the arguments from the

most convincing to the least convincing, along with their justification for the ranking.

The analysis of Allen’s interview responses has been elaborated in the methodology

chapter and served as an illustration of the analysis process. Below we included findings

from the other seven subjects, which were obtained using the same analyzing techniques

demonstrated in Allen’s case.

The case of Abby

Abby was an 8th grade student enrolled in an Algebra I class at the time of data

collection. In her responses to SMR, the inductive arguments were indicated to be the

closest to how she would argue in all but Problem C, where she selected C2, the algebraic

argument. Based on this result, we believed that Abby had exhibited preference towards

inductive arguments. Therefore, she was considered to be a representative from the

consistent group.

Abby’s interview responses are summarized in Table 14 and Table 15. Table 14

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 15 summarizes Abby’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D1 (inductive) D3 (perceptual) D2 (algebraic) D4 (visual)
Problem C C1 (inductive) C2 (algebraic) C4 (perceptual) C3 (visual)
Problem B B2 (perceptual) B1 (inductive) B4 (visual) B3 (algebraic)
Problem A A4 (visual) A2 (algebraic) A1 (inductive) A3 (perceptual)
Problem E E1 (inductive) E4 (algebraic) E3 (perceptual) E2 (visual)

Table 14. Rankings of arguments provided by Abby

Positive Comments Negative Comments
Problem D
That’s how I would normally do it, it It doesn’t show, like, how it is after the
shows like how to get there. (E2) tax. (E2)
I think it’d be easier to do this way than, There’s just so much work… if you can
like, have a graph. (E2, R3, R1-) make it simple, like this one [points to
D1], why would you confuse yourself?
(E2, P)
That shows 20 times 5 equals 1, so then I would need to try it. (E2)
that’d just prove that he’s right. (E2)
I think work would be easier than trying to
make a graph. (R1-, R3)
It also says they tried it with 200 and 500,
which gives more information. (E2)
If it works for 200 and 500, why wouldn’t
it work for 300? (L3)
I just think ’cuz you’re multiplying it by
the same thing, and if it works for those
two, if you tried 300, I think it’d work.
(L3)
Problem C
It says the formula is base times height, I’ve never heard of using wire to make a
divided by 2, and I think since they’re all triangle. (E3)
greater, then that does prove that this
height would be bigger than this one, so
that’d prove it’s bigger. (E4, L2)
It shows you how he got, how they’re I’ve never heard of this either to do, to
bigger areas. (E2, R1) figure that out. (E3)
It puts it in a form how you can see 1 is These I’ve never actually done. (E3)
bigger than 2, and they did, like,
equilaterals, scalene, isosceles triangles, so
they did all the different triangles, and then
they showed. (E2, R1, L3)
That one would be easier. (P)
That’s just how I’ve been taught since I
was little. (E1, E3)

Table 15. Summary of comments made by Abby

Problem B
Everyone knows what a football field This one makes no sense at all. (NA)
looks like, so you can just, like, imagine in
your head that the diagonal’s longer than
all sides. (E3, L2)
Anyone can draw rectangles and measure I’ve just never done it like that. (E3)
the sides, and they could obviously see the
diagonal’s longer. (R1, E2)
I’ve seen a football field before, and I
know how big they are. (E3)
It’d be simpler to do this than figure out,
make sure you did the circle right. (P)
Problem A
I know how to make, like, algebraic I haven’t tried, like, large numbers, say
expressions, so if I would put it this way, like, a thousand something, that a multiple
I’d understand it more, and it also proves of 6, I didn’t know if that’d be a multiple
that 6 equals 3 times 2. (E4, R4) of 3 too. (E2)
It shows that it could be for any number It proves that it could do that, but they
that’s a multiple of 6. (R4, L6) didn’t show how they did it. (E2)
This one is easy to see, visualize it. (E2, It’s confusing because they show many
R1, P) words in it. I just don’t like word
problems. (R2-)
I’ve been doing that this entire year You could see it, how they put it, but if
because of algebra. (E1, E3) you were just told to figure that out, and
you didn’t have these in front of you, it’d
be hard to tell. (E3, P)
I was taught to put those in algebraic If they said, like, use the square cards, and
expressions. (E1, E3) you didn’t have them in front of you,
you’d have to think and put them together
and draw them. (E3, P)
They, like, show you how to do it, and
they show you it’s true. (P)
Comparing Problems A-D
In this problem, they like, show you They don’t show you how you get them.
pictures, they like show you the triangles, (E2)
and what their size is. (R1, E2)
The size, and I can put them together, in a We were taught to do a tree, and branch
way… like it makes sense to how it’s off the multiples, so… I would need the
smaller. (E2, R1) tree in front of me to see. (E2, E3)
continued

154
Table 15 continued
Positive Comments Negative Comments
If I was by myself, and I didn’t have I wouldn’t know how to make the graph
someone to explain that, that would be a just off the top of my head for that certain
better pictures. (R1) problem. (E3-, P)
I know how to do this, but if I didn’t, that This one confuses me. (NA)
[points at C1] would be easier to find. (E2,
P)
They show you the problem within… the It doesn’t show enough, like it doesn’t
words… so they gave you an idea within give you enough numbers. (E2)
those. (R2)
Just how they worded it. (R2-)
I can imagine a cookie box, it’s just the
words… because it’s just so much. (R2-)
Problem E
It shows you how they got there, like, it This isn’t as easy to visualize. (R1, P)
shows you that it won’t change. (E2)
Just showing the picture, it can help you You’d have to think more, and work it out
visualize in your mind without having to more. (P)
do a lot of work, you just know it won’t
change. (E2, R1, L2)
It shows you, like, the percentage, and it There’s not really any work to show how
won’t change. (E2, R3) they got there, so if you didn’t know, like,
the problem, you wouldn’t be able to
figure this out. (E2, P)
They show the percentage, and when you You wouldn’t know how they got there,
double it, it still stays the same, so why because they didn’t show any work. (E6-)
would it be any different if you did 5 and
3? (E2, R3, L4)
It’s kind of hard to understand. (P)
It had to be explained to me. (P)
It’s a confusing picture. (P)
Additional Comments
It put the work in it. (E2)
It put the work within the problem. (E2)
When they did the 2 and the 3, they
showed the percentage. (E2, R3)

As shown in Table 14, Abby considered the inductive arguments most convincing

in 3 of the 5 problems (i.e. Problems C, D, and E), while the visual arguments were rated

least convincing in the same three problems. The algebraic and perceptual arguments

were placed between the visual and inductive arguments. This general preference toward

inductive arguments was consistent with her responses in the SMR. However, Abby’s

rankings for the arguments in the other two problems were different. In Problem A, she

considered the visual argument as the most convincing while the perceptual argument the

least convincing. In Problem B, the perceptual argument was ranked the most convincing

while the algebraic argument was considered the least convincing.

In order to better understand how Abby evaluated the proposed arguments and her

rationale when providing these rankings, the coding for her explanations in Table 15 was

summarized in Table 16 so as to identify factors and features of the arguments that had

influenced her judgment.

As shown in Table 16, the total numbers of comments that referred to the

representation, evidence, and link of the arguments were 22, 46, and 8, respectively,

indicating that the evidence had the largest impact on Abby’s judgment. Among all types

of evidence, Abby found examples (i.e. results from an immediate test) to be the most

reliable source to establish an argument. It was referenced 28 times throughout the

interview. Abby’s reliance on specific examples could be highlighted by her claim that “if

it works for 200 and 500, why wouldn’t it work for 300?” Furthermore, imaginary (i.e.

scenarios recalled from or created upon previous experience) was also considered reliable

evidence to her (referenced 12 times). For example, she suggested that “I’ve seen a

football field before, and I know how big they are” and hence considered B2 convincing.
In addition, she found some arguments not convincing since she had “never done it like

that,” emphasizing the importance of personal experience to her conviction.

Total number of references to representation: 22


Visual Narrative Numerical Symbolic
Positive 9 1 5 2
Negative 2 3 0 0

Total number of references to evidence: 46


Authority Example Imaginary Fact Assumption Opinion
Positive 3 28 12 1 0 0
Negative 0 0 1 0 0 1

Total number of references to link: 8


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 3 3 1 0 1
Negative 0 0 0 0 0 0

Table 16. Categories of comments made by Abby

The representation of arguments also influenced Abby’s judgment. However,

there didn’t seem to be a certain type of representation that particularly contributed to her

conviction. In fact, the same type of representation could affect her judgment negatively

and positively, depending on the context. For example, in Problem D, she found other

methods “easier than trying to make a graph,” while in Problem A she found A4

convincing since it was “easy to see, visualize it.” In addition, she found A3 “confusing

because they show many words in it” and she didn’t “like word problems.” However

when commenting on D1 she claimed it was convincing since “they show you the

problem within the words so they gave an idea within those.”

Abby’s comments revealed that she could be convinced by perceptual connection,

induction, transformation, and deduction, which were detected to have been

referenced 3, 3, 1, and 1 times, respectively. She did realize that an argument should

be valid for all cases when working on Problem A. However, this realization wasn’t

evident in her comments when she worked on other problems. Therefore, she didn’t seem

to be consistently concerned with the logic of arguments.

Fourteen of Abby’s comments were coded as “P.” A closer examination of those

arguments revealed that Abby tended to consider easier arguments more convincing. This

point was repeated in 11 of the 14 comments. Therefore, the simplicity of an argument

indeed had impacted her conviction. This was demonstrated by her claim that “… if you

can make it simple, like this one, why would you confuse yourself?” In addition, it was

found that the need to do extra work made an argument less convincing to her. For

example, when evaluating A4, she claimed that the need to “think and put them

[manipulatives] together and draw them” complicated the process and made the argument

less convincing to her. Another example was her comment about D4. Although she

claimed that she understood the graph, she considered it less convincing because she was

not able to “make the graph just off the top of my head.” The pursuit of simplicity

explained her preference toward the use of easy examples and imaginaries created upon

previous experience as evidence, since examining a few examples and referring to

previous experience might be the easiest way for her to access the problem.
Based on Abby’s comments when explaining her rankings, Figure 22 was

generated to highlight her rationale for evaluating mathematical arguments.

[Figure: a diagram linking “convincing arguments” to perceptual and inductive reasoning, with examples and imaginaries as evidence, easy-to-understand (visual, numerical) representations, and familiar procedures]

Figure 22. Illustration of Abby’s rationale for evaluating mathematical arguments

Adopting Figure 22 as a guide to understand the rankings Abby provided in Table

14, we believe that Arguments A4 (visual), B2 (perceptual), C1 (inductive), D1

(inductive), and E1 (inductive) provided the most accessible examples and scenarios and

hence were considered the most convincing. In particular, the manipulative model used in

A4 and the football field scenario in B2 were both familiar contexts to her. The examples

provided in C1, D1 and E1 were easier to understand than those used by other arguments.

On the other hand, Arguments A3 (perceptual), B3 (algebraic), C3 (visual), D4 (visual),

and E2 (visual) might be more difficult to access. A3 was too “wordy;” the graph in D4

was difficult to create; the diagram in E2 was “hard to understand;” and she had never

“actually done” anything like what was described in B3 and C3. Therefore, the difficulty

of accessing these arguments made them less convincing to her.

The case of Alice

Alice was enrolled in an Integrated 8th Grade Mathematics class at the time of

data collection. In her responses to SMR, the perceptual arguments were indicated to be

closest to how she would argue in all but Problem A, where she chose A4, the visual

argument. Based on this result, we believed that Alice had exhibited preference towards

perceptual arguments. Therefore, she was considered to be a representative from the

consistent group.

Alice’s interview responses are summarized in Table 17 and Table 18. Table 17

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 18 summarizes Alice’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D3 (perceptual) D4 (visual) D1 (inductive) D2 (algebraic)
Problem B B4 (visual) B1 (inductive) B3 (algebraic) B2 (perceptual)
Problem A A2 (algebraic) A1 (inductive) A4 (visual) A3 (perceptual)
Problem C C3 (visual) C2 (algebraic) C1 (inductive) C4 (perceptual)
Problem E E1 (inductive) E3 (perceptual) E2 (visual) E4 (algebraic)

Table 17. Rankings of arguments provided by Alice

Problem D
Positive Comments:
- It shows the picture, which helps the reader understand more. (E2, R1)
- It shows how much it is before tax and after tax, which helps you notice, or realize, how stuff is. (E2, R1)
- The height of the line before tax is, er, after tax is higher up than the line before tax. (R1, E4)
Negative Comments:
- I didn’t understand it; like, I tried and tried, but I just couldn’t figure out how to do it. (NA)
- It doesn’t seem like it made as much sense as the first two did. (NA)

Problem B
Positive Comments:
- It shows that, like, when you put it in a circle, BD would always follow along the, well it won’t always follow along the edge of the circle, but if you just imagine that it would, then it’d be longer than BA or BC. (E2, R1, L4)
- Usually, like, when you draw rectangles, and you measure the length of their sides, and then you draw a diagonal, the diagonal is always gonna be longer than the edges, because in order to get, like, from the edge and then down, like in order, like… when you draw the rectangle, like, the size of the diagonal will always be longer than this because, like, if you had a circle, and you were to bring it up, it would come up to like, right here, because the diagonal is always longer than the straight line, depending on how long the straight line is, and when it's inside a rectangle, then the diagonal will always be longer. (E2, R1, L4)
- Now that I think about it, the longer it [the side] is, the longer the diagonal will be, so that it can go from corner to corner. (E3, L4)
Negative Comments:
- When I stand on the edge of the football field, I look at the diagonal and then I look straight, it looks the same, because like, you can’t see it from like, up in the air, you’re on the ground looking at it, so you can’t really tell the distance, and it looks the same. (E3, L2)
- When you do AB squared plus AD squared, um, like, it won’t always come out to be BD squared, because when you combine AB squared and AD squared, it’ll actually turn out to be farther than BD squared. (NA)
- It [Pythagorean theorem] doesn’t really apply to this problem. (NA)

Table 18. Summary of comments made by Alice

Table 18 continued

Problem B (continued)
Positive Comments:
- I honestly understand better, stuff better when it has like, a picture, ’cuz I think better when I can see it and not read it. (E2, R1)

Problem A
Positive Comments:
- I used the multiples of 6, like she used, and for every one I tried, they’re multiples of 3 as well. (E2, L3)
- When I work it out, like… when I choose a random number for n, I put it in and like, for instance, here I chose 6 for n, and 6 times 2 is 12, and I multiplied that by 3, it equals 36, and on the other side, I plugged 6 in, and 6 times 6 is 36, so it’s right. (E2, L3)
- When you plug in a number for n, whatever you do in the equation will be the same on the other side, like, the answers are. (E2, R4, L4)
Negative Comments:
- I didn’t really understand this one, because when they split ’em up, it showed that they were multiples of 3, but… I don’t know… it’s confusing. (NA)
- I didn’t understand the wording that they put it in, and it made it really confusing. (NA)
- There might be some number of 6 that isn’t always [contained in the discussion of A1]. (E2)

Problem C
Positive Comments:
- It’s more appealing to me because, like I understand what it’s saying, but if you take the sides of Triangle 1 and you cut them down, and then you make it into like, sometimes a similar triangle, then Triangle 1, then, like it can be the same shape but it won't be the same size, because you cut the sides down. (E2, R1, L4)

Comparing Problems A-D
Positive Comments:
- You can plug in any number, and you’ll always, like, get the same number on both sides. (E2, R4, L6)
- I found that one more understanding because it gave you, like, the numbers that they tried to where you could try multiple numbers. (E2)
- They gave you more choices to… choose from to where it’s like, not so complicated, like you have more numbers to work with. (E2, P)
- It gives you something that you can like, draw with to where like, you can even see for yourself. (E2)
- You can get many, like, possible ways. (E2)
Negative Comments:
- You have to have a specific number in order to get the answer that they’re looking for. (E2, R3)
- It has more than one step, which makes it kind of harder to do. (P)
- It’s confusing, like, the way they split it up… (NA)
- It was harder for me to comprehend. (P)
- They only gave you two, but like, what if the price is higher than 500. (E2, L3-)
- They only gave you two numbers to work with. (E2, L3-)

Problem E
Negative Comments:
- I think of the problem in a different way. (P)
- I’m not comprehending what they’re all saying… because I have a different way of finding the answer than all of the arguments. (NA)

The rankings provided by Alice were surprising to us since 3 perceptual

arguments were rated most appealing to her in the SMR; however, most of them were

placed at the bottom of the list during the interview (see Table 17). In addition, her

evaluation of the same type of arguments was inconsistent across the problems. For

example, the algebraic argument was considered the most convincing in Problem A, second

most convincing in Problem C, third most convincing in Problem B, and least convincing in

Problems D and E. Therefore, the ranking provided by Alice hardly revealed any pattern

in her judgment. In order to better understand how Alice evaluated the proposed

arguments and her rationale when providing these rankings, the coding for her

explanations in Table 18 was summarized in Table 19 so as to identify factors and features

of the arguments that had influenced her judgment.

Total number of references to representation: 10
Visual Narrative Numerical Symbolic
Positive 7 0 1 2
Negative 0 0 0 0

Total number of references to evidence: 21


Authority Example Imaginary Fact Assumption Opinion
Positive 0 18 2 1 0 0
Negative 0 0 0 0 0 0

Total number of references to link: 11


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 2 5 0 1
Negative 0 0 2 0 0 0

Table 19. Categories of comments made by Alice

As shown in Table 19, the total numbers of Alice’s comments that focused on the

representation, evidence and link of the arguments were 10, 21, and 11, respectively. It

was found that most of Alice’s comments were about the evidence of the arguments.

Among all types of evidence, Alice found examples (i.e. results from an immediate

test) to be the most reliable source to establish an argument. They were referenced 18 times

throughout the interview. Additionally, in 3 cases she also considered the arguments that

used imaginary scenarios and mathematical facts convincing.

She found induction reliable in some cases (e.g. she suggested that “I plugged 6 in,

and 6 times 6 is 36, so it’s right”); however, it was detected that in some other situations

she demonstrated a need for more than just checking a few cases. This was

exemplified by her comment on D1 that “they only gave you two, but like, what if the

price is higher than 500?” A closer examination revealed that whether an argument

involved a “transformational” link from the evidence to a broader scope had an

impact on Alice’s judgment. This was detected 5 times, including the comments that “BD

would always follow along the, well it won’t always follow along the edge of the circle,

but if you just imagine that it would, then it’d be longer than BA or BC” and “Now that I

think about it, the longer it [the side] is, the longer the diagonal will be, so that it can go

from corner to corner.” These transformations were made in visual contexts.

The ability to visualize transformation helped us understand the rankings provided

by Alice in Table 17. Notice that she considered the visual arguments (B4 and C3) as the

most convincing in Problems B and C, both of which utilized transformational reasoning.

D4 was also considered convincing in Problem D, where she claimed to be able to see the

constant distance between the parallel lines, which convinced her that the difference

remained the same value. In Problem A, A4 (visual) was considered not

convincing. Her explanation, however, revealed that she could clearly see how one

diagram transformed to another but was not able to see how it connected to the problem

context. Hence what made the argument less convincing was not due to the use of

transformation.

Transformational reasoning that convinced Alice was detected in visual contexts;

however, we suspected that it had also potentially impacted her judgment in other

contexts. For example, it was found that she realized the advantage of algebraic

representation in Problem A, where she claimed that “you can plug in any number, and
you’ll always, like, get the same number on both sides.” She believed A2 (algebraic) was

more convincing than A1 (inductive) since A1 didn’t prove the conjecture was true for all

cases. However, at the same time, she needed to plug in some numbers to verify if the

formula used in A2 was true. A possible explanation was that through plugging in the

numbers in the formula she might have detected some patterns that would transfer to

other situations as well, and consequently the formula became valid to her in general

cases. If this conjecture is true, we may claim that Alice found transformational reasoning

convincing in both visual and numerical contexts.

Note that Alice didn’t explicitly indicate a preference for relying on

transformational reasoning. Such a scheme was detected by extracting the similarity of

the comments she made. This revealed that Alice was not yet able to explicitly reflect on

the reasoning of arguments (i.e. the link between evidence and conclusion).

Four of Alice’s comments were coded “P.” Similar to what was detected in the cases

of Allen and Abby, Alice also suggested the need for simplicity of an argument to be

convincing to her. For example, she claimed that D2 was less convincing since “it has

more than one step, which makes it kind of harder to do.” In addition, it was detected that

her own way of approaching a problem had an impact on her judgment of the arguments

given. This impact was most obvious when she was working on Problem E, where she

claimed that she didn’t consider any of the arguments convincing since she had “a

different way of finding the answer than all of the arguments.” In explaining her own

method, Alice also started with specific numbers. However, she decided to give up that

approach after a few trials. What she had done didn’t seem to be different from what was

suggested in E1 (inductive). So it seemed that she didn’t understand what was offered in

E1.

Based on Alice’s comments when explaining her rankings, Figure 23 was

generated to illustrate her rationale for evaluating mathematical arguments. It is

suggested that Alice was likely to be convinced by arguments that were simple enough

and used approaches that were familiar to her. In particular, arguments that utilized visual

examples and engaged transformational reasoning seemed to be the most convincing type

to her.

[Figure: a diagram linking “convincing arguments” to transformational reasoning, with examples as evidence, easy-to-understand (visual) representations, and familiar procedures]

Figure 23. Illustration of Alice’s rationale for evaluating mathematical arguments

The case of Amy

Amy was an 8th grade student enrolled in an Algebra I class at the time of data

collection. In her responses to SMR, the algebraic arguments were indicated as the closest

to how she would argue in all but Problem C, where she selected C1, the inductive

argument. Based on this result, we believed that Amy had exhibited preference towards

algebraic arguments. Therefore, she was considered to be a representative from the

consistent group.

Amy’s interview responses are summarized in Table 20 and Table 21. Table 20

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 21 summarizes Amy’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem C C3 (visual) C1 (inductive) C4 (perceptual) C2 (algebraic)
Problem B B1 (inductive) B3 (algebraic) B4 (visual) B2 (perceptual)
Problem A A2 (algebraic) A4 (visual) A3 (perceptual) A1 (inductive)
Problem D D2 (algebraic) D4 (visual) D3 (perceptual) D1 (inductive)
Problem E E4 (algebraic) E3 (perceptual) E2 (visual) E1 (inductive)

Table 20. Rankings of arguments provided by Amy

Problem C
Positive Comments:
- That’s kind of how I thought of it in my head. (P)
- It has the diagram of the pictures, which just makes sense in my head, and it has different cases, so that it just, there’s different things to show instead of just one example, there’s more. (R1, E2, L3)
- Showing more examples makes it more convincing. (E2, L3)
- It made more sense and it seemed more valid to me. (P)
Negative Comments:
- When I think of stuff, I don’t put it in like, diagram and shape form, I kind of just think of it as just, like, stuff in my head, I don’t think of any shapes or any examples to it, and so I sometimes, they confuse me, this at first confused me when I looked at it. (E2-, E3, R1-)
- It’s got too much numbers in it, and so it gets it confused in my head, so I have to reread it in my head a couple times. (E2-, R3-)
- It just slows me down mostly. (P)
- It doesn’t really, it starts with the b, and it doesn’t explain the a and c in the first part, and I think it probably should explain the a and c, it just explains the b. (E2)
- The picture confused me a little bit. (E2-, R1-)
- It’s not very detailed. (P)
- This one’s too detailed, and not really, completely true, and that one’s not really detailed enough; the picture shows it, but like I said, sometimes pictures get me lost. (R1-, P)

Problem B
Positive Comments:
- This person used more examples, they did more of, I guess, trials… and so, it’s more likely to be true for this one than for other ones. (E2, L3)
- It has a whole formula. (E4, R4)
- It looks pretty true. (R1)
Negative Comments:
- It’s just one example, and so it’s not necessarily true because it just, it may not be a hundred percent true. (L3-)
- It has more information to it, but I, it confused me a little bit. (P)
- I don’t think it shows it, because it says many and several. (L3-)

Table 21. Summary of comments made by Amy

Table 21 continued

Problem B (continued)
Positive Comments:
- It used an actual formula. (E4, R4)
Negative Comments:
- This is only, it just has one example, and it doesn’t have anything other than one example to back it up. (E2, L3-)
- I don’t exactly think this is even correct, because it says BQ is equal to BD, and I might have been misunderstanding it wrong, but it just doesn’t look equal to it, it doesn’t seem very equal… after that, it kind of just lost me, because I was just like, well this isn't equal, so the rest of it doesn't seem very true either. (E2, R1)

Problem A
Positive Comments:
- You can put any number in there, and it wouldn’t make a difference. (E4, R4)
- It has the formula. (E4, R4)
- I think with a formula, it makes it true for any event. (E4, R4, L6)
Negative Comments:
- This one is just shown by pictures. (R1-)
- With pictures, like, sometimes it can be incorrect, or not true for some things. (R1-)
- The person just tried a couple different things; I mean, they might have tried a lot, but they didn’t try all of them, which is important. (E2-, L3-)
- Most people like thinking math with food and whatnot, once you get into food, it just completely loses me. (E3-, L2-)
- It’s talking about cookies… I just can’t picture that in my head. (E3-, L2-)

Problem D
Positive Comments:
- It’s got the formula, and it uses x instead of an actual price, so it can be any number, and the formula is correct. (E4, R4, L6)
- It’s kind of like using this formula, just putting it onto a graph instead. (E4, R1, R4)
- It shows the formula, and so I’m, it’s really clear on what they’re doing. (E4, R4)
Negative Comments:
- It doesn’t necessarily say the formula, and so I don’t one hundred percent know exactly what formula was used, in my head. (E4, R4)
- This one has actual prices instead of x, and so even though they use different prices, it’s not always true, because they can’t use every number… and so it’s just, you can’t tell from that one if it’s 100 percent true or not. (E2-, R4, L3-)
- They just don’t have a lot of stuff to back it up. (E6-)
- It doesn’t have much information to back it up that it’s true, so it’s not as clear. (E6-)
- I would probably add to it that, something about the actual, something about the before price and the after price, instead of just the 20 dollars and the 5 percent. (E2)
- It would be better if it had, like, an x for an actual price… instead of just showing the tax difference. (R4)

Comparing Problems A-D
Positive Comments:
- I am pretty sure that all rectangles are similar. [making rectangles using her fingers] (E2, L4)
- They kind of have a formula here, and then they back it up with different examples. (E2, E4, R4)
- It [my brain] likes more figures and numbers. (E2, R1, R3)
- Numbers are a lot simpler than trying to think of something in my head. (E2, R3)
- Those just don’t convince me as much as numbers and something that I can actually see on a piece of paper. (E2, E3-, R3)
Negative Comments:
- I don’t think the diagram… the picture, I don’t think it goes with this… yeah, with the description. (E2-, R1-)
- You can’t necessarily go with the picture, because the picture doesn’t show all cases. (E2-, R1-)
- In your brain, your brain can just skew everything if you just have one missed piece of data or anything. (E3-, L2-)
- My brain doesn’t like to connect to imaginative stuff. (E3-, L2-)
- It just shows a couple cases, not the whole range of cases, ’cuz there could be basically any number, there could be tons of different things it could be. (E2-, L3-)

Problem E
Positive Comments:
- It has more than one case, and it has variables, so you can put anything into it, and so it will be true for anything, instead of just one thing. (E4, R4, L6)
Negative Comments:
- It has one case, instead of all the, however many, amount of cases. (E2-, L3-)
- It’s got a picture instead of a number, and the pictures can be misinterpreted, or mismade. (E2, R1-, R3)
- It doesn’t have any pictures or numbers, it just has words to back it up. (E2, R1, R2-, R3)
- [It was not backed up by] numbers and objects. (E2, R3)

Additional Comments
- The numbers, they seem to be right, but they don’t really show anything else. (E2-, L3-)
- They say it in words instead of in numbers. (E2, R2-, R3)

As shown in Table 20, Amy considered algebraic arguments as most convincing

in 3 of the 5 problems (i.e. Problems A, D, and E), while the inductive arguments were

rated least convincing in the same three problems. The visual and perceptual arguments

were ranked between the algebraic and inductive arguments. This general preference for

algebraic arguments was consistent with her responses in the SMR. However, Amy’s

rankings for the arguments in the other two problems were different. In Problem C, she

considered the visual argument as the most convincing while the algebraic argument

received the least convincing ranking. In Problem B, the inductive argument was ranked

the most convincing while the perceptual argument was considered the least convincing.

In order to better understand how Amy evaluated the proposed arguments and her

rationale when providing these rankings, the coding for her explanations in Table 21 was

summarized in Table 22 so as to identify factors and features of the arguments that had

influenced her judgment.

Total number of references to representation: 37
Visual Narrative Numerical Symbolic
Positive 6 0 7 13
Negative 8 2 1 0

Total number of references to evidence: 46


Authority Example Imaginary Fact Assumption Opinion
Positive 0 16 1 12 0 0
Negative 0 10 5 0 0 2

Total number of references to link: 19


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 0 3 0 0 3
Negative 0 4 8 1 0 0

Table 22. Categories of comments made by Amy

As shown in Table 22, the total numbers of comments that focused on the

representation, evidence, and link of the arguments were 37, 46, and 19, respectively,

indicating that all three factors had impacted her evaluation. These also indicated that

much of Amy’s explanation was based on the features of the arguments instead of her

personal opinions.

Three key findings made Amy’s case distinctive. First, she was the only

subject who had clearly and repeatedly emphasized her preference toward algebraic

arguments and made explicit claims about the logical rigor of these arguments.

Mathematical facts and symbolic representation were referenced 12 and 13 times, respectively, when she was talking about factors that convinced her. These statements were made when justifying the rankings she provided for Problems A, D, and

E. In particular, she claimed that A2 (algebraic) had “a formula; it makes it true for any

event;” D2 (algebraic) “got the formula, and it uses x instead of an actual price, so it can

be any number, and the formula is correct;” and E2 had “variables, so you can put

anything into it, and so it will be true for anything, instead of just one thing.” Her

explanation demonstrated that she was not only attracted by the symbolic format but also understood its logical rigor. This was explicitly addressed 3 times. As a natural consequence of this realization, she also repeatedly addressed the deficiencies of inductive and perceptual reasoning (8 and 4 times, respectively). This was exemplified by

her claims that “they might have tried a lot, but they didn’t try all of them, which is

important,” “this one has actual prices instead of x, and so even though they use different

prices, it’s not always true, because they can’t use every number” and “your brain can

just skew everything if you just have one missed piece of data or anything.”

Second, she was the only subject who clearly described the disadvantage of visual illustrations, a disadvantage not of any specific image or graph but of visual illustration as a way to reason. Such claims included “you can’t necessarily go with the picture, because the picture doesn’t show all cases,” “it’s got a picture instead of a number,

and the pictures can be misinterpreted, or mismade,” and “with pictures, like, sometimes

it can be incorrect, or not true for some things.” This point was addressed 8 times during

the interview.

The third finding was that although the previous two results appeared consistently in her explanations for the number theory, algebra, and probability problems, they were not present when she was working on the two geometry problems. This is a good example of how context may impact students’ reasoning methods. If a reasoning test were based only on Problems A, D, and E, Amy would be considered one who demonstrated the highest level of maturity in mathematical reasoning, especially among 8th graders. So the question is why she evaluated arguments in geometry contexts differently.

One reason could be that Amy tended to avoid working with visual representations in Problems A, D, and E since she believed they might misrepresent the content. However, in the geometry problems she had to work with images and figures. Amy’s explanation for considering B1 (inductive) the most convincing option in Problem B further revealed her thinking in geometric contexts. When asked to compare B1 to the inductive arguments in other contexts, Amy suggested that B1 was different because the cases used in B1 were not numbers but rectangles. She further claimed that “all rectangles are similar” shapes that shared common properties such as “equal opposite sides,” and hence if the claim that the “diagonal is longer than the sides” was true for some of them, it should apply to others as well (while stating this, she used her fingers to make a rectangle and moved them to represent adjusting the side lengths). This explanation revealed that Amy utilized transformation to convince herself that B1 did account for all cases. A similar strategy applied to her judgment in Problem C, where C3

(visual) utilized transformation. This argument was rated most convincing since she

believed it demonstrated that the conjecture was true for all cases. Further examination of

Amy’s judgment of the algebraic argument in the two geometry problems revealed that

she didn’t understand the algebraic argument in Problem C and hence considered it the

least convincing. She was convinced by B3 (algebraic) but rated it low because it
confused her slightly at the beginning. Supported by this evidence, we believe the

following three points capture Amy’s major rationale when judging whether

mathematical arguments were convincing.

First, in order for an argument to be convincing to Amy, it had to show that the conjecture was true in all cases. This perception served as the primary guiding principle for her judgment of arguments in all five contexts.

Second, she found testing a few numbers helpful for understanding a problem better;

however, she believed that algebra was the reliable tool to guarantee the general validity

of an argument. She didn’t consider other reasoning methods, in particular induction,

perceptual connection, and visual illustration, as reliable, and suggested they each had

their own deficiencies.

[Figure 24, a diagram, depicts convincing arguments for Amy as those shown true for all cases, supported by examples and facts in symbolic or numerical form, and linked by transformational or deductive reasoning.]

Figure 24. Illustration of Amy’s rationale for evaluating mathematical arguments

Third, she considered different numbers as separate cases, but she viewed a group of geometric shapes that shared certain common properties as related cases. Therefore, examples in some geometry contexts were viewed as generic examples, and transformational reasoning could be utilized to extend the validity of a detected property to other cases. However, examples in numerical contexts were viewed as isolated instances, and hence their properties might not hold in other situations. Amy’s rationale is illustrated in Figure 24.

The case of Beth

Beth was an 8th grade student enrolled in an Algebra I class at the time of data collection. When working on the SMR, she tended to prefer A4 (visual), B2 (perceptual), C2 (algebraic), and D4 (visual) in the respective problems, and hence she was considered to be a representative from the inconsistent group.

Beth’s interview responses are summarized in Table 23 and Table 24. Table 23

illustrates her rankings for each problem. Column One of the table represents the order of

problems that she tackled. Table 24 summarizes Beth’s comments when articulating why

she found certain arguments convincing or not convincing (The coding of each comment

is explained in Table 9). These two tables served as the major resource for the interview

analysis.

Most convincing -------------------------> Least convincing


Problem B B2 (perceptual) B4 (visual) B3 (algebraic) B1 (inductive)
Problem D D4 (visual) D1 (inductive) D2 (algebraic) D3 (perceptual)
Problem A A3 (perceptual) A2 (algebraic) A4 (visual) A1 (inductive)
Problem C C4 (perceptual) C2 (algebraic) C1 (inductive) C3 (visual)
Problem E E1 (inductive) E4 (algebraic) E2 (visual) E3 (perceptual)

Table 23. Rankings of arguments provided by Beth

Problem B

Positive: I’ve been on a football field, so I know what the shape is and everything, so if I imagine to myself I’m standing at the corner of a football field, like that says, and I’ve had to run football fields, and they’re called the suicide thing, so I had to run that way, and then those two ways, and that one was longer than those two when I was running. (E3, L2)
Negative: I got really confused. (NA)

Positive: It’s also because of a relatable thing, I, like, I understand what it means when it says, I can picture a rectangle being drawn, plus I’ve measured rectangles, so that’s longer than the two sides. (E3)
Negative: I guess it would be more convincing if I knew what the actual numbers were, if they actually use the actual numbers in them, and not just like, saying the square of BD, if they actually put the actual numbers. (E2, R3, R4-)

Positive: I just know more about B2, I’ve run the football field before, so I guess that’s why. (E3, L2)
Negative: We’re probably not going to have rulers during the test, so it’s going to be harder. (P)

Positive: You can kind of look at the side lengths and see what they mean by it, instead of having to measure it and everything. (E2, R1)

Problem D

Positive: D1 gave a little bit more of an explanation at the end, and also just like on that one [points to previous question], they used actual numbers, so even though it wouldn’t really probably be that hard for me to insert a number in there during the test, that one's already done for me, so it's probably a lot easier to do. (E2, R3, P)
Negative: It makes sense, it’s just really short, and they don’t really give a lot of examples. (E2)

Positive: I could insert the 200 dollars and the 500 dollars that he’s suggesting is the same thing, and I could see if it was actually right. (E2, L3)
Negative: They don’t give you examples of numbers that fit into it really, they just… I guess yeah, they just don’t give you numbers to support themselves, their claims. (E2, R3)

continued

Table 24. Summary of comments made by Beth

Table 24 continued

Positive: It just has an illustration, and I’m sometimes, most of the time, I’m a visual learner, so it helps a lot to see it and read what it says, and it is, it makes sense. (R1, P)

Positive: It only gives one example, but it also offers 200, 500 if you wanted to insert them, so yeah, I think it does support that. (E2, L3)

Problem A

Positive: I can kind of imagine someone having six cookies in… having a multiple of six I imagine 36 because that’s the square I guess, square root or whatever, and um, so I imagine 36 and I imagine six boxes of 36 cookies and dividing each into two and then there's three cookies in each, so… and then you can just, you can put the three cookies with the two boxes of three cookies, you can put it back into one box of six, and it's still a multiple of 36 either way. (E3)
Negative: It confused me the first time I read it, and I had to re-read it, because I wasn’t really sure what it meant by the uh, when it was, the way it was divided and everything. (NA)

Positive: You can insert a number in there… and it would make sense. (E2)
Negative: I’m guessing that they’re doing what I think they’re doing. (NA)

Positive: It’s visual, so it’s a lot easier for me to understand when it’s visual. (R1, P)
Negative: It says that she’s tried plenty of multiples of six, and three as well, and that they’re the same, but just ’cuz she’s tried a lot of them, she hasn’t tried all of them, so you could never really know, based on that statement, if she was right or not. (E2-, L3-)

Negative: Even if you just try a wide range of numbers, you still, you never know. (E2-, L3-)

Problem C

Positive: I can visualize that, and ’cuz I can think about it in my head. (E3, L2)
Negative: You’ve tried many cases, but you can never be sure, ’cuz you haven’t tried all the possibilities, which really, you could never do anyways. (E2-, L3-)

continued

Table 24 continued

Positive: I like they way that C4 is explained better; I like being able to imagine it, or being able to think… ’cuz actually, I thought the area of this table surrounded by wire. (E3, L2)
Negative: It says that she shortened the sides, but it doesn’t say by how much, so she could have shortened the sides at any, she could have shortened a more than she shortened b or more than she shortened c, so she doesn’t really say how much to shorten it by. (E2)

Positive: It’s easier… to imagine. (E3, L2)
Negative: I think that to make the claim more believable, you would have to cut all the sides by the same length, we would have to cut the sides at the same length from each side. (E6)

Comparing Problems A-D

Positive: I still like that just because of the graph, and I can look at it and kind of understand what they’re saying and everything. (E2, R1)
Negative: You yourself would only be inserting a certain amount of numbers, you wouldn’t be sitting there inserting every single number in the world. (L3-)

Positive: It’s graphed with the two lines, and it shows that they’re all, that it’s one unit apart, and if you wanted to, you could kind of check that with all of them, they’re all one unit apart and make sure that it was one unit apart the whole time like they said it was. (E2, R1)
Negative: You get to trust your answers, you don’t have to trust their answers, but you also are limited to a certain number of numbers, so you can’t… it’s kind of like, half and half, good and bad. (NA)

Positive: It’s relatable for me. (P)

Positive: I’m having the football field switch in my mind, and every rectangle that I can think of is, it works. (E3, L4)

Positive: I can imagine them in my mind, I can picture them. (E3, R1)

Positive: It’s more visual. (R1)

Positive: If you use them [variables] you can insert numbers, any number that you possibly want, and even if you wanted to insert numbers just to see if they were wrong… (R4)

Positive: You can insert whatever numbers you want, you don’t have to go by what they’re saying as much. (R4)

continued

Table 24 continued

Problem E

Positive: If you take 2 out of 5, and you have 4 out of 10… if you take 4 out of 10, it would reduce to 2 out of 5, which is the same percent, so that’s why it makes sense. (E2, L3)
Negative: It [the narrative description] doesn’t really support what they’re saying, it kind of just doesn’t support this; it “unsupports” it not making sense; it doesn’t really support it. (E6-)

Positive: It shows that they’re the same ratio, they’re still proportionate. (E2, L3)
Negative: You can never really try all the numbers. (L3-)

Additional Comments

Positive: It gives more information about the illustration. (R2)

As shown in Table 23, Beth identified the perceptual arguments as the most

convincing in Problems A, B, and C but least convincing in the other two problems.

Algebraic arguments were never considered the most or least convincing. Beth’s evaluation of visual and inductive arguments was highly inconsistent across the problems; these arguments appeared at different places on the lists. In order to better understand how Beth evaluated the proposed arguments and her rationale when providing these rankings, the coding for her explanations in Table 24 was summarized in Table 25 so as to identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 14
Visual Narrative Numerical Symbolic
Positive 7 1 3 2
Negative 0 0 0 1

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 13 9 0 0 1
Negative 0 3 0 0 0 1

Total number of references to link: 15


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 5 4 1 0 0
Negative 0 0 5 0 0 0

Table 25. Categories of comments made by Beth

As shown in Table 25, the total number of Beth’s comments that focused on the representation, evidence, and link of the arguments was 14, 27, and 15, respectively. Beth made more comments about the evidence of the arguments than about their representation or link. Among the types of evidence, examples (i.e. results from an immediate test) were the most frequently referenced: 13 positive comments cited results of immediate tests, mostly obtained by plugging in numbers (e.g. “you can insert a number in there”).

There were 9 cases where imaginaries from past experience (e.g. “I just know more about B2, I’ve run the football field before”) were recalled to decide whether arguments were convincing. Formulas and theorems were not treated by Beth as reliable sources of evidence; in order for them to be convincing, she needed to plug in numbers to verify them.

Beth also commented on the impact of representations on her evaluation. Most

prominently, she claimed that “most of the time, I’m a visual learner,” and an argument

was “a lot easier for me to understand when it’s visual.” Note that by “visual” she didn’t

only mean visualizing something that was drawn on paper, but also visualizing something

in her mind, i.e. imagining some model. She didn’t distinguish between these two types

of visualization in her explanations. Overall, there were 7 times when Beth mentioned

that arguments with visual illustration contributed to her conviction. In addition, Beth recognized the value of numerical expressions in offering her concrete examples to support a claim. She acknowledged the value of symbolic expressions in allowing her to test numbers that she wanted to check. However, she thought that neither type of expression was powerful enough to show that a conjecture was true in all cases. This was further explained in her view of the link between evidence and conclusion.

Beth didn’t believe that an algebraic argument could prove a conjecture was

always true. Compared to numerical expressions, the symbolic formulas only offered the

advantage that “you can insert whatever numbers you want, you don’t have to go by what

they’re saying as much.” Despite this, she sometimes preferred numerical expressions

since “they used actual numbers, so even though it wouldn’t really probably be that hard

for me to insert a number in there during the test, that one's already done for me, so it's

probably a lot easier to do.” Beth considered an argument to be more convincing if she

“knew what the actual numbers were, if they actually use the actual numbers in them, and

not just like, saying the square of BD.” Therefore, an algebraic expression was not

meaningful to her unless the variables were substituted by actual numbers.

Beth’s evaluation of inductive arguments was not consistent across the problems. On the one hand, she explicitly pointed out that trying a few cases was not sufficient to show that a conjecture was always true. For example, in commenting on A1, she claimed that

“she’s tried a lot of them, she hasn’t tried all of them, so you could never really know,

based on that statement, if she was right or not.” Similar statements were articulated 5

times during the interview. However, when she was evaluating B2, even though she realized that a football field only represented a certain type of rectangle, she still considered it the most convincing option since she could “relate” to it. A similar situation occurred in her ranking of arguments in Problem D, where she considered D1 (inductive) the second most convincing while admitting that it couldn’t prove the conjecture was always true. This suggested that being able to show the general validity of a conjecture was not a required condition for Beth to consider an argument convincing; other personal factors played a more important role.

An examination of the personal standards Beth discussed during the interview revealed her need to see simple, “relatable,” and easy-to-access arguments in order to be convinced. Similar opinions were expressed 4 times. While “general validity” contributed to the reliability of an argument (e.g. her comments on A1), it was not the single decisive factor.

This explained Beth’s preference for perceptual arguments (A3, B2, and C4) in

Problems A, B, and C (see Table 23), since the contexts provided in those arguments evoked familiar experiences and hence were most “relatable” to her. In contrast, the two perceptual arguments (D3 and E3) in Problems D and E didn’t provide any “relatable”

scenarios and hence were considered less convincing.

Based on Beth’s comments when explaining her rankings, Figure 25 was

generated to illustrate her rationale for evaluating mathematical arguments. It was

suggested that Beth was likely to be convinced by arguments that were “relatable” to her existing experience. In particular, arguments that created a scenario she could visualize seemed to be the most convincing type. Additionally, the illustration of various examples could help her access a problem and hence contributed to her conviction.

[Figure 25, a diagram, depicts convincing arguments for Beth as those that were easy to understand and offered a relatable scenario, supported by examples and imaginaries in visual or narrative form and linked by perceptual reasoning.]

Figure 25. Illustration of Beth’s rationale for evaluating mathematical arguments

The case of Betty

Betty was an 8th grade student enrolled in an Honors Algebra I class at the time of data collection. In her responses to the SMR, she considered the visual argument (A4) in Problem A, the perceptual argument (B2) in Problem B, the algebraic argument (C2) in Problem C, and the inductive argument (D1) in Problem D as the most appealing option in each context. Since she exhibited preferences toward different types of argument across the contexts, she was considered to be a representative from the inconsistent group.

Betty’s interview responses are summarized in Table 26 and Table 27. Table 26

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 27 summarizes Betty’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D1 (inductive) D2 (algebraic) D4 (visual) D3 (perceptual)
Problem C C2 (algebraic) C3 (visual) C1 (inductive) C4 (perceptual)
Problem A A2 (algebraic) A3 (perceptual) A4 (visual) A1 (inductive)
Problem B B3 (algebraic) B4 (visual) B1 (inductive) B2 (perceptual)
Problem E E1 (inductive) E4 (algebraic) E2 (visual) E3 (perceptual)

Table 26. Rankings of arguments provided by Betty

Problem D

Positive: When they explain it and show the work [examples] that that’s right. (E2)
Negative: It wasn’t enough work to show how they got a dollar off. (P)

Negative: This one is, like, no work at all. (P)

Negative: It just, like, gives you a graph and doesn’t explain how they formed the graph and like, how they got from the five percent to a dollar. (R1-, R2)

Problem C

Positive: The statement they made is true; they said the area of a triangle equals half of the product of its base and height, and that’s true. (E4)
Negative: It [the perceptual argument] just states that they’re larger. They have no idea what they’re talking about. It’s just, like, blank. (E6-)

Positive: That [the formulas] describes how they found out the answer. (E4, L5)
Negative: They basically just stated that it’s larger. (E6-)

Positive: They diagramed the triangle part ... If you cut it, you make it smaller. (E2, R1, L4)
Negative: They wouldn’t even give any, like, work [in addition to the examples] ... They didn’t explain why. (E2-, L3-)

Positive: That helps to see the actual work [formula and related procedure] being done of how to get the answer. (E4, R4, L5)

Problem A

Positive: It gives you an equation to solve for n, and it comes out correct. (E4, R4)
Negative: It doesn’t really explain, it just breaks up the pattern, like, the blocks. (R1-, R2)

Positive: It used cookies as an example. (E3, L2)
Negative: It just says that, like, this can be an opinion. (E6-)

Positive: It [A2] explained more of how to find the way… to get the answer. (E4, R4)
Negative: You didn’t go further in the numbers. (R2-, L3-)

Negative: You didn’t look for… like, multiples of three and six, to see if, to compare them, to see if they’re the same. (E2)

Negative: That’s not enough, I think they just, like, picked random numbers. (E2-, L3-)

Problem B

Positive: I looked at the length of explanations. (P)
Negative: It’s just, like, an opinion. (E5-)

Positive: They divided the rectangle ... then the Pythagoras Theorem ... (E4)
Negative: It’s just, too plain… they didn’t even dig deep and explain what they did. (E6-)

continued

Table 27. Summary of comments made by Betty

Table 27 continued

Positive: They are all radius [so they are equal]. (E4)
Negative: There are small football field, and big one, say NFL ... so that’s not true for all football fields ... the size varies (E3, L3-)

Positive: They showed the length [pointed on the figure]. (E2, R1)
Negative: I think they (the diagonal and the side) are the same size. (E6)

Positive: They are true in their cases. (E2)

Comparing Problems A-D

Positive: It explains more, they give you a problem [example] for you to find the solution to get the answer, to see if it’s right. (E2, P)
Negative: They [inductive arguments] just give you the statement, it’s not really explanations of how they found it. (E6-)

Positive: You gotta work through the problem to get the answer. (P)
Negative: It’s okay to draw a picture, but you have to explain the picture too, and they didn’t really explain it… as well as algebra would. (R1-, R2, R4)

Positive: Algebra, it explains it more than just saying, just making a statement, and they give you equations and inequalities, and problems to find the solutions to get your answer, rather than just making a statement. (E4, R4, P)

Problem E

Positive: It explains how they get through to use percentages and ratios. (E2, R3, P)
Negative: They just drew a picture, and they didn’t really explain it, they just basically said that the ratio of two ping pong balls would be the same, therefore they won’t change. (E6-, R1-, R2)

Positive: They use algebra, and it like, and they use variables to explain how they got the answer. (R4)
Negative: It just makes a statement. (E6-)

Positive: They give you a percentage. (E2, R3)
Negative: It just gives you algebra for you to solve it… it basically just, it isn’t as good as [points at inductive argument]. (R4-, R3)

Positive: There’s more math involved here [inductive argument]. (E2, R3)

Additional Comments

Positive: If you explain how you found it. (R2, P)

Positive: How you found the answer to the problem being asked. (P)

Positive: You have to find, go… dig it further, take further steps. (P)

As shown in Table 26, Betty considered the algebraic arguments the most convincing in Problems A, B, and C and the second most convincing in Problems D and E, demonstrating a consistent preference toward algebraic arguments. In addition, she considered the perceptual arguments the least convincing options in 4 of the 5 problems, also providing consistent evaluations of this type of argument. Therefore, although Betty was selected as a representative of the inconsistent group, she exhibited more

consistent judgment of certain types of argument during the interview phase. To better understand Betty’s rationale when providing these rankings, the coding for her explanations in Table 27 was summarized in Table 28 so as to identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 23


Visual Narrative Numerical Symbolic
Positive 3 5 4 6
Negative 3 1 0 1

Total number of references to evidence: 30


Authority Example Imaginary Fact Assumption Opinion
Positive 0 9 2 8 0 1
Negative 0 2 0 0 1 7

Total number of references to link: 8


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 0 1 2 0
Negative 0 0 4 0 0 0

Table 28. Categories of comments made by Betty


As shown in Table 28, the total number of Betty’s comments that focused on the representation, evidence, and link of the arguments was 23, 30, and 8, respectively, indicating that all three factors had impacted her judgment.

Betty’s explanations revealed that she could be convinced by arguments that utilized symbolic and numerical representations, which were mentioned 6 and 4 times, respectively, during the interview. For example, she stated that “it [the argument]

explains how they get through to use percentages and ratios” and “algebra, it explains it

more than just saying, just making a statement, and they give you equations and

inequalities, and problems to find the solutions to get your answer, rather than just

making a statement.” These statements helped to explain her rankings, where the highest

ranked arguments were written in either symbolic or numerical format. However, Betty

didn’t consider visual illustrations convincing except in the two geometry problems.

Although she did rely on visual evidence in the two geometry problems (e.g. she needed

to visually compare the length of two line segments), she didn’t consider reliance on

visual illustrations a convincing way to validate the conjecture in the other three

problems. She suggested that “it’s okay to draw a picture, but you have to explain the

picture too.” Similar opinions were repeated 3 times. Therefore, she didn’t believe simply

showing the graphs and figures without robustly unpacking their meanings made an

argument convincing. This explained why she didn’t consider visual arguments

convincing in problems that didn’t involve geometry content. A need for narrative

description (to explain examples or pictures) was mentioned 5 times. However, it seemed

that arguments with only narrative representations were also not convincing to her. This

can be illustrated by Betty’s comment on C4 (perceptual), “it [the perceptual argument]


just states that they’re larger. They have no idea what they’re talking about. It’s just, like,

blank.”

Betty also found that the evidence provided in an argument contributed to its

validity. In particular, she considered facts (i.e. known mathematical results) and

examples (i.e. results from an immediate test) as reliable sources to establish validity of

an argument; these were referenced 8 and 9 times, respectively. In the two geometry problems she recognized the validity of the triangle area formula and the Pythagorean Theorem, both of which made the corresponding arguments convincing to her. She also examined the particular shapes drawn on the paper. In the other problems, the numerical examples served as the primary source of evidence, and she even added her own calculations to verify a few statements. She also perceived arguments built on imaginaries (the football field and cookie bags) as convincing.

Despite this, Betty didn’t think that merely checking a few examples made an

argument convincing. She mentioned this point 6 times during the interview. For instance,

she commented on A1 (inductive) that “that’s not enough, I think they just, like, picked

random numbers.” When evaluating B2 (perceptual), she claimed that “there are small

football field, and big one, say NFL ... so that’s not true for all football fields ... the size

varies.” These comments revealed that she was able to see the differences among various

examples and had realized some properties might not be generally applicable. However,

she considered the inductive arguments in Problems D and E the most convincing options.

She suggested that these arguments “explain it and show the work that that’s right” and

“explains how they get through to use percentages and ratios.” In these cases, whether an

argument was valid in general cases was ignored. To investigate why these seemingly contradictory behaviors occurred, we sought an explanation in Betty’s personal standards for her judgment.

It was found that Betty repeatedly emphasized the need for “explanations.” The

terms “explain” and “explanation” appeared a total of 16 times in her comments. In

addition, the need to see more “work” was addressed 6 times. This was highlighted by

her comments that “you have to find, go… dig it further, take further steps.” While it was

difficult to determine exactly what she meant from isolated segments of the interview, her intent became clearer when the entire interview was taken into account. In fact,

Betty could be convinced by a variety of explanations. It could be a description about a

picture, an examination of a few concrete cases, or an illustration of the calculation

process. When working on Problem B, she suggested that she “looked at the length of

explanations” instead of the content to see which argument was more convincing. We

didn’t believe the length of explanation was the sole factor determining her conviction (in fact it was not, since A1 was short but considered convincing and D4 was long but considered not convincing); however, Betty didn’t seem to prefer any particular type of explanation. Further analysis of the data revealed that whether the idea of an

argument was explained clearly could be more important to Betty than whether the idea

itself proved the proposed conjecture. This was detected in Problem B, where she

provided a ranking for the arguments but suggested that the conjecture was false and none of the arguments could show the conjecture was always true (though B3 (algebraic) and B4 (visual) were still more convincing to her since they were “true in their cases”). A similar

situation occurred in her work on Problem A, where even after she had provided a

ranking for the four arguments, she was still unsure if the conjecture was true or false.
Therefore, we believe Betty had a personal standard of what a “convincing” argument

meant. To her, the “convincingness” of an argument was first determined by how much

detail the argument offered in order for her to understand the information, and the

purpose of the argument (i.e. to justify the general validity of a conjecture) seemed to be

less important.

Combining Betty’s personal scheme and her comments on the representation,

evidence and link of the arguments, Figure 26 was created to highlight her rationale when

evaluating mathematical arguments. Betty’s interview responses suggested that she was

more likely to be convinced by explanations that were rooted in concrete examples and/or

known mathematical facts and developed through multiple steps of transformation or

perceptual connections.

[Figure: “Convincing arguments” linked to evidence (Examples, Facts) through perceptual, transformational, and ritual links, with symbolic and numerical representations and the personal standard “Detailed procedure”]

Figure 26. Illustration of Betty’s rationale for evaluating mathematical arguments

The case of Blake

Blake was enrolled in an Integrated 8th grade Mathematics class at the time of

data collection. When working on the SMR, he had selected A2 (algebraic), B2

(perceptual), C1 (inductive) and D3 (perceptual) in respective problems as the most

appealing option and hence he was considered to be a representative from the

inconsistent group.

Blake’s interview responses are summarized in Table 29 and Table 30. Table 29

illustrates the rankings provided by him for each problem. Column One of the table

represents the order of problems that he tackled. Table 30 summarizes Blake’s comments

when articulating why he found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem B B2 (perceptual) B1 (inductive) B3 (algebraic) B4 (visual)
Problem C C4 (perceptual) C1 (inductive) C3 (visual) C2 (algebraic)
Problem A A1 (inductive) A2 (algebraic) A3 (perceptual) A4 (visual)
Problem D D3 (perceptual) D1 (inductive) D2 (algebraic) D4 (visual)
Problem E E2 (visual) E1 (inductive) E3 (perceptual) E4 (algebraic)

Table 29. Rankings of arguments provided by Blake

Positive Comments Negative Comments
Problem B
It’s a little bit more simple. (P) A little bit too math-like, they’re not even
thinking about the problem, they’re not
even talking about it, they’re just trying to
make you all confusing with different,
they’re trying to make it different formulas
so you can get all confused about this. (E4-
, R4-, P)
This gives you more of a visual type thing A little bit too much work here for you to
so you can actually understand it, so you understand. (P)
can imagine how that would actually work.
(R1, E3, L2)
You just gotta try to figure it out on your They’re just saying it in like, math, and
own. (P) why not just plainly say, like, it’s longer.
(P)
It’s trying to tell you how to think… and
that wouldn’t really work. (P)
I’ve known some people where they
actually want to think on how to actually
figure it out, not just like, OK, here’s ya
how to do it, think this way. (P)
Problem C
It simply says it, it’s making it not too A little bit too math-like, it wouldn’t really
complicated. (P) work. (R4-, R1-, P)
It’s actually giving you not that much One of the second complicated ones,
visualization, but if you combine these two where they’re trying to make you all
[inductive and perceptual] together, then confused. (P)
you would actually get the answer, right
here. (E2, E3, R1-)
The visual aids right here, they actually They just want to try to make you think
help you, but then there’s an explanation that the one triangle equals two, so that it
that goes with that. (E2, R1, R2) would just make you go all over the place,
and see. (P)
It [perceptual argument] gives you the It’s just going all over the place, it just
explanation, but if you added this with it wants you to think something else besides
[points to inductive argument] it would that, so it wouldn’t work. (P)
really help. (E2, R2)
This is trying to mess you up completely.
(P)
continued

Table 30. Summary of comments made by Blake

Table 30 continued
Positive Comments Negative Comments
it’s like you’re trying to just blatantly out
say it, how you’re just trying to confuse
us. (P)
It just didn’t make any sense. I thought
that it wasn’t talking about the question at
all. (NA)
They’re trying to trick ya. (P)
Just trying to confuse ya. (P)
It’s talking about it in a college-like term.
(R4-)
Problem A
It has a visual aid like it’s supposed to. They’re all confused about this, because
(E2, R1) it’s like blurry and everything, and it’s
like, oh God, it’s too much. (R1-, P)
This is where it talks about the geometry There’s such thing as too much on these
and everything, this is where I could easily things. (P)
understand it. (P)
It gives you a kind of convincing It wouldn’t explain as much as I wanted it
visualization; it kind of gives it easy. (E2, to. (P)
R1)
If you actually did the math right here with It’s not explaining as much to where you
a calculator, you could actually understand can actually understand it; however it’s
it better. (E2, R3, L5) easier for you to understand; however, it’s
not exactly what you want for an answer;
it’s not explaining as much as you want it
to. (NA)
You’re thinking for yourself. (P) I could try and try, and it just gives me too
much work. (P)
There’s such a thing as too much work;
however, thinking for yourself and
actually trying to find out. (P)
Problem D
It’s doing it a lot more simple, to where They’re missing something in this
you can actually figure it out. (P) question… they don’t tell you what the
price of the bike is, so that where you
could actually find it out easier. (E2, R3)
It actually gives you a visualization. (E2, It makes it a little bit too complicated to
R1) where if you do the wrong thing, it’s
kaput. (P)
continued

Table 30 continued
Positive Comments Negative Comments
It just simplifies it for you… it’s so you A lot of kids in my class where it wants
can actually figure it out for yourself. (P) variables, with dividers, with decimals,
and it just completely makes them…
blank. (R4-, R3-)
This one [perceptual] totally simplifies it For a sophomore, this’d be working, but
for you, so you wouldn’t have to do that for an 8th grader… [makes sound of
much work; however, you’re still actually disagreement]. (R4-)
learning math. (P)
It’s more like a word problem. (R2-)
They’re just trying to make it all
complicated. (P)
This is a tricky one they’re trying to pull
on ya. (P)
It’s just telling you on what you should do,
and a lot of people don’t like that. (P)
You need this, this… where’s my opinion
in it? (P)
Comparing Problems A-D
It actually gives you visuals, to where you They’re making it too complicated. (P)
can actually understand it, and where you
can actually try it for yourself. (R1, E2)
If you remember it, you got it. (P) They’re throwing off what is being asked,
so that you can get confused; it’s just like
with assessments, that’s what they want.
(P)
I did understand a little bit of each, They’re trying to trick ya, just like with
because they’re the skills I learned in the assessments, they want you to get the right
past. (E3) answer; however, they’re just trying to
trick you to see if you know it. (P)
I knew immediately what it was saying, They’re trying to make it more
because I’m in 8th grade, I know my complicated so that you can try figuring
geometry. (E3) out… like, oh, wait a minute, that’s wrong,
gotta go over here. (P)
Right here, I understood, because we were Even though they look like visuals, that’s
talking about the Pythagorean theorem a the trap. (P)
ton during that time, so I understand it
more. (E3, E4)
They’re trying to make it look like it’s
easy, but when you try to do it, bam! It’s
wrong… (P)
continued
Table 30 continued
Positive Comments Negative Comments
They’re throwing it off the question, they
have little pictures to try and trap ya, but
however, they’re doing it in what I like to
call high school and college terms, to
where they’re trying to make it all
complicated to you, and when you
understand that it's really complicated,
then that's where you need to dodge out.
(R1-, R4-, P)
More complicated terms than what I don’t
understand. (R4-)
This one, we didn’t talk about that much.
(E2)
Because we didn’t talk about it that much,
it was a little bit more complicated to
figure out. (E2, P)
Problem E
I can actually understand ratios a lot. (E2, It doesn’t give you any numbers, doesn’t
R3) give you any visuals. (R1, R3, E2)
This is with a visual, to where I could You’re talking about too many doubles,
actually imagine it. (E3, R1) and you’re just trying to confuse me. (P)
You can actually do the math, and you can There’s nothing concrete about it to where
actually figure it out. (P) you can actually figure it out. (E2, P)
If it’s with two different numbers, just like It doesn’t give you any numbers… 2n? I
on how you did, then it’s easier to figure mean, that doesn’t really work for me… It
out. (E2, R3) ain’t gonna really work if it’s just with
variables. (R4-, R3, E2)
Additional Comments
It’s talking about ratios, to where you can
figure it out. (E2)
I’m more good with trying to figure it out
with a calculator and with numbers. (E2,
R1, L5)
When it’s with ratios… I’m real good with
that. (E2, R3)

As shown in Table 29, Blake considered the perceptual arguments most

convincing in Problems B, C and D, but not convincing in the other two problems.

Algebraic arguments were generally not convincing to him, but they were ranked higher in

Problem A. The inductive arguments were considered the most convincing in Problem A,

and second most convincing in the other problems. The visual arguments were considered

the most convincing in Problem E, but not convincing in any other problem. In order to

better understand how Blake evaluated the proposed arguments and his rationale when

providing these rankings, the coding for his explanations in Table 30 was summarized in

Table 31 so as to identify factors and features of the arguments that had influenced his

judgment.

Total number of references to representation: 32


Visual Narrative Numerical Symbolic
Positive 8 3 7 0
Negative 4 1 1 8

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 19 5 1 0 0
Negative 0 0 1 1 0 0

Total number of references to link: 3


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 0 0 2 0
Negative 0 0 0 0 0 0

Table 31. Categories of comments made by Blake


As shown in Table 31, the total number of comments that focused on the

representation, evidence and link of the arguments were 32, 27, and 3, respectively,

indicating that the representation and evidence had a larger impact on Blake’s judgment.

The 3 occasions on which Blake mentioned the link between evidence and conclusion concerned the perceptual connection in B2 and the use of calculators, which was classified as a ritual operation.

Blake found arguments that were based on examples (i.e. results from an

immediate test) convincing. This was mentioned 19 times during the interview. In

addition, arguments referencing imagined scenarios (coded as Imaginary) were also considered convincing. He also considered the argument built on the Pythagorean Theorem convincing, which was classified as a known mathematical fact.

Blake mentioned 8 times that visual illustrations contributed to his conviction, but

there were also 4 times when he indicated the graphs made the arguments less convincing.

When he found visual aid was helpful, he stated that “it gives you a kind of convincing

visualization; it kind of gives it easy.” In other cases where he found the images less

helpful, he suggested they were unclear: “it’s like blurry and everything, and it’s like, oh

God, it’s too much.” This was similar to his comments on numerical representations,

where in 7 cases they made a positive impact (e.g. he commented that “if it’s with two different numbers, just like on how you did, then it’s easier to figure out”), and in 1 case they made a negative impact (where he expressed a dislike towards the use

of decimals). Nevertheless, numerical and visual representations, if not complicated, were

preferred types to Blake. He expressed this opinion explicitly when he was criticizing E3

(perceptual), suggesting that “it doesn’t give you any numbers, doesn’t give you any
visuals” and hence it was not convincing to him. His attitude toward narrative arguments was similar: if a description used easier language and was not long, he considered it a helpful explanation. On the other hand, he suggested that more complex descriptions looked like word problems, which he disliked. Last and perhaps most evident was

Blake’s negative attitude toward symbolic representations, which was detected across the

problems (8 times in total). He described symbolic expressions as too “math-like” and

appropriate only for “high school or college” students, labeling them as confusing. He

claimed that “it ain’t gonna really work if it’s just with variables.” Blake didn’t exhibit

understanding of the meaning of symbolic arguments. The appearance of these arguments

seemed to keep him from even trying to understand their content.

Blake demonstrated a strong personal standard of what a convincing argument

meant to him. Among the 43 comments classified as “P,” personal standards, 29 were

about how the simplicity (or complexity) made an argument convincing (or not

convincing) to him (e.g. “it’s doing it a lot more simple” and “they’re making it too

complicated”), and 12 were about his need to figure out the problem by himself instead of

being told what to do (e.g. “it’s trying to tell you how to think… and that wouldn’t really

work” and “I’ve known some people where they actually want to think on how to

actually figure it out, not just like, OK, here’s ya how to do it, think this way”). When

making his evaluations, Blake often imagined the scenarios in which he was being taught

the arguments in a mathematics class and expressed his feelings in such situations. He

claimed that some arguments were trying to “trick ya,” “confuse ya,” and “trap ya.” This

manifests the type of frustration some students experience when they face mathematical

problems that may be difficult for them to do. At the same time, it also reveals their needs
in these situations. To Blake, whether an argument was convincing didn’t depend on how complete the argument was; instead, he wanted the argument to help him access the problem so that he could think for himself. Therefore, the argument didn’t need to be logically correct or even mathematically complete; instead, it should explain the problem,

illustrate a few simple examples, or create a context for him to better understand the task

first and then to proceed with solving it. Based on the findings above, Figure 27 was

created to illustrate Blake’s rationale for evaluating mathematical arguments.

[Figure: “Convincing arguments” linked to evidence (Examples, Imaginaries) through perceptual and ritual links, with visual, numerical, and narrative representations and the personal standards “Easy to understand, Non-procedural”]

Figure 27. Illustration of Blake’s rationale for evaluating mathematical arguments

Figure 27 helps explain the rankings provided by Blake in Table 29. B2

(perceptual) and C4 (perceptual) created scenarios that he could relate to the problem so

that he could think for himself. D3 (perceptual) used easy language to offer an

explanation of the phenomenon described by the conjecture. A1 (inductive) provided a

few examples of the numbers of interest. E2 (visual) provided a picture to show the objects

studied in the problem. All of these arguments offered him a starting point for working on

the problem even though they didn’t provide details about what exact steps he should

take. Therefore, they were considered the most convincing options. On the contrary, D4

(visual) utilized the coordinate plane; E4 (algebraic) involved ritual operation of

symbolic equations; A4 (visual) provided a “blurry” picture; while B4 (visual) and C2

(algebraic) adopted symbolic language to explain integrated geometric structures. All of

these arguments required substantial background knowledge to understand, which

confused Blake. Therefore, they were considered the least convincing options.

The case of Brenda

Brenda was an 8th grade student enrolled in an Algebra I class at the time of data

collection. When working on the SMR, she had selected A4 (visual), B1 (inductive), C3

(visual) and D1 (inductive) in respective problems as the most appealing option and

hence she was considered to be a representative from the inconsistent group.

Brenda’s interview responses are summarized in Table 32 and Table 33. Table 32

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 33 summarizes her comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing
Problem B B2 (perceptual) B1 (inductive) B4 (visual) B3 (algebraic)
Problem D D1 (inductive) D2 (algebraic) D3 (perceptual) D4 (visual)
Problem C C1 (inductive) C3 (visual) C2 (algebraic) C4 (perceptual)
Problem A A4 (visual) A1 (inductive) A3 (perceptual) A2 (algebraic)
Problem E E2 (visual) E3 (perceptual) E1 (inductive) E4 (algebraic)

Table 32. Rankings of arguments provided by Brenda

Positive Comments Negative Comments
Problem B
I can imagine a football field, so like, it’s If I did it that way, it would take longer to
easier just to think that way. (E3, L2) figure out how to do it, other than just like,
thinking about how to do it, so you would
actually have to measure it to realize how
farther it is. (E2, P)
I know that it’s longer. (E6) I don’t understand them as well, that’s
why I don’t even like them. (NA)
I don’t like the Pythagorean theorem. (E4-,
R4-, P)
When they add circles to the thing, it kinda
confuses me, so I just don’t like ’em. (E2-,
R1-)
Problem D
It’s easier for me if it has a number in it, to It’s a little bit harder to figure out with the
be able to know, like, it’s easier just to x in it, so you have to figure out on both
figure out how to do it that way. (E2, R3) sides of the equals sign. (R4-)
It’s just the one side and you get the There’s two sides of the equals sign in this
answer. (L5) one, so you would have to figure out both
sides, and then you could get the answer.
(L5-, P)
It’s already on the one side, so you just This one I don’t think has enough
figure out the one side and get the answer, information for me to understand it really
and it’s a lot faster and easier. (L5, P) as well. (NA)
Because he also said, like, it’s “such as I don’t really like graphs so I don’t
200,” you didn’t say that you didn’t try understand… I just don’t get ’em that well.
300, so he could’ve tried it but just didn’t (R1-)
say that he did and it could’ve still worked.
(E2, L3)
Problem C
If you take any kind of triangle and you try I don’t understand it as much. (NA)
and fit it into, like, the first one, it’s always
gonna be smaller than the… the first
triangle’s always gonna be bigger than the
second triangle because of how the sides
are. (E2, R1, L4)
continued

Table 33. Summary of comments made by Brenda

Table 33 continued
Positive Comments Negative Comments
I understand what they’re saying by It says A is greater than a ... I just got
cutting the lines and making it into a really confused about that part. (R4-)
shorter one, and I can tell by that that there
is, that it is smaller than that. (E2, R1, L4)
With the first one [argument], it shows a The last one [C4] I don’t think gave
lot more, ’cuz there’s different triangles enough information for me to understand
there that are, that have different sizes. what it meant by it. (L2-)
(E2, L3)
I can tell that it’s right because I know that
it’s bh divided by two, because we already
know that that’s how you find the area and
stuff, and then with, it would be lowercase
with the Triangle 2 and it would be
uppercase with the Triangle 1, so it’d have
to be bigger. (E4, R1)
Problem A
That one has a visual effect with it, so it That one kinda confused me on it, because
makes more sense that it you split up the 6, I didn’t know what was going on in the
it comes into threes, and I can understand problem. (NA)
that way. (E2, R1)
That would probably be another way I I don’t like to think of it that way, I just
would do it, so I would understand it that don’t get it that way, so I don’t think of it
way, ’cuz you can divide any of those by that way. (P)
3, and get a multiple of 3 that way. (E2,
R3)
I like visual things better than just thinking I don’t get how you do the 6n equals 3
in my head about it, so that one makes times 2n. (R4-)
more sense to me. (E2, R1)
Comparing Problems A-D
I understand sales tax more than most The way that they did it, like they added
things, I get that better than the other the circle to it, and it didn’t make as much
things. (E2, R3) sense. (E2, R1-)
It, like, is straightforward, and it tells me
what it is and stuff, it’s a lot easier to
understand. (P)
Problem E
It gives me a visual effect of how it It just tells you how it wouldn’t change.
wouldn’t change, because it would still be (E6-)
double the amount of it, which wouldn’t
do anything to it. (E2, R1, P)
continued

Table 33 continued
Positive Comments Negative Comments
It made sense ’cuz they explained what It doesn’t give a picture, it’s just an
happened with the orange and with the explanation. (E2, R1, R2-)
white, and that it would stay the same no
matter what, ’cuz the ratio would never
change of how much would be in there.
(E4, R2)
It gives an actual effect of how it wouldn’t I wouldn’t go that way with that, it’s just
change. (E2, R1) not how I would do that. (P)
It has a picture of how it wouldn’t change I understand it now but I don’t really like
and it gives an explanation with it. (E2, it. (P)
R1, R2)
Additional Comments
That one [E3] did because it said that it That one [C4] I don’t think gave as much
was exactly the same by just explaining information as what I needed to figure out
how it is. (R2) that it was. (E3-, L2-)
With the sales tax, I got that better because I don’t understand probability as well, so
I knew that, with that, it’s easier with when they threw in the numbers, I was
numbers. (E2, R3) kind of confused with it. (R3-)
It’s easier to figure out that whatever 6 is,
you can just divide by 3 and it’s an, it’s a
normal number that is 3. (E2, R3)
It [E2] does show you how and it explains
how to. (E2, R1, R2)
They explained that the ratio between the
ping pong balls would still be the same no
matter if it was doubled or whatever
number. (E2, R2)

As shown in Table 32, Brenda didn’t generally consider algebraic arguments

convincing. She rated them the least convincing in Problems A, B, and E, second least

convincing in Problem C, and second most convincing in Problem D. Inductive arguments

were considered either the most or second most convincing to her, with the exception of

Problem E, where it was considered the second least convincing. Her evaluation of visual

and perceptual arguments was quite inconsistent across the problems. They appeared at
every position (from most to least convincing) in her rankings. In order to better understand how

Brenda evaluated the proposed arguments and her rationale when providing these

rankings, the coding for her explanations in Table 33 was summarized in Table 34 so as to

identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 29


Visual Narrative Numerical Symbolic
Positive 8 7 5 0
Negative 3 1 1 4

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 19 1 2 0 1
Negative 0 1 1 1 0 1

Total number of references to link: 10


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 2 2 2 0
Negative 0 2 0 0 1 0

Table 34. Categories of comments made by Brenda

As shown in Table 34, the total number of Brenda’s comments that focused on the

representation, evidence and link of the arguments were 29, 27, and 10, respectively.

Visual representation seemed to contribute to her conviction. This view was

conveyed 8 times. For example, she considered A4 (visual) convincing, suggesting it had

“a visual effect with it, so it makes more sense.” However, some visual illustration didn’t

make the argument convincing to her. She claimed to be confused by the geometric shape

in B4 (visual) and the graph in D4 (visual). Narrative description and numerical

representation could contribute to her conviction. She suggested that an explanation of

the meaning of pictures or graphs could make an argument more convincing to her. She

also believed that an argument was easier to understand “with numbers.” However, she

found numerical and narrative representations confusing to her in some cases. She

claimed that narrative arguments, with the absence of numbers or pictures, might not

offer enough information. She also found dealing with numbers in certain contexts (e.g.

probability) confusing. This explained her inconsistent judgment of the perceptual, visual,

and inductive arguments across the problems. Overall, it was found that visual, numerical

and narrative representations of arguments could all contribute to her conviction if they were

understandable to her.

The symbolic representation had a more consistent negative influence on Brenda’s

conviction. In all 4 instances where she mentioned symbolic format, she characterized it

as confusing and not understandable. For example, she claimed not to understand why

“6n equals 3 times 2n.” Considering that this equation involves only the simplest

symbolic expressions, we believe she hadn’t yet developed adequate facility with algebra

to use it in problem solving. Therefore, it was not surprising that all the algebraic

arguments were considered as not convincing.

Brenda also paid attention to the evidence presented in the arguments and this was

evidenced 27 times during the interview. She often found arguments that relied on

checking a few cases (e.g. numbers or shapes) convincing. This was detected 19 times.
She suggested that she didn’t like the argument using the Pythagorean Theorem although she had learned it in class; however, she found the triangle area formula convincing. This

suggested that even two seemingly similar types of evidence sources could be assessed

quite differently by her.

It was also found that Brenda could be convinced by a variety of links between

evidence and conclusion. She found trying a few cases in Problem D adequate in showing

the conjecture was always true. She found the transformation model used in C3

convincing. She also found the perceptual connection between the football field scenario

and the property of rectangles in B2 convincing. However, she wasn’t able to make the

perceptual connection between the triangle made by wires and its geometric properties.

She didn’t explicitly comment on the link of any argument and it was not detected in the

interview that she was more likely to be convinced by any other type of link. Therefore, it

was believed that she wasn’t yet able to reflect on the logic of mathematical arguments.

Lastly, Brenda’s personal standards were studied. First, she found simple

arguments more convincing. The preference for “easier” arguments was mentioned 7

times during the interview. Second, it was found that her appreciation of certain

concepts/topics had also impacted her evaluations. For example, she claimed to not “like

the Pythagorean theorem” and hence considered B3 (algebraic) not convincing. She

didn’t consider E3 convincing since she “wouldn’t go that way with that, it’s just not how

I would do that.” Such personal preference was highly context based and could explain her inconsistent view of the same type of argument across the problems. Based on the findings above, Figure 28 was created to illustrate Brenda’s rationale for evaluating mathematical arguments.


[Figure: “Convincing arguments” linked to evidence (Examples) through perceptual, inductive, transformational, and ritual links, with visual, numerical, and narrative representations and the personal standards “Easy to understand, Simple procedure”]

Figure 28. Illustration of Brenda’s rationale for evaluating mathematical arguments

Discussion

The investigation of each individual subject’s interview responses offered insights

into their own rationale for evaluating the mathematical arguments. The following

discussion focuses on the thinking pattern exhibited by the subjects as a whole group

during the interviews. In particular, we examined if any argument was considered

significantly more convincing than others in each problem context. We then studied if

any factor had a larger impact on the subjects’ judgment. In addition, we studied the

similarities and differences among the individual subjects. Lastly, we studied the context’s

potential impact on the subjects’ decision.

Most convincing arguments to the subjects

We first examined if students had found any argument significantly more

convincing than others in each problem context. In order to do so, we first assigned values

to each argument based on the rankings provided by the subjects. In particular, if an

argument was ranked as the most, second most, second least and least convincing
argument, it received a score of 1, 2, 3 and 4, respectively. Therefore, the rating

represented the position of the argument in the ranking. The lower the rating, the more

convincing an argument was perceived. We then calculated the average rating provided

by all subjects for each argument. Table 35 illustrates the result.

Allen Abby Alice Amy Beth Betty Blake Brenda Average


A1 4 3 2 4 4 4 1 2 3
A2 2 2 1 1 2 1 2 4 1.875
A3 3 4 4 3 1 2 3 3 2.875
A4 1 1 3 2 3 3 4 1 2.25
B1 4 2 2 1 4 3 2 2 2.5
B2 3 1 4 4 1 4 1 1 2.375
B3 1 4 3 2 3 1 3 4 2.625
B4 2 3 1 3 2 2 4 3 2.5
C1 4 1 3 2 3 3 2 1 2.375
C2 1 2 2 4 2 1 4 3 2.375
C3 3 4 1 1 4 2 3 2 2.5
C4 2 3 4 3 1 4 1 4 2.75
D1 2 1 3 4 2 1 2 1 2
D2 3 3 4 1 3 2 3 2 2.625
D3 4 2 1 3 4 4 1 3 2.75
D4 1 4 2 2 1 3 4 4 2.625
E1 3 1 1 4 1 1 2 3 2
E2 1 4 3 3 3 3 1 1 2.375
E3 4 3 2 2 4 4 3 2 3
E4 2 2 4 1 2 2 4 4 2.625

Table 35. Summary of the subjects’ argument rankings
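For readers who wish to reproduce the averaging step, the computation behind the Average column can be sketched in a few lines of Python. The ratings below are copied from the A1 and A2 rows of Table 35; the script is an illustrative reconstruction, not part of the original analysis.

```python
# Sketch of the rating/averaging step behind Table 35: each subject's
# ranking position (1 = most convincing ... 4 = least convincing) is
# treated as a score, and scores are averaged across the eight subjects.
# A lower average means an argument was perceived as more convincing.

ratings = {
    "A1": [4, 3, 2, 4, 4, 4, 1, 2],  # Allen ... Brenda (Table 35 order)
    "A2": [2, 2, 1, 1, 2, 1, 2, 4],
}

averages = {arg: sum(r) / len(r) for arg, r in ratings.items()}
print(averages)  # {'A1': 3.0, 'A2': 1.875}, matching the Average column
```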

According to the average rating, A2 (algebraic) and A1 (inductive) were considered

as the most and least convincing arguments in Problem A, respectively. B2 (perceptual)

and B3 (algebraic) were rated as the most and least convincing arguments in Problem B,

respectively. C1 and C2 (inductive and algebraic, tie) were the most convincing

arguments in Problem C where C4 (perceptual) was the least convincing. In Problem D,

D3 (perceptual) was the least convincing argument (average rating 2.75), while D1 (inductive) was considered the most convincing option. Lastly, in Problem E, E1

(inductive) and E3 (perceptual) were considered the most and least convincing arguments,

respectively. These results suggested that the subjects’ evaluation of the same type of

arguments was highly inconsistent across the problems. The same type of argument

could be considered as the most convincing option in one problem but the least

convincing one in another (e.g. A2 (algebraic) was rated most convincing in Problem A

but B3 (algebraic) was rated the least convincing in Problem B). Therefore, it was

difficult to tell whether there was any particular type of argument that the subjects found more convincing than others. This finding was consistent with what was detected in the

survey analysis.

We further tested the differences in ratings among the arguments in each problem. Using a within-subject ANOVA (with the arguments in each problem as the levels), we found that in no problem was any argument rated significantly (p < .05) more convincing than the other options (see Appendix C, Table 46). Therefore, based on the subjects’ rankings, no single argument stood out in any of the problems as the most convincing option. This result again demonstrated the diversity in the subjects’ assessment of the arguments.
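As an illustration of the test, a one-way repeated-measures (within-subject) ANOVA can be computed by hand for Problem A. The sketch below is not the original analysis code, only a reconstruction under the assumption that each subject’s four ranks form the repeated measures:

```python
# Hand-computed one-way repeated-measures ANOVA for the Problem A ranks
# (arguments A1-A4 as levels, the eight subjects as the repeated factor).
rows = [  # one row per argument, one column per subject
    [4, 3, 2, 4, 4, 4, 1, 2],  # A1
    [2, 2, 1, 1, 2, 1, 2, 4],  # A2
    [3, 4, 4, 3, 1, 2, 3, 3],  # A3
    [1, 1, 3, 2, 3, 3, 4, 1],  # A4
]
k, n = len(rows), len(rows[0])          # 4 treatments, 8 subjects
grand = sum(sum(r) for r in rows) / (k * n)

ss_total = sum((x - grand) ** 2 for r in rows for x in r)
ss_treat = n * sum((sum(r) / n - grand) ** 2 for r in rows)
subj_means = [sum(rows[i][j] for i in range(k)) / k for j in range(n)]
ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
ss_error = ss_total - ss_treat - ss_subj  # residual after removing subjects

df_treat, df_error = k - 1, (k - 1) * (n - 1)
f_stat = (ss_treat / df_treat) / (ss_error / df_error)
# f_stat is about 1.42, well below the 5% critical value F(3, 21) of
# roughly 3.07, so no argument in Problem A differs significantly --
# consistent with the result reported in the text.
```

Because each subject’s four ranks are a permutation of 1–4, every subject mean equals the grand mean and the subject sum of squares is zero here; a Friedman test would be the more conventional choice for rank data, but the sketch follows the ANOVA the text describes.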

Factors that impacted the subjects’ decision

Analysis of the subjects’ rankings did not yield a conclusion about which arguments they considered more convincing. By merely looking at their choices, it was therefore difficult to identify what factors might have impacted the subjects’ decisions, and an analysis of the subjects’ explanations when justifying their rankings became crucial to this investigation. To summarize the characteristics of these explanations, we calculated the total number of comments about each type of representation, evidence, and link of arguments, as well as the percentage each number constituted of its own category. For example, there were 60 comments indicating that visual representation positively contributed to the subjects’ conviction about an argument; these comments were 31% of all the comments that referenced the representation of arguments (a total of 194). Table 36 illustrates the results.

Total number of references to representation: 194
Visual Narrative Numerical Symbolic
Positive 60 (31%) 18 (9%) 33 (17%) 37 (19%)
Negative 20 (10%) 9 (5%) 3 (2%) 14 (7%)

Total number of references to evidence: 272


Authority Example Imaginary Fact Assumption Opinion
Positive 3 (1%) 140 (51%) 34 (13%) 42 (15%) 0 3 (1%)
Negative 0 17 (6%) 9 (3%) 2 (1%) 1 (0.4%) 20 (7%)

Total number of references to link: 81


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 14 (17%) 14 (17%) 13 (16%) 7 (9%) 5 (6%)
Negative 0 7 (9%) 19 (23%) 1 (1%) 1 (1%) 0

Table 36. Categories of comments made by all subjects

As shown in Table 36, the total numbers of comments that focused on the representation, evidence, and link of the arguments were 194, 272, and 81, respectively.
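The percentages in Table 36 follow from dividing each count by its category total. A small sketch (ours, not the authors’) for the representation category:

```python
# Share of each comment type within the "representation" category of
# Table 36 (counts taken from the table; 194 references in total).
counts = {
    ("Visual", "positive"): 60, ("Narrative", "positive"): 18,
    ("Numerical", "positive"): 33, ("Symbolic", "positive"): 37,
    ("Visual", "negative"): 20, ("Narrative", "negative"): 9,
    ("Numerical", "negative"): 3, ("Symbolic", "negative"): 14,
}
total = sum(counts.values())                       # 194
shares = {k: round(100 * v / total) for k, v in counts.items()}
# e.g. the 60 positive comments on visual representation amount to
# round(100 * 60 / 194) = 31% of all representation references.
```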

The data suggested that opinion (i.e., personal conviction without an explicit reason) was not considered a reliable source of evidence by the subjects. Although there were 3 instances in which a subject’s decision rested on a personal conviction, opinions were far more often flagged as unreliable (20 times in total). This suggested that most subjects were aware of the need to provide evidence other than personal opinion to support a mathematical argument. When examining the impact of the types of evidence on the subjects’ decisions, we found that examples (i.e., results from an immediate test) were used most often to support an argument (140 times in total, more than half of all evidence referenced). At the same time, examples were also the second most criticized source of evidence (second only to opinion). Criticism of the use of examples alone focused mostly on their logical limitation, i.e., their inability to show that the conjecture was always true, which some subjects acknowledged. Facts (i.e., known mathematical results) were the second most referenced type of evidence (a total of 44: 42 positive and 2 negative), although they were mentioned much less frequently than examples. Note that it was rare (only 2 instances) for a mathematical fact to be considered an unreliable source of evidence, suggesting that once the subjects recognized a known mathematical result, they were likely to consider it reliable. Furthermore, imaginaries created from past experience were referenced as a reliable source of evidence 34 times, a number close to that of facts. However, there were also 9 instances in which imaginaries were indicated to contribute negatively to the subjects’ conviction about an argument, suggesting that some subjects (e.g., Amy and Betty) did not consider them a reliable source of evidence. Lastly, it was uncommon for a subject to reference authority (3 instances, all positive) or assumption (1 instance, negative) as evidence for arguments.

The most influential type of representation was visual, which was considered to have positively contributed to the subjects’ conviction on 60 occasions. However, it was also criticized 20 times. Therefore, although visual illustration often contributed positively to the subjects’ conviction, it could also be considered misleading (e.g., by Amy), confusing (e.g., by Blake), or not explanatory (e.g., by Betty). In addition, numerical and symbolic representations were considered to have positively contributed to the subjects’ conviction in 33 and 37 instances, respectively. However, symbolic representations were criticized more frequently than numerical representations (14 times vs. 3 times), suggesting that ideas expressed symbolically were not found convincing by some subjects (e.g., Blake and Brenda). Narrative representations were referenced 27 times (18 positive, 9 negative). Narratives contributed to the subjects’ conviction especially when they were used to explain a picture or an equation (e.g., for Betty). However, some subjects found narrative expressions unclear or unconvincing when they were not supplemented by visually, numerically, or symbolically expressed evidence. In sum, the visual, symbolic, and narrative representations contributed either positively or negatively to the subjects’ conviction, while ideas represented in numerical format seemed to have had a consistently positive impact.

When commenting on the link between evidence and conclusion of an argument, the subjects referenced induction most frequently (33 times in total). Although in 14 instances the subjects expressed that the illustration of examples contributed positively to their conviction, in 19 cases they suggested that showing a few examples could not establish that the conjecture was always true. This result indicated that the use of induction was still popular; however, some students had developed an awareness of its logical limitation. Perceptual connection and transformation contributed to the subjects’ conviction in about as many cases as induction did (14 and 13 times, respectively). However, there was only one case in which transformation was considered unconvincing, while perceptual connection was criticized 7 times. Perceptual connection is usually used to connect personal experience to a mathematical problem, while transformation involves more analysis of particular examples and pattern seeking; the latter was uniformly recognized as a convincing link between evidence and conclusion in an argument. Furthermore, ritual operation and deductive reasoning (the latter mostly referenced by Amy) were considered reliable links whenever they were mentioned.

Figure 29 was generated based on the numbers in Table 36; a larger font denotes that the item was more frequently referenced by the subjects. As illustrated, when evaluating the arguments, the subjects paid the most attention to the evidence. Among all types of evidence, examples were referenced most frequently, followed by imaginaries and mathematical facts. The representation of arguments also impacted the subjects’ judgment. Among all types of representations, visual illustration received the most attention; however, it was criticized by some students as well. A similar situation applied to symbolic representation, where between-subject differences were observed. The link between evidence and conclusion was the least attended aspect of the three. Induction was referenced most frequently but could contribute either positively or negatively to the subjects’ conviction, depending on the context and the individual. Transformation, although not referenced as many times, seemed to be uniformly recognized as a reliable mode of reasoning.

[Figure 29 depicted the three aspects feeding into the subjects’ conviction: Representation (Visual, Numerical, Symbolic, Narrative), Evidence (Examples, Imaginaries, Facts), and Link (Perceptual, Inductive, Ritual, Transformational, Deductive).]

Figure 29. Factors that impacted the subjects’ conviction

Similarities and differences among the subjects

In the previous discussion we revealed some general patterns in the subjects’ rationale for evaluating the arguments. However, it was unclear whether such patterns applied to every individual or only to some of the subjects. More importantly, the individual differences had so far been described only in terms of the subjects’ choices in the survey or the rankings provided in the interview; it was unclear what factors might have caused the differences in their ratings. The analysis of each individual subject’s interview responses provided the basis for investigating the similarities and differences in their rationale. The following discussion offers a cross comparison among the subjects.

Table 37 provides an illustrative blueprint of the subjects’ rationale when

evaluating arguments. This table was generated based on the study of each individual’s

interview responses. As shown in the table, there were similarities as well as differences

among the subjects.

Personal Need Evidence Representation Link


Allen Simple procedure, Examples, Visual, Symbolic Transformational,
Precise description Facts, Perceptual, Ritual
Abby Easy to understand, Examples, Visual, Numerical Perceptual,
Familiar procedure Imaginaries Inductive
Alice Easy to understand, Examples Visual Transformational
Familiar procedure
Amy True for all cases Examples, Symbolic, Deductive,
Facts Numerical Transformational
Beth Easy to understand, Examples, Visual, Narrative Perceptual
Relatable scenario Imaginaries
Betty Detailed procedure Examples, Symbolic, Ritual, Perceptual,
Facts Numerical Transformational
Blake Easy to understand, Examples, Visual, Numerical, Perceptual, Ritual
Non-procedural Imaginaries Narrative
Brenda Easy to understand, Examples Visual, Narrative, Inductive, Ritual,
Simple procedure Numerical Transformational

Table 37. Summary of the subjects’ rationale in argument evaluation

View of evidence

The most prominent similarity among the subjects was that they all considered examples a reliable source of evidence. Testing a few cases and seeing whether the conjecture held under specific conditions contributed to the subjects’ conviction. This was observed in comments from all subjects on at least a few (if not all) arguments.

In contrast, the subjects’ views of the use of mathematical facts were less consistent. Allen, Amy and Betty indicated that they were likely to be convinced if an argument was based on a known mathematical fact. On the contrary, Blake seemed unwilling to use any established result and preferred exploring the problem by himself. The other four subjects might acknowledge that some known results (e.g., the triangle area formula) helped convince them an argument was true; however, they might not consider such results established mathematical facts but rather something they had heard about.

In addition, the subjects’ views of imaginaries also differed. To Abby, Beth and Blake, imaginaries were a major source of evidence, while in Amy’s view, people’s brains can “skew everything,” so imaginaries were definitely unreliable. To Allen, it depended on whether the imaginary was adequately clear to him. Overall, the use of examples seemed to contribute uniformly to the subjects’ conviction, while each individual’s view of other sources, such as known mathematical facts and their own imaginaries, differed.

View of representation

Visual representation was referenced most frequently in the interviews; however, not all subjects considered it to have positively contributed to their conviction about an argument. Amy and Betty clearly expressed that they were unlikely to be convinced by visual arguments. Amy claimed that pictures and figures could misrepresent the problem. Betty did not tend to perceive connections through examining visual demonstrations; when a visual illustration was provided, she needed to see a narrative explanation of what the pictures meant. Nevertheless, visual illustration still seemed to be the most preferred type of representation: six subjects explicitly stated that visual aids could contribute to their conviction, especially when the image was simple and understandable to them.

Numerical representation seemed to be favored by the subjects. Although its function was not mentioned as commonly as that of visual illustration, we still identified at least five subjects who considered numerically based illustrations to contribute positively to their conviction. For the remaining three subjects, i.e., Allen, Alice and Beth, the use of numerical expressions did not seem to make an argument less convincing; its function was simply rarely articulated in their explanations. Therefore, the subjects seemed to share some similarities in their views toward numerical expressions.

Narrative representation was the least frequently referenced type of representation (positively or negatively), although non-mathematical language was used by every subject when explaining their understanding of each argument. However, some subjects demonstrated a greater need for narrative explanation than others. For example, Betty suggested that a visual illustration was not convincing unless it was accompanied by an explanation. In contrast, Allen preferred to read equations and examine graphs and did not consider an argument convincing if it was too “wordy” and not “straightforward.” The major advantage of narrative representation was its plain language, which, when used properly, helped the subjects understand an argument. However, it might be difficult to describe some concepts or examples in narrative as precisely as in numeric, visual, or symbolic representations. Consequently, the subjects’ evaluation of narrative descriptions depended highly on whether they understood the concepts embedded in the narratives without seeing specific numbers, images, or symbols, or whether they understood the numbers, images, or symbols without a narrative description.

Compared to the other three types of representation, the subjects showed the greatest differences in their views of symbolically expressed arguments. To Amy, symbolic representation could show the conjecture was true in every possible case. To Allen, symbolic representation demonstrated ideas clearly and concisely. To Betty, symbolic representation helped her see the details of the argument’s procedure. These three subjects therefore found that symbolic representation positively contributed to their conviction. On the contrary, Blake considered symbolically represented terms confusing and not appropriate for his age group, and Brenda found symbolically represented theorems unappealing; most arguments in symbolic representation were therefore considered unconvincing to them. Symbolic representation did not seem to have contributed either positively or negatively to the other three subjects’ evaluations: they did not seem to recognize its advantages, nor did they find it confusing. This finding was not surprising, since symbolic expressions were usually more abstract than ideas represented in the other three forms. Students who understood the ideas behind symbolic expressions might appreciate how clear and concise they were, and more mathematically mature learners might even see the general validity represented by symbolic arguments. However, to those who had not yet adapted to symbolic representations, they only looked unnatural and difficult and hence were not found convincing.

View of link

The link between evidence and conclusion seemed to be the least influential aspect in the subjects’ decisions. This was natural, since the subjects might not start examining the link if they did not find the representation reliable or consider the evidence convincing; the link thus appeared to be the last of the three aspects to be considered.

Only Amy insisted that the evidence used in a convincing argument must show the conjecture was always true, and she found symbolic deduction the most reliable way to guarantee this. This condition was not a requirement for a convincing argument in the other subjects’ view.

Several subjects (Alice, Amy, Beth, and Betty) articulated that showing a few examples might help them understand an argument but was not sufficient to convince them that a conjecture was true. This suggested that some students were aware of the limitation of induction. Although they were not yet able to appreciate deductive reasoning, they had developed the ability to understand generic examples. For example, Alice could visualize that some geometric property would remain the same when the shape changed in a certain way, and Allen could see a formula in a numeric equation since a value in the equation could be substituted by others without changing the result. Overall, transformational reasoning was widely considered to contribute positively to the subjects’ conviction; it was observed in five subjects’ explanations (all except Abby, Beth and Blake).


Perceptual connection was also appreciated by many subjects (including Allen, Abby, Beth, Betty and Blake). Perceptual connection relates a given mathematical problem to imaginaries created from previous experience, and in many cases such a connection was not precisely described but simply perceived by the subjects. Only Amy pointed out that such a connection might not be a reliable way to build an argument. Other students might not have been able to perceive some connections between a mathematical problem and a real-life scenario; however, they might not have realized that arguing by making such a connection was an unreliable method.

Lastly, all the subjects seemed to believe that ritual operations, numerical or symbolic, were convincing links between the evidence and conclusion of an argument, although they were not mentioned very frequently.

Personal standards

Personal standards played an important role in the subjects’ decision making (Recio & Godino, 2001). Differences in what a convincing argument meant to each subject caused their distinct evaluations of the arguments.

Amy seemed to be the only one who believed a convincing argument should prove that the conjecture was always true. For the other subjects, this was not a principle that guided their decisions. Instead, for many of them (Abby, Alice, Beth, Blake, and Brenda), whether an argument was easy to understand largely determined its credibility. These subjects’ standards for an easy argument were not mutually exclusive; most prominently, none of the five considered algebraic arguments easy to understand. However, they used different standards to determine whether an argument was easy to understand. Blake found an argument easy to understand if it used easy language, easy examples, and easy visual illustrations. Brenda was able to appreciate more complex examples and visual illustrations; however, she preferred an argument that did not involve a complex procedure (e.g., multiple steps). Beth considered an argument easy to understand if it was built upon a life scenario to which she could relate. Abby and Alice found an argument easy to understand if the concepts used in the argument and its reasoning procedure were familiar to them. While Beth preferred a context rooted in her life experience, Abby and Alice also considered arguments familiar from what they had learned in mathematics class convincing.

Allen and Betty were the only two subjects who did not claim that a convincing argument needed to be easy to understand. Note that Allen did prefer arguments that involved simple procedures, which was close to Brenda’s opinion. However, Allen’s preference for simple procedures was not because they were easier to understand: he claimed that he did not have much difficulty understanding any of the arguments used in the interview. Despite this, he still preferred “straightforward” arguments since they were concise and delivered their point more clearly.

Similar to Allen, Betty also demonstrated an understanding of a wide range of arguments. Different from Allen, however, Betty paid more attention to the details of arguments. She found arguments that left too much for the reader to decipher unconvincing; for example, she did not consider visual illustrations alone convincing and believed they must be accompanied by descriptions that explained the information embedded in the images and graphs. Consequently, Betty considered arguments that clearly described the reasoning procedure convincing.

Summary

As shown in the discussion above, there were similarities as well as differences among the subjects’ rationale in argument evaluation (see Table 38). In general, the subjects found examples convincing in most cases; however, their views toward the use of existing mathematical results and their own imaginaries differed. In addition, the subjects found numerical and narrative arguments easier to understand than symbolic ones. Visual arguments could be helpful or confusing depending on the actual images or diagrams provided. Most students did not realize that symbolic representation had the potential to prove the general validity of a conjecture; some found symbolic expressions concise and clear while others viewed them as confusing. With the exception of one participant, the subjects were not aware that the link between evidence and conclusion must show the conjecture was always true. Transformational and perceptual reasoning were widely adopted. However, the subjects’ views toward induction differed: half of the subjects seemed to have realized its limitation. Lastly, the subjects demonstrated various personal standards of what a convincing argument meant to them. Five subjects found easier-to-understand arguments more convincing, though they used different standards to judge “easiness”; for example, some found arguments embedded in a familiar context easier to understand and hence perceived them as more convincing. The other three did not take “easiness” into consideration but paid attention to different aspects (logic, expressions, and reasoning procedure) of the arguments. These differences among the individuals’ rationale caused the distinct evaluations of the arguments shown in their rankings.

Similarities Differences
Evidence  Examples were convincing  Imaginaries and known
source of evidence. mathematical results might or
 Authority, assumption and might not be viewed as reliable
personal opinion were rarely source of evidence.
considered convincing.
Representation  Numerical and narrative  Visual illustration could be
arguments were usually easier sufficient or not sufficient to
to understand. demonstrate the validity of a
 Seeing a few numbers in an conjecture.
argument was helpful in most  Narrative descriptions could be
cases. necessary or unnecessary.
 Visual illustration was helpful  Symbolic expression could be
if the provided image was concise and clear or confusing
understandable. and meaningless.
 Most subjects were not aware
of the logical advantage of
symbolic representation.
Link  Deduction was rarely used or  Induction could be viewed as
considered necessary. convincing, convincing in some
 Transformation and perceptual
connection were widely
adopted.
 Ritual operation was rarely
considered unconvincing.
Continued

Table 38. Similarities and differences in the subjects’ rationale of argument evaluation

Table 38 continued
Personal  Most subjects didn’t focus on  Whether an argument was easy
standards whether an argument could to understand was taken into
prove the conjecture was consideration by some but not
always true. all the subjects.
 Some subjects found arguments
embedded in a familiar context
or using familiar reasoning
techniques more convincing.
 The subjects had different
demand for the clarity of
arguments.
 Other various personal
opinions.

The impact of context on the subjects’ judgment

Although each subject exhibited some general standards in assessing the arguments, he or she still provided different evaluations for the same type of argument in different contexts. None of the participants chose the same type of argument as the most convincing option in more than three problems. This section discusses possible causes of this phenomenon.

Differences in complexity

The subjects’ responses revealed that whether an argument was easy to understand had a substantial impact on their conviction, which might explain differences in their judgments of the same types of arguments. Among the visual arguments, E2 was probably the easiest to understand since it was a direct representation of the problem content. A4 and C3 might be considered more difficult since they required an understanding of the transformation of shapes. B4 was more complex still, since it involved multiple geometric components and its reasoning depended on an analysis of their spatial relationships. D4 was also complex since fully perceiving it required a conceptual understanding of the coordinate plane.

Among the algebraic arguments, A2 involved the simplest equation, with only one variable and a one-step operation (i.e., multiplication). D2 also contained only one variable but involved multiple steps of operations, while B3, C2 and E4 all contained two or more variables and involved multiple steps of reasoning. For inductive and perceptual arguments, the provided examples or evoked imaginaries could likewise be easy or hard for the subjects to perceive.

Five subjects clearly pointed out that whether an argument was easy for them to understand impacted their evaluation of it, and that they might be confused by complex images, equations and other components of an argument. Therefore, the complexity of arguments of the same type was one cause of their different ratings.

Differences in familiarity

Arguments of the same type might also be evaluated differently because of students’ familiarity with their content. Such differences were evident in the subjects’ judgment of the evidence of arguments. For example, Brenda considered B3 (algebraic) not convincing since she was not familiar with the Pythagoras Theorem, yet she found C2 (algebraic) more convincing since she was familiar with the triangle area formula. Similarly, Allen claimed that the imaginary of a triangle made of wire was clearer to him than that of a football field; hence he gave C4 (perceptual) a higher ranking than B2 (perceptual). Abby’s view was just the opposite: she considered B2 convincing since she knew what a football field looked like, but found C4 not convincing since she “never heard of using wire to make a triangle.” Since any argument, regardless of its type, could provide a context that was familiar or unfamiliar to students, students’ evaluations of it could differ considerably depending on their previous classroom and life experiences.

Differences in clarity

Arguments of the same type could also differ in the perceived clarity of their concepts and reasoning procedures. A typical example concerned the inductive arguments (A1, B1, C1, D1 and E1). B1 was the least clear, since it merely stated that a few examples had been tested without giving any information about what examples were used or what results were observed. A1 and C1 were clearer since they provided the examples. D1 and E1 were the clearest since they not only provided examples but also showed how an example was tested and demonstrated the operations. These differences impacted Betty’s and Allen’s judgments, as they considered A1, B1 and C1 the least convincing but ranked the other two higher. Clarity of arguments also impacted Blake’s decisions; he tended to prefer arguments that did not show the specific steps and left some space for his own thinking.

Differences in function

Even though A2, B3, C2, D2 and E4 were all classified as algebraic, the function of the symbolic expressions differed in each argument. All symbols indeed represented variables; however, in B3 and C2 the symbolic form was the carrier of known mathematical results (i.e., the Pythagoras Theorem and the triangle area formula), which were not present in A2, D2 and E4. Additionally, in D2 and E4 the symbolic representation provided the environment for algebraic operations, whereas ritual operation was not emphasized in A2, B3 and C2. Furthermore, inequalities were considered in B3 and C2, which required conceptualizing the variables as an ordered collection of values, while inequality was not a focus in A2, D2 and E4.

Different functions of visual illustration were also observed among the visual arguments. In A4, D4 and E2, the figures represented the problem content in a visual format so that the subjects could perceive the relationships within the diagrams and then transfer that understanding to the actual context of each problem. In the geometry problems, however, the figures themselves were the subjects of study, and the participants did not need to relate them to anything else. Additionally, the subjects needed to attend to the quantities of the objects in the diagrams in A4 and E2, whereas spatial relationships, distances and sizes were the focus of the figures in B4, C3, and D4. Therefore, visual illustration was a category with highly diverse internal properties.

The situation was similar for the perceptual arguments. Whereas A3, B2, and C4 sought to support their respective conjectures by connecting the content to imaginary contexts that were familiar to students, no additional contexts were provided in D3 and E3; instead, these two arguments used a narrative description to help students perceive the connection of concepts within their own contexts.

The different functions of seemingly similar components or features of arguments impacted students’ decisions. For example, students might prefer B3 (algebraic) and C2 (algebraic) because they saw the familiar mathematical results stated in the two arguments as reliable evidence, yet find the other algebraic arguments less convincing since those familiar results were not present (e.g., Allen). Students might recognize perceptual arguments that referenced a familiar scenario as convincing but consider those that made connections within the context less convincing (e.g., Beth). Students might acknowledge illustrations of images as convincing in the geometry problems but consider visual aids in other problems less reliable (e.g., Amy and Betty).

Lastly, students viewed the functions of examples differently in the inductive

arguments. For instance, Amy seemed to view examples provided in the geometry

context (i.e. different shapes) as cases that were connected to each other while she

considered different numbers as separated and unrelated cases. Therefore, when she

evaluated inductive arguments that used different numbers as examples, she considered

them unconvincing. She was however, convinced by an inductive argument in the

geometry context. Whether the illustrated examples were viewed as generic examples or

isolated cases also impacted Alice’s and Beth’s conviction. Alice considered D1

(inductive) not convincing since “they only gave two numbers for you to work with” and

“what if the price is higher than 500.” However, she considered A1 (inductive)

convincing since every multiple of 6 that she tried was a multiple of 3 as well. Beth’s

view was just the opposite. She suggested that A1 (inductive) was not convincing since

“just ’cuz she’s tried a lot of them, she hasn’t tried all of them.” However in evaluating

D1 (inductive), she claimed that “I could insert the 200 dollars and the 500 dollars that

he’s suggesting is the same thing, and I could see if it was actually right,” suggesting she

had seen properties in the given example which could transfer to other cases.
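Alice's acceptance of A1 rested on checking cases. For contrast, the general claim that appears to underlie A1 (every multiple of 6 is also a multiple of 3, as reconstructed from the excerpts above; this reconstruction is an assumption) admits a one-line deductive argument of the kind the participants did not invoke. A sketch:

```latex
% Illustrative sketch only; assumes (from the excerpts above) that the
% conjecture behind A1 is "every multiple of 6 is also a multiple of 3."
Let $n$ be an arbitrary multiple of $6$, so $n = 6k$ for some integer $k$.
Then
\[
  n = 6k = 3(2k),
\]
and since $2k$ is an integer, $n$ is a multiple of $3$, independent of
which cases one happens to test. $\blacksquare$
```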

Summary

In sum, the subjects' interview responses revealed that the complexity of the arguments, students' familiarity with the context used in the arguments, and the clarity of the explanation presented seemed to have impacted the subjects' evaluation and judgment. An argument could be difficult or easy, could use familiar or unfamiliar illustrations, and could seem clear or unclear, regardless of its type. In addition, students' perceptions of the function of the same component of an argument also varied. They viewed the examples used in one problem as isolated cases while considering the examples in another problem as related instances. Therefore, these factors, which didn't depend on the argument type

but aligned with the subjects’ personal standards for judgment, could have caused

students’ inconsistent evaluations across the problem contexts for arguments that were

categorized as the same type (see Figure 30).

[Figure 30 is a diagram: complexity, familiarity, clarity, and functional differences of like components or features all point to inconsistent evaluations.]

Figure 30. Factors that caused inconsistent evaluation of the same type of arguments

Reflecting on survey results based on findings from the interviews

The argument rankings provided by participants in the interview and the explanations of their rationale in making such evaluations helped us understand the causes of the similarities and differences in individual learners' survey responses.

The distinct rankings provided by the interview participants (see Table 35) demonstrated that the highly diverse preferences among students toward each argument, as observed in the survey results, were unlikely to be a result of random selection. A

comparison between the interview participants’ ranking and their survey responses (in

judging whether an argument was convincing) revealed consistency for 6 of the 8

participants (except for Alice, who demonstrated a preference toward perceptual arguments in the survey but considered them not convincing in the interview, and Beth, who showed a preference toward algebraic arguments in the interview, a tendency that was not observed in her survey responses). As illustrated by the explanations provided in the participants' interview responses, each learner had his/her own rationales for his/her decisions, and the choices in the survey were unlikely to have been made arbitrarily.

Findings from the interviews offered plausible explanations for our previous

conjectures about why certain options received higher ratings in the survey. It was

hypothesized that the participants were more likely to understand an argument when it

showed more details about concrete examples or provided visual support. This conjecture

was supported by the interview results, where the use of examples was the most frequently referenced evidence for building a reliable argument. Visual illustration was the representation preferred by 6 of the 8 interviewees. We had also suspected that the

participants were less likely to be completely convinced by checking and verifying a few cases. This conjecture was also supported by some subjects' interview responses, where

four subjects articulated that showing a few examples was not sufficient to convince them

that a conjecture was always true. In addition, we had speculated that arguments that had

used easier language and offered shorter descriptions were more likely to be preferred by

students. This conjecture was supported by the interview responses of 5 participants, who considered an easier-to-understand argument convincing. However, the other 3 interviewees preferred the use of more abstract expressions and detailed explanations. Therefore, students' preferences for the expression of arguments differed among individuals.

The survey results indicated that students were not consistent in their evaluation

of the same type of argument across the contexts. This phenomenon was also observed in

the interview phase. An examination of the participants' explanations revealed that the

complexity of the expression and concepts used in the arguments, students’ familiarity

with the context, the clarity of the explanation, and personal perception of specific

elements used in the arguments seemed to have impacted the subjects’ evaluation, which

caused inconsistent ratings across the contexts.

It was also observed that survey responses from students enrolled in higher

performing schools (as assessed by state standardized tests) were not significantly different from those of students enrolled in lower performing schools. This result suggested that the knowledge and skills that could help students achieve higher scores on standardized tests may not be directly associated with greater maturity in mathematical reasoning. Related

results were also observed in the interview phase. Two participants of the interviews (Allen and Betty) were enrolled in Honors Algebra I classes, and they did demonstrate a familiarity with formulas and fluency in symbolic operations. However, neither of them was aware that checking a few examples was not sufficient to prove a conjecture was always true. Nor did they realize that algebraic expressions could be used to prove the general validity of a conjecture. Their personal preferences, which didn't focus on the rigor of the logic of arguments, had also impacted their judgment of whether an argument was convincing. Therefore, they apparently didn't fully understand the purpose of the use of algebra in mathematics despite their greater familiarity with symbolic skills. It was premature to assume Allen and Betty were representative of higher achievers in standardized tests. However, their explanations offered insight into the analysis concerning the between-school comparison.

CHAPTER 5. CONCLUSION

This chapter is dedicated to a discussion of the key findings of the study. First, an overview of the study is provided. In addition, findings in response to the proposed research questions are summarized. Furthermore, the study's contribution to the literature is synthesized. Lastly, implications for practice and further studies are discussed.

Overview of the Study

The study examined how 8th grade students evaluate arguments in a wide range of

mathematical contexts. The analysis included investigations on the types of mathematical

arguments that students found convincing, explanatory and appealing, common aspects

and features of arguments that impacted students’ evaluation of the arguments, and

problem contexts’ impact on their judgment.

The study involved two phases, a survey and a follow-up interview. Over five

hundred 8th grade students from five Ohio public schools participated in the survey study,

where they were provided a variety of arguments in four different mathematical contexts

and were asked to determine which of these arguments were convincing, explanatory and

appealing to them. Eight subjects, whose survey responses were distinct from each other,

were selected to participate in the follow-up interviews, where they were asked to explain

their rationale for determining their evaluation of an argument.

Both quantitative and qualitative methods were utilized in data analysis.

Statistical data from the survey was used to identify types of mathematical arguments that

students found convincing, explanatory and appealing. Interview data were coded using a

proof classification framework, i.e. CCIA (see Figure 7), to identify the aspects and

features of arguments that impacted students’ evaluation of the arguments.

Summary of the Findings

The findings from both the survey and interview are summarized to address each

of the three research questions.

Q.1. Are there certain types of mathematical arguments that students find convincing, explanatory and appealing?

This question was explored using 3 different analytical methods. First, we

examined if any type (i.e. algebraic, inductive, perceptual, and visual) of argument was

considered the most convincing, explanatory or appealing option by analyzing the

cumulative data of all responses obtained for all problems. The survey results suggested that no single type of argument received significantly higher ratings than the others (see Table 13). A certain type of argument might have received a higher rating in one problem but been rated low in another, and collectively, when combining the results for all problems, no categorical type stood out as the most convincing, explanatory or appealing.

This result was compatible with findings from the interviews, where no argument

received significantly higher ranking than others in any problem (see Table 46).

Second, we examined whether any argument type received significantly better ratings than the others in each problem context. As mentioned, the interview results didn't

reveal significant differences in any of the problems. However, the survey results did

indicate that there were some arguments that were considered more convincing,

explanatory, or appealing in certain problem contexts (see Table 11). In particular, A4

(visual) and B3 (algebraic) were considered as the most convincing, explanatory, and

appealing option in their respective problem contexts (the appealing ratings for B3 and

B2 were not significantly different). However, in Problem C and D, no single argument

received significantly higher ratings than all others. This suggested that the participants' preference of argument type was more uniform in some contexts, such as Problems A and B, while their views were more diverse in other situations. Nevertheless, even in Problems A and B, the lower rated arguments shouldn't be ignored. The most appealing arguments, A4 (visual) and B2 (perceptual), were chosen by no more than 40% of the participants as the closest to how they themselves would argue (39% for A4 and 28% for B2), while the least appealing options, A3 (perceptual) and B1 (inductive), were chosen by 17% and 20% of the participants. Although the difference between the most appealing and least appealing options was statistically significant (p < .05), this does not mean that the preference of more than 1/6 of the participants could be ignored. Therefore, although some arguments received higher ratings in their respective problem contexts, we found the lower rated arguments were still convincing, explanatory and appealing to a considerable proportion of the participants.
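As a side note, the scale of this gap can be illustrated numerically. The sketch below is not the dissertation's actual procedure, which is not specified in this passage; it is a naive two-proportion z-test that treats the reported 39% and 17% shares as if they came from independent samples of the 476 survey participants, an assumption the single-choice design does not strictly satisfy:

```python
# Illustrative only: a naive two-proportion z-test on the reported
# figures (39% vs. 17% of n = 476). The actual test used in the
# dissertation is not stated in this passage, and this approximation
# ignores that both shares come from the same respondents.
from math import erf, sqrt

n = 476
p1, p2 = 0.39, 0.17                      # reported shares for A4 and A3
pooled = (p1 + p2) / 2                   # pooled proportion (equal n)
se = sqrt(pooled * (1 - pooled) * 2 / n) # pooled standard error
z = (p1 - p2) / se                       # z statistic
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p
print(f"z = {z:.2f}, significant at .05: {p_value < 0.05}")
```

Even under this rough approximation the z statistic lands well above the 1.96 threshold, consistent with the reported p < .05.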

Lastly, we examined whether a certain type of argument was preferred by each individual. The interview results suggested that none of the participants offered the highest ranking for the same type of argument in more than 3 of the 5 problems. The

survey results indicated that only 19 participants (4% of the sample) had chosen the same

type of arguments as the appealing options for all 4 problems, and an additional 122

participants (26%) had chosen the same type of arguments as the appealing options for 3

of the 4 problems. Therefore, most participants (70%) didn’t select the same type of

arguments as the appealing option in more than 2 of the 4 problems. This result suggested

that for most individuals, there wasn’t a single argument type that was preferred across

the problem contexts.

Overall, we found that students’ ratings for the same type of arguments were

highly inconsistent across the contexts and among individuals. Every argument was

considered convincing, explanatory and appealing by a substantial proportion of the

survey participants, while in the interview, no argument was ranked significantly lower (p

< .05) than others in any problem.

Q.2. Are there common aspects and features of arguments that significantly impact

students’ judgment of the arguments? If yes, what are they?

Analysis of the interview data focused on responding to this question. Results

were first reported on the thinking patterns of all subjects as a group. In addition, findings

from each individual interview were compared to identify similarities and differences

among the participants’ responses.

The interview responses revealed that among the three aspects of arguments identified in CCIA (i.e. evidence, representation, and link between evidence and conclusion), evidence was the most frequently referenced by the subjects, followed by representation; the link was the aspect of least concern when they justified their rankings of the arguments (see Figure 29).

Among all types of evidence, examples (i.e. results from immediate tests) were

referenced most frequently, followed by imaginaries (i.e. mental images created from or recalled from previous experience) and facts (i.e. well-known mathematical results).

Among all types of representations, visual illustration received the most attention.

It was at times criticized as being confusing or unreliable. A similar situation arose with the algebraic representation: some subjects found it concise and convincing, while others found it confusing and meaningless.

Among all types of links between the evidence and conclusion, induction was

referenced most frequently. However, it also contributed either positively or negatively to

the subjects’ conviction depending on the context and one’s personal preference.

Transformation, although not referenced often, seemed to be uniformly recognized as a

reliable reasoning mode.

As mentioned above, the subjects’ rationale for their judgment of mathematical

arguments shared some common features but differences among the individuals also

existed. The between-subject similarities and differences were systematically studied (see

Table 38). Cross-subject comparisons suggested the following.

First, when choosing the reliable source of evidence, the participants found

examples (i.e. results from immediate tests) convincing in most contexts. However, their

view on the use of existing mathematical results and their own imaginaries differed.

Some participants were more likely to be convinced by well known mathematical results

(e.g. theorem or formula). Others tended to rely on their own imaginaries and previous

experience when judging a problem.

Second, when examining representation’s impact on the participants’ judgment,

we found the participants were more likely to understand numerical and narrative

arguments than symbolic ones. The participants often found numerical results convincing

except for very rare occasions (e.g. Brenda had difficulties working with numbers in the

probability problem). In addition, most participants had articulated that visual

illustrations positively contributed to their conviction in some contexts. However, they

claimed that images/diagrams that were more difficult to understand (e.g. B4 and D4)

also confused them and hence were not helpful to their conviction. Furthermore, Amy

claimed that visual illustration could be misleading sometimes. Betty claimed that a

visual illustration by itself was not sufficient to convince her and suggested that it should

be accompanied by explanations using narratives, numbers or symbols. In explaining

their view of symbolic representations, 3 of the 8 participants (Allen, Amy and Betty)

found symbolic expressions concise and clear, while the others viewed them as confusing and not

helpful for their conviction.

Third, when evaluating the link between evidence and conclusion, except for Amy,

no participant had realized that symbolic representation had the potential to prove the

general validity of a conjecture. Even Allen and Betty, who demonstrated a sound perception of the meaning of the variables in each symbolic argument, were not aware of algebra's

logical advantage. In fact, with the exception of Amy, the participants were not aware that

the link between evidence and conclusion must show the argument was always true.

Some participants claimed an argument was convincing but were still not sure if the corresponding conjecture was always true (e.g. Alice and Betty). Therefore, deductive

reasoning was not utilized by most of them (except for Amy in her work on non-

geometric contexts). Instead, transformational and perceptual reasoning was widely

adopted based upon trials, experience and imaginaries. The participants' views toward induction differed. Half of them articulated that checking a few cases was not enough to

prove a conjecture was always true. However, for these participants, the realization of the

limitation of induction wasn’t present in their responses to every problem. For example,

Beth claimed that trying a few numbers in Problem A was not sufficient to show the

conjecture was always true; however she found checking a few values in Problem E was

adequate to demonstrate the conjecture was always true. Note that familiarity with algebraic techniques didn't necessarily help the participants realize the limitation of induction. For example, Allen was very confident working with algebra; however, he claimed that he needed to plug in a few values to make sure a formula was correct.

Lastly, the subjects demonstrated various personal standards of what a convincing

argument meant to them (see Table 37). Only Amy insisted that a convincing argument

must show the conjecture was always true. This criterion wasn't a requirement for a convincing argument in the other participants' view. Instead, five of them found easier-to-understand arguments more convincing. Two participants were more likely to be convinced by familiar scenarios evoked by the arguments. Two participants claimed that a convincing argument should adopt a simple and straightforward reasoning process.

Overall, our data suggested that when evaluating mathematical arguments, the

evidence had the largest impact on the subjects’ judgment, followed by the representation,

and the logical link between evidence and conclusion seemed to have the least impact.
However whether a certain type of evidence, representation, and link caused positive or

negative impact on one’s conviction depended on the individual’s preference. In addition,

the subjects also had personal standards to determine if an argument was convincing. The

personal standards were found to be associated with various features of arguments,

including the ease of expression, clarity of language, and reasoning structure.

Q.3. How does problem context impact students’ judgment of arguments?

Analysis of interview data revealed that the complexity of the expression and

concepts, students’ familiarity with the context used in the arguments, and the clarity of

the explanation presented seemed to have impacted the subjects’ evaluation and judgment

(see Figure 30). An argument, regardless of the type in which it was categorized, could be

perceived as difficult or easy, could use familiar or unfamiliar illustrations, and could

seem to offer adequate or insufficient explanation to students (e.g. relationships demonstrated by an image could be easy or difficult to understand). This suggested that

the features that students noticed didn’t align with the factors we used to categorize the

arguments. Because of this misalignment, arguments in the same category were evaluated

differently by the participants.

In addition, although certain components of arguments were identified to have the

same function mathematically, students’ perception of the function of these components

varied across the contexts. For example, some participants viewed the examples used in one problem as isolated cases but those in another problem as related instances (e.g. Alice and Beth), which led to their different judgments of inductive reasoning among the

contexts. In another instance, some participants found illustrations of images convincing in the geometry problems since those images were themselves the subject of study. However, they considered visual aids in other problems less reliable as a way to interpret the content (e.g. Amy). These findings suggested that although some

features/factors of arguments were assumed to be similar by mathematical standards, they might seem different to students. Since those features/factors were used to categorize the

arguments, students’ inconsistent evaluations for arguments that were categorized as the

same type seemed to be a natural outcome.

Contribution to the Literature

This study advanced the understanding of proof learning in four aspects: an empirical report on results from a large sample, an investigation of student thinking patterns, the development of a proof classification framework, and task design.

First, the study analyzed survey results from 476 eighth grade students who were

enrolled in five Ohio public schools that had demonstrated different levels of

achievement as measured by state standardized tests. Compared to Healy & Hoyles’s

(2000) study, which focused on the performance of high-attaining 14-15 year old

students, this study chose a sample that was more likely to represent the general eighth

grade student population. Healy and Hoyles found that students excluded algebraic

arguments when they were asked to select an argument that they found convincing and

explanatory. Our results contrast with findings reported in Healy and Hoyles's study in that

our subjects didn’t show bias against algebraic arguments when making their choices. In

fact, the algebraic argument in each problem context was considered convincing and

explanatory by at least 3/5 of the participants. Algebraic arguments were not the least appealing option in any problem except Problem D. In addition, the follow-up interviews revealed

that 3 of the 8 participants exhibited a preference toward the use of symbolic expressions.

Certainly, some participants expressed a negative view of algebraic arguments. However,

we didn’t observe that algebraic arguments were less convincing or preferred when

compared to other types of options. The most evident finding was that students’ preferred

argument type was highly inconsistent across content areas and different among

individuals. Hence, it was difficult to conclude whether a certain type of argument was

more likely to be considered convincing, explanatory and appealing by the students. This

result is compatible with findings of the previous literature that the understanding of

proof develops locally (Freudenthal, 1971; Reid, 2011), and hence an overarching

preference of proof type is unlikely to be achieved at early cognitive stages. The findings of this study indicated that no single approach seemed to solely facilitate the local development of proof understanding. As illustrated by our data, at

least half of the students found two or more argument types convincing and explanatory

in the same context, and even the least appealing option was preferred by at least 1/6 of

the sample in each problem. Overall, this current study provided an analysis of empirical

data obtained from a considerable number of students. It demonstrated that students’

preference of argument types was highly diverse among individuals in each of the studied

contexts.

Second, the study documented detailed explanations from 8 students of their judgments of arguments in 5 different mathematical contexts. Such an investigation

provided insights into the aspects of arguments that impacted eighth grade students’

evaluations. Similarities among the subjects’ comments highlighted patterns in their


thinking when assessing and judging arguments. Most prominently, examples used in an

argument received the greatest attention from the subjects and had a major impact on

their judgment. In addition, our analysis revealed that at least half of the subjects had

realized that a conjecture couldn’t be proved to be always true if only a few examples

were tested. However, most of them were not yet aware of the advantage of symbolic

expressions which could represent general cases. Balacheff’s (1988) description of

pragmatic justification (see Figure 1) and Waring's (2000) proof levels explained this phenomenon well. According to Balacheff's theory, the subjects in this study no longer

relied on naive empiricism. Instead, their conviction depended on crucial experiment and

generic examples. According to Waring’s model, these subjects had reached Level 2,

where they still relied on empirical checking but were more careful in choosing examples

to verify with the potential to notice certain patterns in the process. Such an

argumentation mode between induction and deduction was also extensively documented

in Simon’s (1996) study on transformational reasoning.

The interview responses also revealed that the link between evidence and

conclusion was the issue students seemed least concerned with when they were

evaluating the arguments. This finding was compatible with Yang and Lin’s (2008) RCGP

model (see Figure 6). As suggested by RCGP, students would not start examining the link

if they found the evidence unconvincing or the argument was represented in an unreliable

format. Therefore, it was natural that the link was not as frequently referenced. This

phenomenon could also be well explained by the broad maturation of proof structure

model (Tall et al.; see Figure 4). According to this model, lower cognitive stages involved

perceptual recognition, verbal description, pictorial or symbolic representation, and


definitions. Only at the higher levels, i.e. equivalence, crystalline concepts, and deductive

knowledge structure, would students become able to reflect on the link used in the

argumentation.

Additionally, factors that caused the subjects’ different rankings of the same type

of arguments in different contexts were discussed. Factors that were not context specific, such as the complexity of the language used in the arguments, students' familiarity with the context used in the arguments, and the clarity of the explanation presented, were identified.

However some context specific factors were also detected. For example, some subjects

were more likely to see the common properties among shapes than between numbers,

which led to a different interpretation of the inductive arguments in the number theory

and geometry contexts. In particular, they viewed examples used in the number theory

problem as isolated cases which couldn’t show the general validity of the conjecture,

while they considered examples used in the geometry problems as generic examples that

demonstrated why the corresponding conjecture was always true. In addition, visual

arguments could serve different purposes in different contexts. Diagrams and figures adopted in a visual argument could be used to demonstrate quantities of objects or spatial

relationships. Depending on whether students were able to visualize the quantity or

spatial relationships, their views of the visual arguments could also be different. This was

an important finding since few past studies had specified an explanation for students’

different evaluations of seemingly similar types of arguments used in different contexts.

It was also observed that for 5 subjects a convincing argument needed to be easy

to understand. This finding is compatible with Hanna & Jahnke’s (1993) suggestion that

whether an argument was understandable had a greater impact on students’ conviction


than its rigor. Results of the current study further elaborate this point by showing that arguments that used plain expression, simple examples, and familiar concepts/procedures were more likely to be understood by students. In this study, the participants also applied other personal standards to determine whether an argument was convincing. For example, one of the participants, Allen, believed that a convincing

argument’s reasoning procedure needed to be straightforward. Betty liked an argument to

contain more details. Blake advocated an opinion that countered Betty’s, claiming that a

convincing argument should not provide the complete and detailed procedure but leave

some space for readers to think. Rigor was not as important as other factors when

determining whether an argument was convincing to these students.

Third, the study relied on a novel theoretical framework, i.e. CCIA (see Figure 7),

to classify aspects of proofs and different genres within each aspect for documenting

students’ foci when they evaluated arguments. In order to clarify ambiguities associated

with sources contributing to individuals’ choices, it is important to acknowledge that

neither the representation, source of conviction, nor the link between source and the

conclusion can be identified merely by looking at the text and content of the argument.

Instead, they reside in one’s comprehension of the argument. To classify one’s

understanding of an argument instead of its expression was the most distinct feature of

CCIA as a proof classification model. An argument could appear to be inductive (e.g. D1 and E1); however, when a learner had perceived more general properties through the examination of one or a few cases, a seemingly inductive argument was actually treated as a transformational one even though a description of the transformation was not included in the text (e.g. the cases of Allen, Amy and Betty). Therefore, to perceive students' evaluation of different argument types, it was important to first understand their

comprehension of the arguments. CCIA took personal interpretation into consideration

and hence was a more accurate model for investigations of students' thinking.

Lastly, few studies have investigated middle school students’ comprehension and

evaluation of given proofs. This has been, in part, due to the absence of instruments that

support such investigations. In this study, five problems were designed (four were

included in SMR and another was used in the interview) as to enrich the task reservoir

that were appropriate for students who have been introduced to symbolic expression and

proofs in school mathematics. These tasks were embedded in different branches of school

mathematics and provided a variety of problem contexts. The argument types provided in

each problems were aligned with those used in other contexts to assess whether students’

view of mathematical proof were consistently developed across the fields in school

mathematics. The tasks can also be used for older students to examine whether school

interventions have fostered their mathematical reasoning skills.

When comparing the tasks used in this study to existing materials, it was found

that problems used in Harel & Sowder’s (1998) study involved more advanced

mathematical topics since they were designed for college students. In addition,

Harel & Sowder studied the arguments generated by students instead of their judgment of

proposed items. Yang & Lin (2008), Healy & Hoyles (2000), and Stylianides &

Stylianides (2008a) studied students’ evaluation of provided arguments. However, tasks

used in Yang & Lin's study were restricted to geometry contexts. Healy & Hoyles's questionnaire contained both geometry and number theory tasks; however, only the latter were published in that work. Tasks used by Stylianides & Stylianides covered various mathematical contexts, and some didn't involve complex mathematical concepts and

hence could be used with younger students. However, the subjects of interest in their study

were primarily college students. Therefore, the tasks used in this study enriched the task

reservoir in three major aspects: 1) Tasks were specially designed for students who were

first introduced to algebraic expressions and geometry proofs; 2) Tasks were embedded in

multiple branches of school mathematics; 3) The types of arguments used in each problem

aligned with those used in other problems, which enabled a between-context comparison.

Limitations of the Study

The first limitation of the study was the unconfirmed effect of multiple

approaches on a learner’s conviction. Indeed the survey and interview data had

demonstrated that individuals’ judgment of arguments was highly diverse. Consequently,

the use of multiple strategies was suggested since any single approach might only

contribute to some students’ conviction. However, it was unclear, merely based on our

data, whether the use of multiple argument types enhanced a particular individual's

conviction about the validity of an argument more than adopting just the one approach that he/she

found most appealing.10 Therefore, studies to verify the actual effect of adopting

argumentation from different aspects are needed.

Second, the participants’ evaluation of one argument could have been altered by

their exposure to multiple arguments presented in each context. When providing an

evaluation, the participants read all four arguments from each problem at the same time.

10
Nonetheless, Allen did indicate that arguments that involved a combination of formula and visual illustration would
be “perfect.”
Since their perception of these arguments involved not only information extracted from

them but could also include construction of new mental images, their judgment

may have been subconsciously influenced by this new knowledge. Consequently, their

evaluation of an argument might not be based on information provided by this argument

alone. The possible interference among subjects' evaluations of different arguments needs

to be addressed in future studies.

Lastly, although this study had highlighted some aspects and features of

arguments that largely impacted students’ evaluation, the emergent patterns were not

precise enough to allow for a prediction of individuals’ choices when a new problem was

proposed. This wasn’t surprising since we had focused on the impact of the aspects of

arguments, while other personal and contextual factors, such as learners’ background

knowledge, existing classroom experience, and specific features of certain mathematical

topics, were not considered. Therefore, investigations that seek to identify personal and

contextual factors and their impact on one’s judgment of arguments are essential to

unpack students’ rationales more precisely. Realization of this viewpoint led to a

reflection on existing theories describing children's proof learning.

Reflection on Existing Theories

In studying the phenomena of children’s proof learning, it is a common practice to

identify phases where they are able to identify, understand, appreciate, or produce certain

types of proofs or certain components of arguments and to describe how they develop

through these phases (Tall et al., 2012; Waring, 2000). In order to do so, a classification of

the types of proofs or components of arguments children are capable of identifying,

understanding, appreciating, or producing was needed. The stages described in Yang & Lin's

(2008) model and Tall et al.'s (2012) framework concerned students' understanding of

different components of arguments (e.g. the evidence, concepts, and links), whereas

Waring (2000), Harel & Sowder (1998), and Simon (1996) built their theories by

classifying different types of arguments (e.g. inductive, deductive, and transformational).

The theoretical framework of the current study, i.e. CCIA, considered students’

understanding of different components of mathematical arguments and used features of

these components to provide a more precise classification of the proof types so that each

argument could be classified based on its representation, source of evidence, and the link

between evidence and conclusion. These models, although different in many ways,

shared some common deficiencies. Most prominently, the types and components of

proofs identified in these theories are not content specific. For example, inductive

arguments, as categorized by Waring, Harel & Sowder, and CCIA, include verifications

by empirical tests in geometry, number theory, probability, and other mathematical areas.

Transformational reasoning, as described by Simon, includes visualizing movements in

geometric contexts as well as perceiving the fixed/changing properties when certain

values shift in number theory contexts. The ability to understand and apply definitions as

identified in Tall et al.'s model describes a stage in learning geometric proofs as well as in

working on proofs in abstract algebra. Even Yang & Lin’s model, which was restricted to

the context of geometry, didn't consider the differences between two-dimensional and

three-dimensional geometry, or between triangles and circles. Since the designers of these

theories had aimed to develop an overarching understanding of the discipline of

mathematics, they were able to see the connection between two arguments in two
different mathematical fields or topics. Consequently, certain arguments were grouped

together as one category since they shared some logical structure or possessed some other

“macro” properties. However, proof learners, who haven’t yet developed the ability to

compare mathematical arguments across the content areas, might not be able to see the

connection between two arguments that were classified as the same type in

mathematicians’ view but were embedded in different contexts. Instead, their

understanding of an argument was rooted in their understanding of its specific

mathematical topic.

The argument types used in this study were generated based on the researcher’s

understanding of proofs. The results from both the survey and the interviews suggested

that students demonstrated inconsistent views toward the same type of argument in

different contexts. However, this phenomenon might be explained differently. That is, the

inconsistency may have in fact existed in the different levels of understanding

represented by arguments that were classified as the same type (e.g. inductive argument

used in the number theory problem might not have offered as much explanation to

students as the inductive argument in the algebra problem did). As pointed out earlier, the

argument type was determined by the standard set by researchers who had a mature

understanding of mathematics and its logical structures. This level of understanding is

certainly different from that of school learners. Therefore, arguments classified in the

same category by researchers might not seem similar to students. As pointed out by

Lakatos (1976), even mathematicians' standards of reliable proof change when different

contexts are taken into consideration. For instance, visual illustrations were widely used

in geometric proofs; however, they were no longer viewed as reliable when calculus was
taken into consideration. Therefore, it was natural for the learners to first develop their

justification skills in local contexts (Reid, 2011). Only when their reasoning skills

reached certain levels in two contexts were they able to identify and compare the

reasoning methods adopted.

With the emphasis on local development of mathematical reasoning ability, the

absence of content-specific proof/argument classification models becomes more critical.

Drawing conclusions regarding the type of understanding students obtained from an

inductive argument might be premature without considering the specific problem context.

This was certainly evident in the results of this study, where those interviewed clearly

demonstrated different perceptions of the inductive arguments among Problems A, B and

D. The status of visual arguments and their impact on students’ conviction wasn’t

conclusive either, since graphs and diagrams could serve distinct functions in different

contexts (e.g. in B4, D4, and E2). Currently, tools that measure students' reasoning

maturity and characteristics in specific content areas are absent. Therefore, all features

and categories we constructed and used were based upon the synthesis of what was

known about mathematical reasoning as a generalized method. The theories were not

built upon the features of local content and learners’ understanding of such content. Note

that we didn’t attempted to deny the existence of more general patterns in students’

development of reasoning ability across the content areas. However, we would like to

stress that merely identifying these general patterns might not be sufficient to

understand students' development of disciplinary reasoning skills, and such patterns are limited in

the quality of guidance they provide to support curricular and instructional designs.

Therefore, we call for the need to develop content-specific proof/argument classification

and development models.

In addition, we assert the need to take personal factors into consideration in the

development of useful and explanatory theoretical models. The survey and interview

results had both identified great differences among individuals. These differences appeared

not only in the participants' judgments of arguments, but also in their focus

and rationale when making judgments. The individual differences were impacted by their

existing mathematical and non-mathematical experiences. Personal differences could

certainly influence learners’ mental images and how they perceive and interpret

arguments. The participants had found certain arguments convincing since they were

familiar with the scenarios provided in the arguments (e.g. “football field” in B2). They

had articulated that topics and results learned in class (e.g. "triangle area formula" in C2)

contributed to their conviction. Individuals' paths to reaching conviction about a conjecture

might also vary. However, this element has rarely been taken into account in existing

proof understanding and reasoning development models. The focus has been on the

arguments that students were able to produce and their judgment of certain types of

mathematical statements rather than the sources that may have contributed to their

thinking when making decisions. Therefore, in designing future (content specific) proof

classification and proof skill development models, personal factors may be considered as

key variables. These factors shouldn’t be treated as obstacles that impede the

development of proving skills, but as valuable sources for sense making and construction

of sound arguments.

Implications for Proof Teaching

As discussed in previous research, teaching students “the right way” of doing

proofs might help students pass examinations but it might also create a gap between the

work they produced to show their teachers and the arguments they would use to convince themselves

(Hanna & Jahnke, 1993; Healy & Hoyles, 2000). Therefore, pressing students to use a

rigorous reasoning format may not actually help them understand the logic embedded in a

mathematical problem. The process of nurturing mathematical reasoning should start with

an understanding of more “natural” ways in which the students argue. Those “natural”

ways are usually mathematically incomplete or at times incorrect; however, they may help

learners understand and access the problem and can ultimately influence their judgment.

For example, our study revealed that students’ conviction was strongly impacted by

examination of examples. This idea coincided with the constructivist’s view of using

examples and counterexamples to help students understand the construction of

mathematical structures in a heuristic way (Lakatos, 1976). Although using examples to

verify a statement is not a rigorous way to prove the statement is true, it does provide a

concrete context for students to work on and hence to understand the problem better.

The findings of this study highlighted the need to foster students’ proof capacity

in multiple branches of school mathematics. As suggested by existing literature and

supported by the findings of this study, learners’ understanding of proof develops locally

and doesn’t automatically transfer to other fields. Students may appreciate deductive

reasoning in one area, but still find visual illustration and use of examples convincing in

other contexts (e.g. Amy). Since proof ability essentially concerns the relationships

among concepts and properties, it is crucial for students to develop a conceptual


understanding of mathematical topics. When reasoning is addressed in different content

areas, there is greater potential for development of a coherent perception of mathematical

structure among learners.

Our findings underscore the importance of nurturing students' conviction via

multiple approaches, including the use of various types of evidence, representations, and reasoning

modes. Since arguments that convinced students were highly diverse among individual

learners, any single approach might be appropriate and effective for only a small

proportion of students. Consequently, the use of multiple types of evidence, representations, and

reasoning modes could be essential to help all students access a problem and perceive the

embedded connections or lack thereof. It is important to clarify that fluent use of

symbolic representation and known theorems didn’t guarantee that students understood

the algebraic expression’s general validity. This was evident in Betty’s case, where she

claimed the algebraic argument that adopted the Pythagorean Theorem was convincing and

she clearly explained the meaning of the variables used in the theorem; however she still

believed there existed counterexamples to the conjecture. Therefore, training in the

use of symbolic expressions didn't necessarily advance her reasoning ability.

Findings of this study suggest that a need exists for fostering students'

understanding of the standards used to distinguish a convincing argument in

mathematics. Note that we are not suggesting that students at the introductory level need

to be taught to examine the rigor of each reasoning step in an argument. In fact, we posit

that students should be allowed to use any type of evidence, representation and reasoning

mode to investigate a problem and to convince themselves that a conjecture is always true.

However, since it was found that most of the interviewees didn’t realize that an argument
couldn’t be convincing if it didn’t show the conjecture was always true, we argue that

instruction should enable students to acquire an appreciation for such disciplinary

practice. Such an awareness is the foundation for future development of rigor in logic (e.g.

examination of counterexamples and the creation of cognitive conflict (Stylianides &

Stylianides, 2008b)).

Lastly, findings of this study emphasized the need to provide examples for

students to verify the validity of conjectures. Although empirical checking is not

considered a valid mathematical proving process, it does provide students access to the

problem, and provide them opportunities to make and test conjectures, to seek patterns

and to explore approaches that verify a conjecture. The value of examples has been

addressed by other scholars (e.g. Balacheff, 1988; de Villiers, 2003; Simon, 1996;

Stylianides & Stylianides, 2008a). In this study, examples, as a type of evidence, were

also the most referenced components of arguments that impacted the subjects’ judgment.

Students’ preferred type of representations and reasoning modes might differ; however,

even those who were aware of the limitation of examples considered them helpful for

their understanding of the problem. Therefore, the use of examples (in various

representations) was highly recommended by findings of this study as introductory tools

for the instruction of proofs. This implication is compatible with the mainstream

approach in this field of research.

REFERENCES

Armstrong, A. H. (Ed.) (1970). The Cambridge history of later Greek and early Medieval
philosophy. Cambridge, UK: Cambridge University Press.

Baker, A. (2009). Non-Deductive Methods in Mathematics. In Edward N. Zalta (ed.), The


Stanford Encyclopedia of Philosophy (Fall 2009 Edition). URL =
<http://plato.stanford.edu/archives/fall2009/entries/mathematics-nondeductive/>.

Balacheff, N. (1988). Aspects of proof in pupils’ practice of school mathematics. In D.


Pimm (Ed.), Mathematics, teachers and children (pp. 216-235). London, UK:
Hodder & Stoughton.

Balacheff, N. (1991). The benefit and limits of social interaction: The case of
mathematical proof. In A. Bishop, Mellin-Olsen, E. & van Dormolen, J. (Eds.),
Mathematical knowledge: Its growth through teaching (pp. 175-192). Dordrecht,
Netherlands: Kluwer.

Balaguer, M. (2008). Mathematical Platonism. In B. Gold and Simons, R. A. (Eds.),


Proof and Other Dilemmas: Mathematics and Philosophy (pp. 179-204). Washington, DC: Mathematical Association of America.

Ball, D. L., & Bass, H. (2003). Making mathematics reasonable in school. In J. Kilpatrick,
Martin, W. G., & Schifter, D. (Eds.), A research companion to principles and
standards for school mathematics (pp. 27-44). Reston, VA: National Council of
Teachers of Mathematics.

Ball, D. L., & Bass, H. (2000). Interweaving content and pedagogy in teaching and
learning to teach: Knowing and using mathematics. In J. Boaler (Ed.), Multiple
perspectives on the teaching and learning of mathematics (pp. 83-104). Westport,
CT: Ablex.

Bell, A.W. (1976). A study of pupils’ proof-explanations in Mathematical situations.


Educational Studies in Mathematics, 7(1-2), 23-40.

Besicovitch, A. (1919). Sur deux questions d’integrabilite des fonctions. Journal of


Society of Physics and Mathematics, 2, 105–123.

Biggs, J., & Collis, K. (1982). Evaluating the quality of learning: the SOLO taxonomy.
New York: Academic Press.

Bloom, B. S. (1984). Taxonomy of educational objectives: Book I cognitive domain (2nd
edition). Boston, MA: Addison Wesley Publishing Company.

Boero, P. (Ed.) (2007). Theorems in school: From history, epistemology and cognition to
classroom practice. Rotterdam, Netherlands: Sense Publishers.

Grabiner, J. V. (2009). Why Proof? Some Lessons from the History of Mathematics. In
Fou-Lai Lin, Hsieh, F., Hanna, G. & de Villiers, M. (Eds.), Proceedings of the
ICMI Study 19 conference: Proof and proving in mathematics education (Vol. 1,
pp. 12). Taipei, Taiwan: National Taiwan Normal University.

Brouwer, L. E. J. (1905/1996). Life, art and mysticism. Notre Dame Journal of Formal
Logic, 37(3), 389-429.

Brown, J. R. (2008). Philosophy of mathematics: A contemporary introduction to the


world of proofs and pictures (2nd ed.). Routledge: New York.

Brunner, K. (1987). The perception of man and the conception of society: Two approaches
to understanding society. Economic Inquiry, 25(3), 367-388.

Bruner, J. S. (1966). Towards a Theory of Instruction. Cambridge, MA: Harvard


University Press.

Burger, W. F. & Shaughnessy, J. M. (1986). Characterizing the van Hiele levels of


development in geometry. Journal for Research in Mathematics Education, 17(1),
31-48.

Carnap, R. (1937). The logical syntax of language. London, UK: K. Paul Trench.

Chazan, D. (1993). High school geometry students’ justification for their views of
empirical evidence and mathematical proof. Educational Studies in Mathematics,
24(4), 359-387.

Chazan, D., & Lueke, H. M. (2009). Relationships between disciplinary knowledge and
school mathematics: Implications for understanding the place of reasoning and
proof in school mathematics. In D. A. Stylianou, Blanton, M. L., & Knuth E. J.
(Eds.), Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp.
21-39). New York: Routledge.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New
Jersey: Lawrence Erlbaum.

Council of Chief State School Officers (2010). Common Core State Standards
(Mathematics). National Governors Association Center for Best Practices.
Washington, D. C.

Clements, D. H. & Battista, M. T. (1992). Geometry and spatial reasoning. In D. Grouws,
(Ed.), Handbook of Research on Mathematics Teaching and Learning (pp. 420-
464). New York: NCTM/Macmillan.

Creswell, J. W. & Plano Clark, V. L. (2011). Designing and conducting mixed methods
research. Thousand Oaks, CA: Sage Publications, Inc.

Davis, P. J. (1976). The nature of proof. In M. Carss (Ed.), Proceedings of the fifth
international congress on mathematical education. Boston, MA: Birkhauser.

de Villiers, M. (2012). An illustration of the explanatory and discovery function of proof.


In Proceedings of the 12th International Congress on Mathematics Education (pp.
1122-1137). Seoul, Korea.

de Villiers, M. (2003). Rethinking proof with the Geometer’s Sketchpad. Emeryville, CA:
Key Curriculum Press.

de Villiers, M. (1998). An alternative approach to proof in dynamic geometry. In R.


Lehrer & D. Chazan (Eds.), New directions in teaching and learning geometry (pp.
369-415). Lawrence Erlbaum.

de Villiers, M. (1990). The role and function of proof in mathematics. Pythagoras, 24,
17–24.

Dreyfus, T. (2006). Linking theories in mathematics education. In A. Simpson (Ed.),


Retirement as process and concept: A festschrift for Eddie Gray and David Tall
(pp. 77-82). Prague, Czech Republic: Karlova Univerzita v Praze.

Dreyfus, T. (1999). Why Johnny can’t prove. Educational Studies in Mathematics,


38(1/3), 85-109.

Ernest, P. (1996). New angles on old rules. Times Higher Educational Supplement. Times
Supplements Ltd.

Fawcett, H. P. (1995). The nature of proof. Thirteenth Yearbook of the NCTM. New York:
Teachers College, Columbia University. (Original work published 1938).

Fischbein, E. (1982). Intuition and proof. For the Learning of Mathematics 3(2), 9–24.

Freudenthal, H. (1971). Geometry between the devil and the deep sea. Educational
Studies in Mathematics, 3(3-4), 413-435.

Freudenthal, H. (1973). Mathematics as an educational task. Dordrecht: Reidel.

Grabiner, J. V. (2012). Why proof? A historian’s perspective. In G. Hanna & M. de


Villiers (Eds), Proof and Proving in Mathematics Education (The 19th ICMI
Study, New ICMI Study Series, Vol. 15), (pp. 147-167). Dordrecht: Springer.
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework
for mixed-method evaluation designs. Educational Evaluation and Policy
Analysis, 11(3), 255-274.

Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und
verwandter Systeme. Monatshefte für Mathematik und Physik, 38, 173–98.

González, G., & Herbst, P. (2006). Competing arguments for the geometry course: Why
were American high school students supposed to study geometry in the twentieth century?
International Journal for the History of Mathematics Education, 1(1), 7-33.

Hanna, G. (1983). Rigorous proof in mathematics education. Toronto, CA: OISE Press.

Hanna, G. (2000a). A critical examination of three factors in the decline of proof.


Interchange, 31(1), 21-33.

Hanna, G. (2000b). Proof, explanation and exploration: An overview. Educational


Studies in Mathematics, 44, 5-23.

Hanna, G., & Jahnke, H. N. (Eds.). (1993). Aspects of proof [Special issue]. Educational
Studies in Mathematics, 24(4).

Harel, G., & Sowder, L. (1998). Students’ proof schemes: Results from exploratory
studies. In A. H. Schoenfeld, Kaput, J., & Dubinsky, E. (Eds.), Research in
Collegiate Mathematics Education III (pp. 234-283). American Mathematical
Society.

Harel, G., & Sowder, L. (2007). Toward comprehensive perspectives on the learning and
teaching of proof. In F. Lester (Ed.), Second handbook of research in mathematics
teaching and learning (pp. 805-842). Charlotte, NC: Information Age Publishing.

Healy, L., & Hoyles, C. (2000). A study of proof conceptions in algebra. Journal for
Research in Mathematics Education, 31(4), 396–428.

Heinze, A., & Reiss, K. (2009). Developing argumentation and proof competencies in the
mathematics classroom. In D. A. Stylianou, Blanton, M. L., & Knuth E. J. (Eds.),
Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp. 191-
203). New York: Routledge.

Herbst, P., & Brach, C. (2006). Proving and doing proofs in high school geometry classes:
What is it that is going on for students? Cognition and Instruction, 24(1), 73–122.

Hersh, R. (2009). What I would like my students to already know about proof. In D. A.
Stylianou, Blanton, M. L., & Knuth E. J. (Eds.), Teaching and Learning Proof
Across the Grades: A K-16 Perspective (pp. 17-20). New York: Routledge.

Hilbert, D., & Bernays, P. (1934/1939). Grundlagen der Mathematik I and II, first
editions. Berlin, Germany: Verlag Julius Springer.

Hoyles, C. (1997). The curricular shaping of students’ approaches to proof. For the
Learning of Mathematics, 17(1), 7-16.

IAS/PCMI (2007). International Seminar: Bridging policy and practice in the context of
reasoning and proof. Institute for Advanced Study / Park City Mathematics
Institute. Princeton, NJ.
< http://mathforum.org/pcmi/int2007.html>

Inglis, M., & Alcock, L. (2012). Expert and novice approaches to reading mathematical
proofs. Journal of Research in Mathematics Education, 43(4), 358-390.

Jaffe, A. & Quinn, F. (1993). Theoretical mathematics: Toward a cultural synthesis of


mathematics and theoretical physics. Bulletin of the American Mathematics
Society, 29, 1-13.

Johnson, P. E. (1972). A History of Set Theory. Prindle, Weber & Schmidt.

Johnson, R. B. & Onwuegbuzie, A. J. (2004). Mixed-methods research: a research


paradigm whose time has come. Educational Researcher, 33(7), 14-26.

Lakatos, I. (1976). Proofs and refutations: The logic of mathematical discovery.


Cambridge, UK: Cambridge University Press.

Lampert, M. (1992). Practices and problems in teaching authentic mathematics. In F. K.


Oser, A. Dick, & J. Patry (Eds.), Effective and responsible teaching: The new
synthesis (pp. 295–314). San Francisco, CA: Jossey-Bass Publishers.

Liu, Y. & Manouchehri, A. (2012). Nurturing high school students’ understanding of


proof as a convincing way of reasoning: Results from an exploratory study. In
Proceedings of the 12th International Congress on Mathematics Education (pp.
2848-2857). Seoul, Korea.

Longo, G. (2009). Theorems as constructive visions. In Fou-Lai Lin, Hsieh, F., Hanna, G.
& de Villiers, M. (Eds.), Proceedings of the ICMI Study 19 conference: Proof and
proving in mathematics education (Vol. 1, pp. 13-25). Taipei, Taiwan: National
Taiwan Normal University.

Kakeya, S. (1917). Some problems on maximum and minimum regarding ovals. Tohoku
Science Reports, 6, 71–88.

Kieren, T., & Pirie, S. (1991). Recursion and the mathematical experience. In L. Steffe
(Ed.), The Epistemology of Mathematical Experience (pp. 78-101). New York:
Springer Verlag Psychology Series.

Knuth, E. J., Choppin, J. M., & Bieda, K. N. (2009). Middle school students’ production
of mathematical justifications. In D. A. Stylianou, Blanton, M. L., & Knuth, E. J.
(Eds.), Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp.
153-170). New York: Routledge.

Krantz, S. G. (2007). The history and concept of mathematical proof.


http://www.math.wustl.edu/~sk/eolss.pdf

Kuchemann, D., & Hoyles, C. (2009). From empirical to structural reasoning in


mathematics: Tracking changes over time. In D. A. Stylianou, Blanton, M. L., &
Knuth E. J. (Eds.), Teaching and Learning Proof Across the Grades: A K-16
Perspective (pp. 171-190). New York: Routledge.

Martin, L. C. (2008). Folding back and the dynamical growth of mathematical


understanding: Elaborating the Pirie-Kieren Theory. Journal of Mathematical
Behavior, 27(1), 64-85.

Mason, J. (2009). Mathematics education: Theory, practice and memories over 50 years.
In S. Lerman and B. Davis (Eds.), Mathematical action & structures of noticing:
Studies on John Mason’s contribution to mathematics education (pp. 1-14).
Rotterdam, Netherlands: Sense Publishers.

Marrades, R., & Gutiérrez, A. (2000). Proofs produced by secondary school students
learning geometry in a dynamic computer environment. Educational Studies in
Mathematics, 44 (1&2), 87-125.

Mejia-Ramos, J. P., & Inglis, M. (2009). Argumentative and proving activities in


mathematics education research. In F.-L. Lin, F.-J. Hsieh, G. Hanna, and M. de
Villiers (Eds.), Proceedings of the ICMI Study 19 conference: Proof and Proving
in Mathematics Education (Vol. 2, pp. 88-93), Taipei, Taiwan.

Mejia-Ramos, J. P., Fuller, E., Weber, K., Rhoads, K., & Samkoff, A. (2012). An
assessment model for proof comprehension in undergraduate mathematics.
Educational Studies in Mathematics, 79(1), 3-18.

McConaughy, S. H., & Achenbach, T. M. (2001). Manual for the Semistructured Clinical
Interview for Children and Adolescents (2nd ed.). Burlington, VT: University of
Vermont, Research Center for Children, Youth, and Families.

National Council of Teachers of Mathematics (2000). Principles and standards for school
mathematics. Reston, VA: NCTM.

Onwuegbuzie, A. J., & Leech, N. J. (2006). Linking research question to mixed methods
data analysis procedures. The Qualitative Report, 11(3), 2006, 474-498.

Pal, J. (1920). Ueber ein elementares variationsproblem. Kongelige Danske
Videnskabernes Selskab Math.-Fys, Medd. 2, 1–35.

Pegg, J., & Davey, G. (1998). Interpreting student understanding in geometry: A synthesis
of two models. In R. Lehrer & D. Chazan (Eds.), Designing learning
environments for developing understanding of geometry and space (pp. 109–133).
Mahwah, NJ: Lawrence Erlbaum Associates.

Piaget, J. (1987). The role of necessity in cognitive development. Minneapolis, MN:


University of Minnesota Press.

Piaget, J. (1985). The Equilibration of Cognitive Structures: The Central Problem of


Intellectual Development (T. Brown & K. J. Thampy, Trans.). Chicago, IL:
University of Chicago Press.

Piaget, J. (1928). Judgment and reasoning in the child. New York, NY: Harcourt, Brace,
and Co.

Pirie, S. (1988). Understanding – instrumental, relational, formal, intuitive … How can


we know? For the Learning of Mathematics, 8(3), 2-6.

Pirie, S., & Kieren, T. (1992). Creating constructivist environments and constructing
creative mathematics. Educational Studies in Mathematics, 23(5), 505-528.

Polya, G. (1954). Mathematics and plausible reasoning: Induction and analogy in


mathematics, Vol 1. Princeton, NJ: Princeton University Press.

Pruss, A. R. (2006). The Principle of Sufficient Reason: A Reassessment. Cambridge, UK:


Cambridge University Press.

Recio, A. M., & Godino, J. D. (2001). Institutional and personal meanings of


mathematical proof. Educational Studies in Mathematics, 48(1), 83-99.

Reid, D. A. (2011). Understanding proof and transforming teaching. In L. R. Wiest, &


Lamberg, T. (Eds.), Proceedings of the 33rd Annual Meeting of the North
American Chapter of the International Group for the Psychology of Mathematics
Education (pp. 15-30). Reno, NV: University of Nevada, Reno.

Reid, D. A. (2002). Conjectures and refutations in grade 5 mathematics. Journal for


Research in Mathematics Education, 33(1), 5–29.

Russell, B. (1903). Principles of mathematics. Cambridge, UK: Cambridge University


Press.

Schoenfeld, A. H. (Ed.) (1994). Mathematical thinking and problem solving. Hillsdale,


NJ: Erlbaum.

Schoenfeld, A. H. (1991). On mathematics as sense-making: An informal attack on the
unfortunate divorce of formal and informal mathematics. In J. Voss, D. N. Perkins,
& J. Segal (Eds.), Informal reasoning and education (pp. 311-343). Hillsdale, NJ:
Erlbaum.

Schoenfeld, A. H. (1988). When good teaching leads to bad results: The disasters of
“well-taught” mathematics courses. Educational Psychologist, 23(2), 145–166.

Sekiguchi, Y. (1991). An investigation on proofs and refutations in the mathematics
classroom. Unpublished doctoral dissertation, University of Georgia, Athens.

Selden, A., & Selden, J. (2003). Validations of proofs considered as texts: Can
undergraduates tell whether an argument proves a theorem? Journal for Research
in Mathematics Education, 34(1), 4–36.

Senk, S. L. (1985). How well do students write geometry proofs? Mathematics Teacher,
78(6), 448-456.

Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and
directions. In D. A. Grouws (Ed.), Handbook of Research on Mathematics
Teaching and Learning (pp. 465-494). Reston, VA: National Council of Teachers
of Mathematics.

Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching.
Educational Researcher, 15(2), 4-14.

Simon, M. A. (1996). Beyond inductive and deductive reasoning: The search for a sense
of knowing. Educational Studies in Mathematics, 30(2), 197-210.

Stylianides, A. J. (2007). Proof and proving in school mathematics. Journal for Research
in Mathematics Education, 38(3), 289-321.

Stylianides, G. J., & Stylianides, A. J. (2008a). Proof in school mathematics: Insights
from psychological research into students’ ability for deductive reasoning.
Mathematical Thinking and Learning, 10(2), 103-133.

Stylianides, G. J., & Stylianides, A. J. (2008b). Enhancing undergraduate students’
understanding of proof. In Electronic Proceedings of the 11th Conference on
Research in Undergraduate Mathematics Education
(http://sigmaa.maa.org/rume/crume2008/Proceedings/Stylianides&Stylianides_LONG(21).pdf),
San Diego, CA.

Tall, D. (2009). The development of mathematical thinking: problem-solving and proof.
In Celebration of the academic life and inspiration of John Mason.

Tall, D. (2005). The transition from embodied thought experiment and symbolic
manipulation to formal proof. In M. Bulmer, H. MacGillivray & C. Varsavsky
(Eds.), Proceedings of Kingfisher Delta’05, Fifth Southern Hemisphere
Symposium on Undergraduate Mathematics and Statistics Teaching and Learning
(pp. 23-35). Fraser Island, Australia.

Tall, D. (2002). Differing modes of proof and belief in mathematics. In F.-L. Lin (Ed.),
International Conference on Mathematics: Understanding Proving and Proving to
Understand (pp. 91–107). National Taiwan Normal University, Taipei, Taiwan.

Tall, D. (1999). The cognitive development of proof: Is mathematical proof for all or for
some? In Z. Usiskin (Ed.), Developments in School Mathematics Education
Around the World (Vol. 4, pp. 117–136). Reston, VA: NCTM.

Tall, D. (1991). To prove or not to prove. Mathematics Review, 1(3), 29-32.

Tall, D., et al. (2012). Cognitive development of proof. In G. Hanna & M. de Villiers
(Eds.), Proof and proving in mathematics education (pp. 13-50). New York, NY:
Springer.

Tarricone, P. (2011). The taxonomy of metacognition. New York, NY: Psychology Press.

Teddlie, C., & Tashakkori, A. (2009). Foundations of mixed methods research: Integrating
quantitative and qualitative approaches in the social and behavioral sciences. Los
Angeles, CA: Sage Publications, Inc.

Thurston, W. P. (1995). On proof and progress in mathematics. For the Learning of
Mathematics, 15(1), 29-37.

Troelstra, A. S. (1977). Choice sequences: A chapter of intuitionistic mathematics
(Oxford Logic Guides). Oxford, UK: Clarendon Press.

Usiskin, Z. (1987). Resolving the continuing dilemmas in school geometry. In M. M.
Lindquist and A. P. Shulte (Eds.), Learning and Teaching Geometry, K-12, 1987
Yearbook (pp. 17-31). Reston, VA: National Council of Teachers of Mathematics.

Usiskin, Z. (1980). What should not be in the algebra and geometry curricula of average
college-bound students? Mathematics Teacher, 73(6), 413-424.

van Hiele, P. M. (1986). Structure and insight: A theory of mathematics education. New
York, NY: Academic Press.

von Glasersfeld, E. (1994). A radical constructivist view of basic mathematical concepts.
In P. Ernest (Ed.), Constructing mathematical knowledge: Epistemology and
mathematics education (Studies in mathematics education, Vol. 4, pp. 5-7).
Abingdon, Oxon: Routledge.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological
processes. Cambridge, MA: Harvard University Press.

Waring, S. (2000). Can you prove it? Developing concepts of proof in primary and
secondary schools. Leicester, UK: The Mathematical Association.

Weber, K. (2004). Traditional instruction in advanced mathematics courses: A case study
of one professor’s lectures and proofs in an introductory real analysis course.
Journal of Mathematical Behavior, 23(2), 115–133.

Weber, K. (2001). Student difficulty in constructing proofs: The need for strategic
knowledge. Educational Studies in Mathematics, 48(1), 101-119.

Weir, A. (2011). Formalism in the philosophy of mathematics. In E. N. Zalta (Ed.), The
Stanford Encyclopedia of Philosophy. URL =
<http://plato.stanford.edu/archives/fall2011/entries/formalism-mathematics/>

Yang, K., & Lin, F. (2008). A model of reading comprehension of geometry proof.
Educational Studies in Mathematics, 67(1), 59-76.

Yin, R. K. (2009). Case study research: Design and methods (4th ed.). Thousand
Oaks, CA: SAGE Publications.

Zack, V. (1997). “You have to prove us wrong”: Proof at the elementary school level. In
E. Pehkonen (Ed.), Proceedings of the 21st Conference of the International Group
for the Psychology of Mathematics Education (Vol. 4, pp. 291-298). Lahti:
University of Helsinki.

APPENDIX A. SURVEY RESULTS: PAIRWISE COMPARISONS OF ARGUMENTS IN EACH PROBLEM

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 .174* .037 .000 .102 .246
A3 .111* .036 .002 .041 .182
A4 .057 .036 .115 -.014 .127
A2 A1 -.174* .037 .000 -.246 -.102
A3 -.063 .041 .121 -.143 .017
A4 -.118* .042 .005 -.200 -.036
A3 A1 -.111* .036 .002 -.182 -.041
A2 .063 .041 .121 -.017 .143
A4 -.055 .041 .187 -.136 .027
A4 A1 -.057 .036 .115 -.127 .014
A2 .118* .042 .005 .036 .200
A3 .055 .041 .187 -.027 .136
B1 B2 -.118* .042 .005 -.200 -.035
B3 .137* .047 .004 .044 .229
B4 .183* .048 .000 .089 .276
B2 B1 .118* .042 .005 .035 .200
B3 .254* .045 .000 .166 .343
B4 .300* .046 .000 .210 .391
B3 B1 -.137* .047 .004 -.229 -.044
B2 -.254* .045 .000 -.343 -.166
B4 .046 .046 .311 -.043 .136
B4 B1 -.183* .048 .000 -.276 -.089
B2 -.300* .046 .000 -.391 -.210
B3 -.046 .046 .311 -.136 .043
Continued

Table 39. Pairwise comparisons: Participants’ ratings on whether the arguments in each
problem were understandable

Table 39 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .011 .044 .811 -.076 .097
C3 -.080 .044 .073 -.167 .007
C4 -.086 .044 .053 -.173 .001
C2 C1 -.011 .044 .811 -.097 .076
C3 -.090* .046 .048 -.180 -.001
C4 -.097* .044 .028 -.183 -.010
C3 C1 .080 .044 .073 -.007 .167
C2 .090* .046 .048 .001 .180
C4 -.006 .044 .887 -.093 .081
C4 C1 .086 .044 .053 -.001 .173
C2 .097* .044 .028 .010 .183
C3 .006 .044 .887 -.081 .093
D1 D2 .237* .041 .000 .156 .319
D3 .021 .043 .624 -.063 .105
D4 .208* .044 .000 .121 .295
D2 D1 -.237* .041 .000 -.319 -.156
D3 -.216* .045 .000 -.305 -.128
D4 -.029 .044 .506 -.116 .057
D3 D1 -.021 .043 .624 -.105 .063
D2 .216* .045 .000 .128 .305
D4 .187* .044 .000 .100 .273
D4 D1 -.208* .044 .000 -.295 -.121
D2 .029 .044 .506 -.057 .116
D3 -.187* .044 .000 -.273 -.100
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 -.318* .073 .000 -.462 -.173
A3 -.270* .075 .000 -.418 -.123
A4 -.474* .075 .000 -.622 -.326
A2 A1 .318* .073 .000 .173 .462
A3 .047 .068 .487 -.087 .182
A4 -.156* .063 .014 -.281 -.032
A3 A1 .270* .075 .000 .123 .418
A2 -.047 .068 .487 -.182 .087
A4 -.204* .064 .002 -.330 -.078
A4 A1 .474* .075 .000 .326 .622
A2 .156* .063 .014 .032 .281
A3 .204* .064 .002 .078 .330
B1 B2 -.154 .094 .103 -.339 .032
B3 -.685* .090 .000 -.862 -.508
B4 -.510* .086 .000 -.681 -.340
B2 B1 .154 .094 .103 -.032 .339
B3 -.531* .089 .000 -.708 -.355
B4 -.357* .094 .000 -.543 -.170
B3 B1 .685* .090 .000 .508 .862
B2 .531* .089 .000 .355 .708
B4 .175* .074 .020 .028 .322
B4 B1 .510* .086 .000 .340 .681
B2 .357* .094 .000 .170 .543
B3 -.175* .074 .020 -.322 -.028
Continued

Table 40. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were convincing

Table 40 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 -.172* .071 .017 -.313 -.031
C3 .019 .082 .815 -.142 .180
C4 -.096 .072 .184 -.237 .046
C2 C1 .172* .071 .017 .031 .313
C3 .191* .077 .014 .039 .343
C4 .076 .067 .258 -.057 .209
C3 C1 -.019 .082 .815 -.180 .142
C2 -.191* .077 .014 -.343 -.039
C4 -.115 .077 .137 -.266 .037
C4 C1 .096 .072 .184 -.046 .237
C2 -.076 .067 .258 -.209 .057
C3 .115 .077 .137 -.037 .266
D1 D2 .074 .084 .376 -.091 .240
D3 .034 .079 .671 -.123 .191
D4 -.047 .087 .588 -.219 .125
D2 D1 -.074 .084 .376 -.240 .091
D3 -.041 .084 .630 -.207 .126
D4 -.122 .081 .137 -.282 .039
D3 D1 -.034 .079 .671 -.191 .123
D2 .041 .084 .630 -.126 .207
D4 -.081 .085 .341 -.249 .087
D4 D1 .047 .087 .588 -.125 .219
D2 .122 .081 .137 -.039 .282
D3 .081 .085 .341 -.087 .249
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 -.076 .069 .273 -.212 .060
A3 -.076 .066 .252 -.206 .054
A4 -.242* .059 .000 -.359 -.124
A2 A1 .076 .069 .273 -.060 .212
A3 .000 .068 1.000 -.133 .133
A4 -.166* .057 .004 -.279 -.053
A3 A1 .076 .066 .252 -.054 .206
A2 .000 .068 1.000 -.133 .133
A4 -.166* .057 .004 -.279 -.053
A4 A1 .242* .059 .000 .124 .359
A2 .166* .057 .004 .053 .279
A3 .166* .057 .004 .053 .279
B1 B2 -.056 .096 .559 -.245 .133
B3 -.329* .091 .000 -.508 -.149
B4 -.168 .091 .069 -.349 .013
B2 B1 .056 .096 .559 -.133 .245
B3 -.273* .089 .003 -.448 -.097
B4 -.112 .087 .198 -.283 .059
B3 B1 .329* .091 .000 .149 .508
B2 .273* .089 .003 .097 .448
B4 .161* .075 .035 .012 .310
B4 B1 .168 .091 .069 -.013 .349
B2 .112 .087 .198 -.059 .283
B3 -.161* .075 .035 -.310 -.012

Table 41. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were explanatory

Table 41 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 -.076 .078 .329 -.231 .078
C3 .045 .070 .523 -.093 .182
C4 .025 .069 .712 -.110 .161
C2 C1 .076 .078 .329 -.078 .231
C3 .121 .078 .122 -.033 .275
C4 .102 .074 .171 -.044 .248
C3 C1 -.045 .070 .523 -.182 .093
C2 -.121 .078 .122 -.275 .033
C4 -.019 .065 .769 -.147 .109
C4 C1 -.025 .069 .712 -.161 .110
C2 -.102 .074 .171 -.248 .044
C3 .019 .065 .769 -.109 .147
D1 D2 .176* .070 .014 .037 .315
D3 .020 .071 .777 -.121 .161
D4 -.014 .068 .844 -.149 .122
D2 D1 -.176* .070 .014 -.315 -.037
D3 -.155* .078 .047 -.309 -.002
D4 -.189* .075 .013 -.338 -.041
D3 D1 -.020 .071 .777 -.161 .121
D2 .155* .078 .047 .002 .309
D4 -.034 .065 .606 -.163 .095
D4 D1 .014 .068 .844 -.122 .149
D2 .189* .075 .013 .041 .338
D3 .034 .065 .606 -.095 .163
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 .002 .030 .944 -.056 .061
A3 .046 .028 .101 -.009 .102
A4 -.176* .035 .000 -.245 -.108
A2 A1 -.002 .030 .944 -.061 .056
A3 .044 .028 .117 -.011 .099
A4 -.179* .035 .000 -.246 -.111
A3 A1 -.046 .028 .101 -.102 .009
A2 -.044 .028 .117 -.099 .011
A4 -.223* .033 .000 -.287 -.159
A4 A1 .176* .035 .000 .108 .245
A2 .179* .035 .000 .111 .246
A3 .223* .033 .000 .159 .287
B1 B2 -.086* .032 .007 -.148 -.024
B3 -.074* .031 .019 -.135 -.012
B4 -.015 .030 .618 -.073 .043
B2 B1 .086* .032 .007 .024 .148
B3 .013 .034 .713 -.055 .080
B4 .071* .032 .027 .008 .135
B3 B1 .074* .031 .019 .012 .135
B2 -.013 .034 .713 -.080 .055
B4 .059 .032 .066 -.004 .122
B4 B1 .015 .030 .618 -.043 .073
B2 -.071* .032 .027 -.135 -.008
B3 -.059 .032 .066 -.122 .004
Continued

Table 42. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were appealing

Table 42 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .013 .033 .704 -.052 .078
C3 .021 .033 .523 -.044 .086
C4 .069* .031 .026 .008 .130
C2 C1 -.013 .033 .704 -.078 .052
C3 .008 .032 .796 -.055 .072
C4 .057 .031 .066 -.004 .117
C3 C1 -.021 .033 .523 -.086 .044
C2 -.008 .032 .796 -.072 .055
C4 .048 .030 .113 -.012 .108
C4 C1 -.069* .031 .026 -.130 -.008
C2 -.057 .031 .066 -.117 .004
C3 -.048 .030 .113 -.108 .012
D1 D2 .124* .032 .000 .062 .186
D3 .038 .035 .277 -.031 .106
D4 .082* .033 .014 .017 .147
D2 D1 -.124* .032 .000 -.186 -.062
D3 -.086* .031 .005 -.146 -.026
D4 -.042 .029 .151 -.099 .015
D3 D1 -.038 .035 .277 -.106 .031
D2 .086* .031 .005 .026 .146
D4 .044 .032 .171 -.019 .107
D4 D1 -.082* .033 .014 -.147 -.017
D2 .042 .029 .151 -.015 .099
D3 -.044 .032 .171 -.107 .019
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no
adjustments).

APPENDIX B. SURVEY RESULTS: COMPARISON BETWEEN SUBGROUPS OF STUDENTS

Dependent   (I) School   (J) School   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
A1.1 Group H Group L 0.01 0.056 0.853 -0.099 0.119
A1.2 Group H Group L 0.029 0.091 0.755 -0.151 0.208
A1.3 Group H Group L -0.068 0.078 0.383 -0.22 0.085
A2.1 Group H Group L -0.125 0.075 0.097 -0.273 0.023
A2.2 Group H Group L -0.096 0.082 0.244 -0.258 0.066
A2.3 Group H Group L -0.075 0.07 0.281 -0.213 0.062
A3.1 Group H Group L -0.016 0.072 0.825 -0.158 0.126
A3.2 Group H Group L 0.06 0.08 0.457 -0.098 0.218
A3.3 Group H Group L 0.038 0.07 0.59 -0.1 0.176
A4.1 Group H Group L -0.048 0.067 0.469 -0.179 0.083
A4.2 Group H Group L 0.079 0.077 0.308 -0.073 0.23
A4.3 Group H Group L 0.045 0.065 0.488 -0.082 0.172
A5.1 Group H Group L -0.004 0.044 0.93 -0.091 0.083
A5.2 Group H Group L -0.055 0.045 0.22 -0.144 0.033
A5.3 Group H Group L 0.01 0.04 0.8 -0.069 0.089
A5.4 Group H Group L 0.037 0.053 0.481 -0.067 0.141
B1.1 Group H Group L 0.119 0.081 0.143 -0.04 0.278
B1.2 Group H Group L -0.064 0.086 0.455 -0.234 0.105
B1.3 Group H Group L 0.029 0.082 0.722 -0.132 0.19
B2.1 Group H Group L 0.107 0.07 0.129 -0.031 0.245
B2.2 Group H Group L 0.053 0.086 0.54 -0.117 0.222
B2.3 Group H Group L 0.067 0.08 0.401 -0.09 0.224
B3.1 Group H Group L -0.052 0.086 0.548 -0.22 0.117
B3.2 Group H Group L -0.001 0.07 0.987 -0.139 0.137
B3.3 Group H Group L -0.042 0.066 0.526 -0.172 0.088
B4.1 Group H Group L 0.007 0.088 0.934 -0.165 0.18
B4.2 Group H Group L 0.047 0.071 0.504 -0.092 0.186
B4.3 Group H Group L 0.094 0.07 0.179 -0.043 0.232
B5.1 Group H Group L -0.058 0.043 0.176 -0.143 0.026
B5.2 Group H Group L 0.057 0.049 0.25 -0.04 0.154
B5.3 Group H Group L -0.069 0.048 0.148 -0.163 0.025
B5.4 Group H Group L 0.077 0.045 0.084 -0.01 0.165
Continued

Table 43. Survey results: Between school comparison

Table 43 continued
Dependent   (I) School   (J) School   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
C1.1 Group H Group L 0.012 0.081 0.88 -0.147 0.171
C1.2 Group H Group L 0.054 0.079 0.495 -0.101 0.208
C1.3 Group H Group L 0.079 0.07 0.262 -0.059 0.217
C2.1 Group H Group L 0.037 0.082 0.654 -0.125 0.198
C2.2 Group H Group L 0.012 0.073 0.873 -0.132 0.155
C2.3 Group H Group L 0.124 0.069 0.074 -0.012 0.26
C3.1 Group H Group L 0.125 0.079 0.114 -0.03 0.281
C3.2 Group H Group L 0.062 0.081 0.445 -0.098 0.222
C3.3 Group H Group L 0.076 0.073 0.3 -0.068 0.219
C4.1 Group H Group L 0.036 0.077 0.646 -0.117 0.188
C4.2 Group H Group L 0.047 0.076 0.538 -0.103 0.197
C4.3 Group H Group L 0.049 0.071 0.494 -0.091 0.189
C5.1 Group H Group L -0.044 0.048 0.366 -0.139 0.051
C5.2 Group H Group L -0.009 0.047 0.843 -0.102 0.084
C5.3 Group H Group L .103* 0.047 0.028 0.011 0.195
C5.4 Group H Group L -0.022 0.042 0.604 -0.105 0.061
D1.1 Group H Group L -0.053 0.075 0.479 -0.201 0.095
D1.2 Group H Group L -.172* 0.08 0.031 -0.329 -0.016
D1.3 Group H Group L 0.032 0.073 0.664 -0.111 0.175
D2.1 Group H Group L 0.076 0.088 0.39 -0.097 0.249
D2.2 Group H Group L 0.037 0.072 0.612 -0.105 0.179
D2.3 Group H Group L -0.001 0.069 0.99 -0.137 0.135
D3.1 Group H Group L 0.13 0.077 0.094 -0.022 0.281
D3.2 Group H Group L -0.004 0.078 0.956 -0.158 0.149
D3.3 Group H Group L 0.044 0.074 0.557 -0.102 0.189
D4.1 Group H Group L 0.013 0.085 0.882 -0.154 0.179
D4.2 Group H Group L .150* 0.075 0.045 0.003 0.297
D4.3 Group H Group L -0.042 0.068 0.542 -0.176 0.093
D5.1 Group H Group L -0.015 0.05 0.766 -0.113 0.083
D5.2 Group H Group L -0.036 0.042 0.39 -0.118 0.046
D5.3 Group H Group L -0.052 0.048 0.279 -0.147 0.042
D5.4 Group H Group L .093* 0.046 0.041 0.004 0.183
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Dependent   (I) Gender   (J) Gender   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
A1.1 Female Male .064 .048 .190 -.032 .159
A1.2 Female Male -.082 .077 .290 -.233 .070
A1.3 Female Male .061 .067 .357 -.069 .192
A2.1 Female Male -.192* .064 .003 -.318 -.067
A2.2 Female Male -.121 .069 .080 -.257 .014
A2.3 Female Male -.106 .060 .076 -.224 .011
A3.1 Female Male -.087 .061 .152 -.206 .032
A3.2 Female Male -.054 .069 .435 -.189 .082
A3.3 Female Male -.009 .060 .882 -.127 .110
A4.1 Female Male -.076 .059 .197 -.192 .040
A4.2 Female Male -.009 .065 .895 -.137 .120
A4.3 Female Male .000 .056 .998 -.109 .109
A5.1 Female Male .040 .038 .293 -.035 .114
A5.2 Female Male -.076* .037 .043 -.149 -.002
A5.3 Female Male -.027 .034 .431 -.095 .041
A5.4 Female Male .085 .045 .060 -.004 .174
B1.1 Female Male -.054 .068 .425 -.188 .079
B1.2 Female Male -.076 .073 .299 -.220 .068
B1.3 Female Male .007 .069 .921 -.129 .143
B2.1 Female Male -.071 .060 .233 -.188 .046
B2.2 Female Male -.041 .073 .576 -.185 .103
B2.3 Female Male .046 .068 .500 -.088 .180
B3.1 Female Male -.088 .073 .226 -.231 .055
B3.2 Female Male -.055 .060 .356 -.172 .062
B3.3 Female Male -.032 .057 .573 -.144 .080
B4.1 Female Male -.057 .074 .443 -.202 .089
B4.2 Female Male -.004 .060 .943 -.123 .115
B4.3 Female Male -.045 .060 .453 -.164 .073
B5.1 Female Male .015 .037 .694 -.058 .088
B5.2 Female Male -.040 .042 .342 -.122 .042
B5.3 Female Male .064 .041 .124 -.017 .145
B5.4 Female Male -.033 .038 .385 -.108 .042
continued

Table 44. Survey results: Between gender comparison

Table 44 continued
Dependent   (I) Gender   (J) Gender   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
C1.1 Female Male -.082 .069 .236 -.218 .054
C1.2 Female Male -.100 .066 .128 -.229 .029
C1.3 Female Male .020 .060 .740 -.097 .137
C2.1 Female Male -.090 .071 .203 -.229 .049
C2.2 Female Male .001 .061 .993 -.119 .120
C2.3 Female Male .033 .059 .577 -.082 .148
C3.1 Female Male -.050 .068 .465 -.184 .084
C3.2 Female Male -.053 .069 .442 -.188 .082
C3.3 Female Male .050 .063 .427 -.073 .173
C4.1 Female Male -.050 .066 .449 -.181 .080
C4.2 Female Male .017 .064 .796 -.110 .143
C4.3 Female Male .031 .060 .602 -.087 .150
C5.1 Female Male .064 .041 .124 -.017 .145
C5.2 Female Male -.022 .040 .593 -.101 .058
C5.3 Female Male -.039 .040 .336 -.117 .040
C5.4 Female Male -.002 .037 .948 -.075 .070
D1.1 Female Male .047 .064 .462 -.079 .174
D1.2 Female Male .022 .067 .739 -.110 .154
D1.3 Female Male .095 .061 .117 -.024 .215
D2.1 Female Male -.133 .074 .071 -.279 .012
D2.2 Female Male -.037 .061 .543 -.158 .083
D2.3 Female Male .025 .059 .672 -.091 .140
D3.1 Female Male .014 .066 .836 -.116 .143
D3.2 Female Male -.024 .067 .716 -.156 .107
D3.3 Female Male -.023 .063 .712 -.148 .101
D4.1 Female Male -.113 .072 .116 -.254 .028
D4.2 Female Male -.086 .063 .173 -.209 .038
D4.3 Female Male -.033 .057 .564 -.146 .080
D5.1 Female Male .058 .043 .176 -.026 .142
D5.2 Female Male .041 .036 .253 -.029 .111
D5.3 Female Male -.014 .041 .743 -.094 .067
D5.4 Female Male -.072 .039 .063 -.148 .004
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Source   Dependent   Type III Sum   df   Mean     F   Sig.   Partial Eta   Observed
         Variable    of Squares          Square              Squared       Power
Gender * School
A1.1 .192 1 .192 .727 .394 .002 .136
A1.2 .364 1 .364 .512 .475 .001 .110
A1.3 .621 1 .621 1.197 .275 .003 .194
A2.1 8.728 1 8.728 19.357 .000* .045 .992
A2.2 .107 1 .107 .186 .666 .000 .072
A2.3 .669 1 .669 1.631 .202 .004 .247
A3.1 .226 1 .226 .506 .477 .001 .109
A3.2 .087 1 .087 .158 .692 .000 .068
A3.3 .041 1 .041 .097 .755 .000 .061
A4.1 .131 1 .131 .342 .559 .001 .090
A4.2 .115 1 .115 .227 .634 .001 .076
A4.3 .540 1 .540 1.497 .222 .004 .231
A5.1 .012 1 .012 .073 .788 .000 .058
A5.2 .079 1 .079 .474 .491 .001 .106
A5.3 .487 1 .487 3.540 .061 .008 .467
A5.4 .078 1 .078 .330 .566 .001 .088
B1.1 .304 1 .304 .545 .461 .001 .114
B1.2 .537 1 .537 .853 .356 .002 .152
B1.3 .002 1 .002 .004 .950 .000 .050
B2.1 .232 1 .232 .547 .460 .001 .114
B2.2 .845 1 .845 1.333 .249 .003 .210
B2.3 .667 1 .667 1.235 .267 .003 .198
B3.1 1.331 1 1.331 2.130 .145 .005 .308
B3.2 .065 1 .065 .155 .694 .000 .068
B3.3 .245 1 .245 .650 .421 .002 .127
B4.1 .446 1 .446 .681 .410 .002 .131
B4.2 .063 1 .063 .146 .703 .000 .067
B4.3 .103 1 .103 .242 .623 .001 .078
B5.1 .099 1 .099 .627 .429 .002 .124
B5.2 .246 1 .246 1.190 .276 .003 .193
B5.3 .064 1 .064 .330 .566 .001 .088
B5.4 .031 1 .031 .181 .671 .000 .071
Continued

Table 45. The gender * school effect

Table 45 continued
Source   Dependent   Type III Sum   df   Mean     F   Sig.   Partial Eta   Observed
         Variable    of Squares          Square              Squared       Power
Gender * School
C1.1 .167 1 .167 .302 .583 .001 .085
C1.2 .109 1 .109 .207 .649 .000 .074
C1.3 .617 1 .617 1.456 .228 .003 .226
C2.1 .134 1 .134 .232 .630 .001 .077
C2.2 .276 1 .276 .605 .437 .001 .121
C2.3 .051 1 .051 .123 .726 .000 .064
C3.1 .122 1 .122 .225 .635 .001 .076
C3.2 .236 1 .236 .420 .517 .001 .099
C3.3 .049 1 .049 .106 .745 .000 .062
C4.1 1.691 1 1.691 3.327 .069 .008 .444
C4.2 .440 1 .440 .888 .347 .002 .156
C4.3 .008 1 .008 .017 .895 .000 .052
C5.1 .169 1 .169 .845 .359 .002 .150
C5.2 .007 1 .007 .036 .850 .000 .054
C5.3 .030 1 .030 .160 .689 .000 .068
C5.4 .050 1 .050 .330 .566 .001 .088
D1.1 .380 1 .380 .785 .376 .002 .143
D1.2 1.210 1 1.210 2.270 .133 .005 .324
D1.3 .027 1 .027 .060 .807 .000 .057
D2.1 2.543 1 2.543 3.891 .049* .009 .503
D2.2 .149 1 .149 .332 .565 .001 .089
D2.3 .012 1 .012 .028 .867 .000 .053
D3.1 1.215 1 1.215 2.377 .124 .006 .337
D3.2 .543 1 .543 1.041 .308 .003 .175
D3.3 .495 1 .495 1.050 .306 .003 .176
D4.1 3.509 1 3.509 5.801 .016* .014 .671
D4.2 2.508 1 2.508 5.417 .020* .013 .641
D4.3 .438 1 .438 1.094 .296 .003 .181
D5.1 .181 1 .181 .849 .357 .002 .151
D5.2 .024 1 .024 .164 .686 .000 .069
D5.3 .000 1 .000 .001 .969 .000 .050
D5.4 .088 1 .088 .497 .481 .001 .108
*. The mean difference is significant at the .05 level.
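The partial eta squared values in Table 45 express an effect's share of variance once the other effects are partialled out: SS_effect / (SS_effect + SS_error). The error sum of squares is not shown in the table, so the sketch below pairs one reported Type III SS with an assumed SS_error purely for illustration:

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared for one effect in a factorial ANOVA."""
    return ss_effect / (ss_effect + ss_error)

# D4.1's Type III SS is 3.509 (from Table 45); SS_error = 247.0 is an assumption
print(round(partial_eta_squared(3.509, 247.0), 3))  # → 0.014
```

Because the denominator excludes variance attributed to the other effects, partial eta squared values across effects need not sum to the total variance explained.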

APPENDIX C. INTERVIEW RESULTS: PAIRWISE COMPARISON OF THE RANKINGS OF ARGUMENTS IN EACH PROBLEM

(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 1.125 .639 .122 -.386 2.636
A3 .125 .666 .857 -1.451 1.701
A4 .750 .675 .303 -.846 2.346
A2 A1 -1.125 .639 .122 -2.636 .386
A3 -1.000 .500 .086 -2.182 .182
A4 -.375 .653 .584 -1.919 1.169
A3 A1 -.125 .666 .857 -1.701 1.451
A2 1.000 .500 .086 -.182 2.182
A4 .625 .625 .351 -.853 2.103
A4 A1 -.750 .675 .303 -2.346 .846
A2 .375 .653 .584 -1.169 1.919
A3 -.625 .625 .351 -2.103 .853
B1 B2 .125 .693 .862 -1.513 1.763
B3 -.125 .666 .857 -1.701 1.451
B4 .000 .598 1.000 -1.413 1.413
B2 B1 -.125 .693 .862 -1.763 1.513
B3 -.250 .881 .785 -2.334 1.834
B4 -.125 .766 .875 -1.937 1.687
B3 B1 .125 .666 .857 -1.451 1.701
B2 .250 .881 .785 -1.834 2.334
B4 .125 .441 .785 -.917 1.167
B4 B1 .000 .598 1.000 -1.413 1.413
B2 .125 .766 .875 -1.687 1.937
B3 -.125 .441 .785 -1.167 .917
continued

Table 46. Pairwise comparison of the rankings of arguments in each problem

Table 46 continued
(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .000 .707 1.000 -1.672 1.672
C3 -.125 .581 .836 -1.498 1.248
C4 -.375 .653 .584 -1.919 1.169
C2 C1 .000 .707 1.000 -1.672 1.672
C3 -.125 .666 .857 -1.701 1.451
C4 -.375 .680 .598 -1.982 1.232
C3 C1 .125 .581 .836 -1.248 1.498
C2 .125 .666 .857 -1.451 1.701
C4 -.250 .796 .763 -2.133 1.633
C4 C1 .375 .653 .584 -1.169 1.919
C2 .375 .680 .598 -1.232 1.982
C3 .250 .796 .763 -1.633 2.133
D1 D2 -.625 .532 .279 -1.884 .634
D3 -.750 .648 .285 -2.282 .782
D4 -.625 .730 .420 -2.352 1.102
D2 D1 .625 .532 .279 -.634 1.884
D3 -.125 .666 .857 -1.701 1.451
D4 .000 .598 1.000 -1.413 1.413
D3 D1 .750 .648 .285 -.782 2.282
D2 .125 .666 .857 -1.451 1.701
D4 .125 .789 .879 -1.741 1.991
D4 D1 .625 .730 .420 -1.102 2.352
D2 .000 .598 1.000 -1.413 1.413
D3 -.125 .789 .879 -1.991 1.741
continued

Table 46 continued
(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
E1 E2 -.375 .730 .623 -2.102 1.352
E3 -1.000 .627 .155 -2.482 .482
E4 -.625 .653 .370 -2.169 .919
E2 E1 .375 .730 .623 -1.352 2.102
E3 -.625 .532 .279 -1.884 .634
E4 -.250 .726 .741 -1.966 1.466
E3 E1 1.000 .627 .155 -.482 2.482
E2 .625 .532 .279 -.634 1.884
E4 .375 .625 .567 -1.103 1.853
E4 E1 .625 .653 .370 -.919 2.169
E2 .250 .726 .741 -1.466 1.966
E3 -.375 .625 .567 -1.853 1.103
Based on estimated marginal means
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no
adjustments).

