
The Calculator Project - Formal Reasoning about Programs

Article · March 1995


DOI: 10.1109/SEDC.1994.475332 · Source: CiteSeer



To appear in IEEE Proceedings of SRIG-ET’94.



Steve Reeves, Department of Computer Science, University of Waikato, New Zealand
Doug Goldson, Department of Computer Science, Massey University, New Zealand
Pat Fung and Tim O’Shea, CITE, The Open University, Milton Keynes, UK
Mike Hopkins and Richard Bornat, Department of Computer Science, QMW, London, UK

Abstract

This paper describes The Calculator Project, a three-year joint research project between the Centre for Information Technology in Education at The Open University, U.K. (Pat Fung, Tim O’Shea) and the Department of Computer Science, QMW, University of London, U.K. (Richard Bornat, Doug Goldson, Mike Hopkins, Steve Reeves). The project was funded by the U.K. Joint Council Initiative in Cognitive Science and Human-Computer Interaction.
The central aim of the project was to test the hypothesis that providing so-called calculators would improve students’ performance in those parts of the undergraduate first year that relied on formal reasoning skills.

1: Introduction

Computer Science undergraduates find “formal methods”, i.e. the use of mathematical notations and methods to support the development of computer software and systems, difficult. In particular, they find the part of formal methods that demands reasoning about their programs the most difficult.
The work reported in this paper set out to see what factors affected these difficulties. It also set out to test the hypothesis that providing so-called calculators would ease the burden that formal reasoning imposes.
To gather these data the project team used many different methods: questionnaires, interviews, test and exam results, logging of system use, and monitoring of teacher and student discussion on electronic bulletin boards.
As we shall see, this data gathering exercise allowed us to show some statistically significant results concerning the experience of students before they enrolled on the courses. It also, equally valuably, produced much anecdotal evidence about how courses should be run.
The plan of this paper is as follows:
- in the next section we introduce and discuss the term “calculators”;
- we then introduce the main hypothesis of the experimental work of the project and the design of the experiments that we conducted;
- next we introduce and discuss two of the calculators used during the project;
- following that we present and discuss the results that we have obtained so far;
- finally we suggest where a follow-on project might lead and suggest some improvements to our work.

2: Calculators?

A calculator, for us, is a self-contained, robust, simple-to-use software device which supports working in some area. It does so in the way that the familiar hand-held electronic calculator supports working in arithmetic. The traditional calculator supports its users in arithmetic tasks by relieving them of the burden of carrying out the arithmetic operations, allowing them to concentrate on the substance of the problem at hand. The calculators developed or borrowed in this project were meant to support two things: learning to program in Miranda (“Miranda” is a trademark of Research Software Ltd.), and the understanding and use of the language of first-order logic.
An important design feature of a calculator is that it can be used without a great deal of explanation and instruction. A second is that it gives students a way to build up a reliable model of how a calculation takes place: they should come to understand how it works and what rules it follows.
This means that we actually have two goals for our calculators:
- they should support the student in building an understanding of the activities that the calculator helps to carry out;
- once those activities are properly learned, they should support the student in the more mechanical and book-keeping parts of the task, so that students can tackle progressively harder tasks.
When we first developed our ideas for this project we had also hoped that the calculators would be so much fun to use that students would use them spontaneously and perhaps play with them. This turned out to be a very naïve idea on our part. Our students were too busy, and too focused on what they had to do, to choose to do anything viewed as extra work.
3: Project hypothesis and design

In the project we had a simple hypothesis: providing calculators for first-year undergraduate students in Computer Science would improve their performance on courses that involved formal reasoning, which students typically find hard, to say the least [1,2].
The project had two parallel and interconnecting streams of work. One stream (based at QMW) scoured the world for already existing calculators and evaluated them. Because no suitable calculators existed for functional programming, it also developed one of its own. The other stream (based at the OU) carried out an educational evaluation by comparing students who had used the calculators with those who had not. The comparison used a mixture of:
- interviews;
- questionnaires;
- consideration of exam, test and coursework results;
- monitoring of the electronic bulletin boards that students and tutors on the courses concerned used for day-to-day communication about the courses.
The project had, broadly, six main phases, and the Open University (OU) and QMW teams worked in parallel most of the time. The phases were:

OU1. Gather information about first-year undergraduates, and design information gathering techniques for future years based on these results;

QMW1. Search for, or build, systems which would be useful as calculators supporting functional programming and formal reasoning;

OU2. Plan and carry out “before, during and after” assessments of students on the functional programming and logic courses;

QMW2. Use the calculators in teaching;

OU3. Analyse the data gathered and write up the results;

QMW3. Improve the calculators based on feedback from OU2.

It became clear during QMW1 that there were many, many calculators which could be used to support the logic course (of very varied usefulness, see [3]), but none for the functional programming course. Hence we had to build our own calculator to support the functional programming course, which used Miranda.
So, the outcome of these initial phases was a decision to use two calculators. Tarski’s World [4], developed by the Center for the Study of Language and Information, Stanford University, would support the logic course. Our own MiraCalc [5] would support the functional programming course.
In the following academic year two courses used these calculators to support their teaching, via laboratory work and demonstrations during lectures: “Functional Programming One” (FP1) and “Introduction to Logic” (ItL). The students were surveyed at the start of these courses using questionnaires and interviews designed on the basis of the results from OU1.
These initial surveys gave us a good idea of the sorts of students we were teaching. They also allowed the OU team to choose representative students according to several criteria that we had hypothesized might make a difference to their achievement on the courses. These criteria included gender, level of achievement in maths, amount of programming experience, and whether they used a computer at home. These representative students were interviewed in some depth during and after the courses, so that we could get a very clear picture of how they had perceived the courses.
We also gathered data from tests and exams to give some quantitative information on their achievements on the courses. Further, we logged the amount of time that each student used MiraCalc, to see whether there was any significant difference between students who did and did not use that calculator.
We will see the results of this data gathering in section five below.

4: The Calculators

Below we will work through parts of two simple examples to show how each calculator performs, but first we make some general remarks. The calculators had to have two main attributes: robustness and ease of use.
They clearly had to be robust. In the environment where they were to be used, we could not afford to have them break because of students’ mistakes. Nor could we let them break under the inevitable experiments, enthusiastically conducted, whose goal would be to find out how to break them.
They had to be easy to use so that the calculators themselves did not become part of the problem the students were facing: they were intended to ease the students’ burdens, not add to them. They also had to have good supporting documentation or help facilities, and it had to be clear at every point that the students were in control.

4.1: Tarski’s World

The main aims of this system are:
- to introduce the syntax of first-order logic (actually an interpreted first-order logic with equality);
- to make the bridge to its semantics via the idea of an interpretation;
- to support the teaching of semantic ideas like consistency, inconsistency and entailment.
Central to all this is the graphical representation of the link between syntax and semantics, the interpretation.
Experience shows that students, when presented with the
usual formal definition of an interpretation (a pair of
functions, one mapping terms to objects in the universe
and the other mapping predicate letters to relations of the
appropriate arity), start to give up. In Tarski’s World very
natural use is made of a picture, the situation, to give
meaning to terms and predicates.
The situation displays a picture of the world which
contains some objects of certain shapes (cubes, tetrahedra,
dodecahedra) with certain attributes (names, sizes and their
shapes) and standing in certain relations to other objects
(larger than, in front of, between).
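The pair of functions that makes up an interpretation can be made concrete in a few lines. The following Python sketch models a tiny situation in the spirit of Tarski’s World. It has a universe of blocks, one mapping from names to objects and another from predicate letters to relations, and a recursive truth evaluator for connectives and quantifiers. Several details are our own assumptions for illustration, not how Tarski’s World itself represents situations: the `Block` fields, the size scale, the reading of `FrontOf` as “in a higher-numbered row”, and the tuple syntax for sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    shape: str  # "cube", "tet" or "dodec"
    size: int   # 0 = small, 1 = medium, 2 = large (assumed scale)
    row: int    # higher row = nearer the front (assumed convention)


# A hypothetical situation: the universe of objects ...
a = Block("cube", 2, 0)
b = Block("tet", 0, 3)
c = Block("dodec", 1, 1)
UNIVERSE = [a, b, c]

# ... and an interpretation: names -> objects, predicate letters -> relations.
NAMES = {"a": a, "b": b, "c": c}
PREDICATES = {
    "Cube": lambda x: x.shape == "cube",
    "Tet": lambda x: x.shape == "tet",
    "Dodec": lambda x: x.shape == "dodec",
    "Larger": lambda x, y: x.size > y.size,
    "FrontOf": lambda x, y: x.row > y.row,
}


def holds(sentence, env=None):
    """Truth-value of a formula in the situation; env binds variables to objects."""
    env = env or {}
    op = sentence[0]
    if op == "not":
        return not holds(sentence[1], env)
    if op == "and":
        return holds(sentence[1], env) and holds(sentence[2], env)
    if op == "or":
        return holds(sentence[1], env) or holds(sentence[2], env)
    if op == "implies":
        return not holds(sentence[1], env) or holds(sentence[2], env)
    if op in ("forall", "exists"):
        var, body = sentence[1], sentence[2]
        values = (holds(body, {**env, var: obj}) for obj in UNIVERSE)
        return all(values) if op == "forall" else any(values)
    # Atomic formula: look each term up as a bound variable, else as a name.
    args = [env[t] if t in env else NAMES[t] for t in sentence[1:]]
    return PREDICATES[op](*args)


# Every cube is larger than b.
print(holds(("forall", "x", ("implies", ("Cube", "x"), ("Larger", "x", "b")))))  # True
# Some tetrahedron is behind a (false: b is nearer the front than a).
print(holds(("exists", "x", ("and", ("Tet", "x"), ("FrontOf", "a", "x")))))  # False
```

The sketch evaluates sentences in one fixed situation only. Asking whether a list of sentences is consistent would mean searching for some situation in which every one of them holds.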
Figure one shows, in the top, left-hand window, a
picture of a “world”, the situation, in which the sentences
in the lower left-hand window are interpreted.
The student can decide whether or not a given expression is a well-formed formula, or a sentence, and the system can then check the well-formedness of formulae and sentences. The result is shown in the small box in the top right-hand corner of the sentence window in figure one. This exercises the student’s syntactic knowledge and, with suitable examples and questions, supports the teaching of these notions.

Figure One: Tarski’s World
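The distinction the student is asked to draw can be stated precisely: a well-formed formula is a sentence exactly when it has no free variables. The Python sketch below uses nested tuples as a hypothetical stand-in for typed-in formulae; this is not Tarski’s World’s input format. It first checks arities and connective shapes, then computes the free variables. The vocabulary is also an assumption. So is the simplification that any term other than the names `a`, `b` and `c` counts as a variable.

```python
# Hypothetical vocabulary: predicate letters with their arities, and names.
ARITY = {"Cube": 1, "Tet": 1, "Dodec": 1, "Larger": 2, "FrontOf": 2}
NAMES = {"a", "b", "c"}
CONNECTIVES = {"not": 1, "and": 2, "or": 2, "implies": 2}
QUANTIFIERS = {"forall", "exists"}


def free_vars(formula):
    """Free variables of a well-formed formula; raise ValueError if ill-formed."""
    if not isinstance(formula, tuple) or not formula:
        raise ValueError(f"not a formula: {formula!r}")
    op, rest = formula[0], formula[1:]
    if op in QUANTIFIERS:
        if len(rest) != 2 or not isinstance(rest[0], str):
            raise ValueError(f"malformed quantifier: {formula!r}")
        # The quantifier binds its variable throughout its body.
        return free_vars(rest[1]) - {rest[0]}
    if op in CONNECTIVES:
        if len(rest) != CONNECTIVES[op]:
            raise ValueError(f"{op} expects {CONNECTIVES[op]} operand(s)")
        return set().union(*(free_vars(sub) for sub in rest))
    if op in ARITY:
        if len(rest) != ARITY[op] or not all(isinstance(t, str) for t in rest):
            raise ValueError(f"{op} expects {ARITY[op]} term(s)")
        # Any term that is not a name is treated as a variable.
        return {t for t in rest if t not in NAMES}
    raise ValueError(f"unknown symbol: {op!r}")


def classify(formula):
    """Return 'sentence', 'formula' (has free variables) or 'ill-formed'."""
    try:
        return "formula" if free_vars(formula) else "sentence"
    except ValueError:
        return "ill-formed"


print(classify(("Larger", "x", "b")))                   # formula
print(classify(("forall", "x", ("Larger", "x", "b"))))  # sentence
print(classify(("Larger", "a")))                        # ill-formed
```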
Having mastered some of the syntax of the language, the student can go on to learn about the semantics. Given a sentence and a situation, the student can try to reason for themselves about the truth-value of the sentence. Having reached a conclusion, they can use the system to check it. For simple examples it is usually enough to be told you are wrong. In more complex cases the student can use the “game” to investigate why their choice was wrong.
The game poses a series of questions about the truth-values of parts of the sentence concerned, gradually “homing in” on a misunderstanding. It then makes that misunderstanding very clear by using the situation to show why the student calculated the truth-value wrongly. Experience shows that students, not surprisingly, find this a much more interesting way of calculating truth-values than working through seemingly endless truth-table exercises.
Typical problems take two forms:
- given a situation and a list of sentences, decide which sentences are true in the situation;
- given a list of sentences, build a situation that makes them all true, if possible.
This allows the ideas of consistency and entailment, and their opposites, to be introduced and developed.
As ever with systems which have a graphical component at their heart, it is impossible to do justice to Tarski’s World with words and “snapshots”. The reader is urged to try it and see. Suffice it to say that of all the logic support tools we looked at (around 15 of them), Tarski’s World was amongst the best.

4.2: MiraCalc

MiraCalc aims simply to provide an environment in which students can:
- investigate the syntactic structure of their scripts;
- find out more about the types of expressions in the script;
- evaluate expressions in the environment formed by a script in a variety of ways: fully, as in the Miranda system; in largish chunks (so-called “skips”); or one step at a time (in the manner of Bird and Wadler’s explanations in [6]).
The step-by-step evaluation option turned out to be highly favoured by the students. It allowed them to see exactly how evaluation in Miranda works, and it clearly supported the task of working out why their programs were not working as expected.
A functional language has none of the complications, such as global variables and side-effects, that afflict other sorts of language. As a result, the mechanism for step-by-step evaluation is relatively simple. More importantly, so is the rather small amount of information the user needs in order to make sense of what is happening at each step. Simplicity was one of the main attributes that we decided on for a successful system.
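One-step-at-a-time evaluation can be pictured as repeatedly rewriting the leftmost-outermost reducible expression. The Python sketch below is a toy model of that idea, not MiraCalc’s implementation. It supports only integers, `+`, the list constructor `:` and saturated calls to user-defined functions. It substitutes arguments unevaluated and without sharing, which section 6 notes is also how MiraCalc behaves. The term encoding is our own assumption. `numsfrom_inc` follows the script discussed later in this section. The definition `double x = x + x` is a hypothetical one, since the paper uses `double` without defining it.

```python
# Terms: ints, ("+", left, right), (":", head, tail), ("var", name),
# and ("call", function_name, [arguments]) for saturated calls only.
FUNCTIONS = {
    # double x = x + x   (assumed definition; not given in the paper)
    "double": (["x"], ("+", ("var", "x"), ("var", "x"))),
    # numsfrom_inc x y = x : (numsfrom_inc (x + y) y)
    "numsfrom_inc": (["x", "y"],
                     (":", ("var", "x"),
                      ("call", "numsfrom_inc",
                       [("+", ("var", "x"), ("var", "y")), ("var", "y")]))),
}


def substitute(term, env):
    """Copy argument terms, unevaluated, into a function body (no sharing)."""
    if isinstance(term, int):
        return term
    if term[0] == "var":
        return env[term[1]]
    if term[0] == "call":
        return ("call", term[1], [substitute(arg, env) for arg in term[2]])
    return (term[0], substitute(term[1], env), substitute(term[2], env))


def step(term):
    """Contract the leftmost-outermost redex of a closed term; return (term, reduced?)."""
    if isinstance(term, int):
        return term, False
    if term[0] == "call":
        params, body = FUNCTIONS[term[1]]
        return substitute(body, dict(zip(params, term[2]))), True
    tag, left, right = term
    if tag == "+" and isinstance(left, int) and isinstance(right, int):
        return left + right, True
    new_left, reduced = step(left)
    if reduced:
        return (tag, new_left, right), True
    new_right, reduced = step(right)
    return (tag, left, new_right), reduced


def show(term):
    """Render a term in Miranda-like syntax."""
    if isinstance(term, int):
        return str(term)
    if term[0] == "call":
        return " ".join([term[1]] + [bracket(arg) for arg in term[2]])
    tag, left, right = term
    if tag == ":" and not isinstance(right, int) and right[0] == ":":
        return f"{bracket(left)} : {show(right)}"  # ':' associates to the right
    return f"{bracket(left)} {tag} {bracket(right)}"


def bracket(term):
    return show(term) if isinstance(term, int) else f"({show(term)})"


def trace(term, max_steps):
    """The successive terms, one reduction step apart."""
    shown = [show(term)]
    for _ in range(max_steps):
        term, reduced = step(term)
        if not reduced:
            break
        shown.append(show(term))
    return shown


for line in trace(("call", "double", [("+", 1, 1)]), 10):
    print(line)
```

Here `double (1 + 1)` takes four steps to reach `4`, because the argument `1 + 1` is copied and then evaluated twice. An evaluator with sharing would evaluate it once. Tracing `numsfrom_inc 5 1` in the same way shows why: after three steps the term is `5 : 6 : (numsfrom_inc ((5 + 1) + 1) 1)`. The `5 + 1` already computed at the head of the list is still unevaluated inside the recursive call.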
Figure Two: MiraCalc on a Mac

Figure two shows a parsed script in the top window and, in the right-hand lower window, the result of the first step in the evaluation of numsfrom 5. The left-hand lower window (actually a dialogue box) lets the user say how large a step should be taken next.
To give a flavour of MiraCalc we briefly consider three examples. (More detailed examples can be found in [5] and [7]. Note also that there are two versions of MiraCalc: one for a Mac and one for an X terminal.)
Once a script has been typed or read in and successfully parsed, the Expand and Contract operations can be used. They are meant to illustrate:
- the structure of program expressions;
- the grammar of definitions;
- the associativity and precedence of operators;
- the structure of curried functions and of functions defined using patterns.
Expand is used to expand expressions, and Contract Left and Contract Right to contract them. Back can be used to go back to previous states of the cursor.
Expand works as follows. It takes the current cursor selection in the script window and expands it to cover the smallest enclosing expression. For example, using a box to represent the extent of highlighted text in a window, the effect of successive expansions can be illustrated on the expression

    times (double 2) (double 2)

[Illustration: four successive expansions of the boxed selection within this expression; the boxes are not reproduced here.]

This example illustrates the fact that application associates to the left. times (double 2) (double 2) is read as (times (double 2)) (double 2), so a selection covering times expands first to times (double 2), and only then to the whole expression.
Another feature of MiraCalc is relevant to any programming language with control abstraction. It supports the task of determining whether an occurrence of a name is free, bound or binding in an environment. Many students, of course, find this part of using a programming language difficult, so it is important that a system like MiraCalc supports learning about it.
As an example of the sort of question that might be posed, consider the following. Which binding occurrence of x binds the second occurrence of x in

    f x = x + 1 where x = 1

and does f 2 have the value 2 or 3? To answer this question we can experiment with MiraCalc. Assuming that we have entered the definition into the calculator, we select the second occurrence of x. We then check that it is bound by selecting the Bound option from the Calculate menu. A message is displayed confirming our guess as correct. Importantly for deciding the binding of this x, the cursor selection in the window is also moved to cover the binding occurrence of x. The window will now look like the one in figure three.

Figure Three: MiraCalc under X - working on scopes
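The Bound query amounts to searching outwards from an occurrence, through the enclosing scopes, until a binding occurrence of the same name is found. The Python sketch below models the example definition that way. It assumes that Miranda, like Haskell, lets names bound in a where clause scope over the right-hand side and shadow the function’s parameters. Under that assumption the second x is bound by x = 1 in the where clause, and f 2 is 2. The `Scope` class and the labels it stores are hypothetical illustrations, not part of MiraCalc.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Binding occurrences introduced at one level, nested in a parent scope."""
    bindings: dict                    # name -> (binding occurrence, value)
    parent: Optional["Scope"] = None

    def lookup(self, name):
        """Search outwards from this (innermost) scope; None means the name is free."""
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        return None


def apply_f(arg):
    """Model f x = x + 1 where x = 1, applied to arg."""
    parameters = Scope({"x": ("parameter x of f", arg)})
    # Assumed rule: the where clause is nested inside the parameter scope
    # and encloses the right-hand side, so its x shadows the parameter.
    where_clause = Scope({"x": ("x = 1 in the where clause", 1)}, parent=parameters)
    binder, value = where_clause.lookup("x")  # the second occurrence of x
    return binder, value + 1


print(apply_f(2))             # ('x = 1 in the where clause', 2)
print(Scope({}).lookup("y"))  # None: y would be free
```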
As a final example of what can be done with MiraCalc, figure four shows a script in the top pane of the window. It contains definitions of various functions for producing and manipulating various sorts of lists of natural numbers. One example is

    numsfrom_inc x y = x : (numsfrom_inc (x + y) y)

With first argument x and second argument y, the function numsfrom_inc gives as much of the list of natural numbers

    [x, x + y, x + y + y, x + y + y + y, ...]

as we care to calculate. The lower pane of figure four shows the first five steps in the stepwise evaluation of

    numsfrom_inc 5 1

This feature turned out to be heavily used and very popular. One student’s words during interview give the flavour of the students’ response to stepwise evaluation:

- I mean to be honest without the calculator I think everyone would be a hell of a lot more lost because it’s so much easier. You can just go in and if you don’t understand how it reduces you can go through step by step and it will tell you. I mean it’s like having your own little teacher there saying well, this is how it does it which is very useful [8].

Figure Four: Stepwise evaluation

5: Results

There is room here to present only a brief view of the results of the project. In the following paragraphs we paraphrase material from the fuller report [8].
There is no doubt that the use of the calculators helped students and had a beneficial effect on their learning of formal reasoning. Here we show why we can conclude this, and consider in which ways the tools have helped and whom they have helped most.
For both MiraCalc and Tarski’s World a selection of assessment measures was used. Data in relation to Tarski’s World are qualitative: student perceptions and responses to questionnaires were used to gauge its effect. These data are strongly positive about the use that students have made of the tool and the benefits they perceive in using it. Almost 65% of the target population replied to the questionnaires. Of these, 90% had found it useful. Even more impressive is the 91% of those students positively recommending that it should be used for the following student intake.
The benefits that students perceived from using Tarski’s World were varied. It was used regularly: 63% of questionnaire respondents used it for all or most of the lab sessions allocated for its use. Data from interviews showed that students enjoyed using it. A major benefit was its appealingly simple, user-friendly interface, which encouraged exploration:

- When you use the ‘game’ bit you can see the steps, you can see what you've done wrong. It's a bit like computer games
- If you don't understand the sentence, then you can build it up bit by bit. It's easy to try things in different ways
- It makes it very easy to experiment. You can try out ideas about what the sentences mean. It's great, it would be a pain trying out different things on paper, having to keep erasing things, crossing things out

Another benefit was being able to visualise the more complex ideas to which students were introduced:

- Now we're doing quantifiers it's really useful
- Now we're going a bit more into depth with you know...[quantifiers], it's useful for that now. It helps make sense of it.
- If there is something with lots of ‘nots’ in, then I use it, to keep check.
- I got a sort of picture in my head, from Tarski's World, 'cos it's easy to picture and I work from there. It is very useful as a tool. It is good to be able to see everything.

A most interesting finding from the data collected was the absence of comments on difficulties relating to this area of the course. In a study of the previous intake at QMW [9], students had noted a particular difficulty with this introductory logic course: interpreting, building and manipulating logical statements. The ease with which students could do so using Tarski’s World suggests that the tool was addressing this difficulty.
Data collected on the use of MiraCalc were both qualitative and quantitative. Analysis and interpretation of the quantitative data indicate clearly that MiraCalc was of benefit to the students using the system, and that its use had a positive effect on their learning of formal reasoning methods. Qualitative data, collected from interviews and questionnaire returns, indicate that students who had used the tool perceived it as useful and as helping them in a number of areas.
Quantitative data were also available to help assess the effect of using MiraCalc on learning outcomes. These were records of the number and length of each student’s MiraCalc sessions, and background information given by students at the beginning of the first semester. For this study, learning outcomes were represented by the marks that students scored at the end of the year for the course Functional Programming 1. The distribution of FP1 scores did not satisfy normality assumptions, so the following analyses used the Mann-Whitney U (Wilcoxon rank-sum W) test.
For a number of the tests, notably those related to crosstabulation procedures, FP1 scores were divided into quartiles. The quartile boundaries (the 25th, 50th and 75th percentiles) fell at scores of 26, 56 and 88. For example, the top quartile refers to the 25% of students whose end-of-year FP1 score was greater than 88. For tests relating to MiraCalc use, non-users were classed as those who had used the software tool for an hour or less. Level of use was measured on a five-point scale, ranging from very low to very high.
Initial analysis showed a significant difference in FP1 scores between those who had used MiraCalc in their first semester and those who had not (p ≤ 0.001, Mann-Whitney). The mean score of MiraCalc users was higher than that of non-users. Next we looked at the level of MiraCalc use in relation to FP1 scores. There was a difference between students with a low or very low level of MiraCalc use and those with an average or higher level of use. However, there were no significant differences between groups as the level of use rose from average to very high.
Other factors which might have affected FP1 scores were also considered:
- mathematical background before university;
- previous experience of programming;
- previous home use of a computer;
- students’ motivations;
- expectations of the course.
The relationships between these factors and FP1 scores were measured using Spearman’s rank correlation coefficient. Only mathematical background showed a significant correlation with FP1 scores.
From previous studies [9] we were aware that a significant factor was likely to be whether or not students had studied A level maths. (“A level maths” here refers to a British qualification. “A levels”, i.e. “advanced levels”, are exams taken in the second year of the sixth form at secondary school, which corresponds to the seventh form in New Zealand. This is the last possible year of school education, and for most students it falls when they are around 17-18 years old.) Further analysis of the data confirmed the significance of A level maths. FP1 scores differed significantly between students who had studied A level maths and those who had not (p < .003, Mann-Whitney).

    Group        Mean score (FP1)   Mean rank   Cases
    maths        71.4               59.2        29
    no maths     49.4               41.5        64

It is very encouraging that MiraCalc use shows a positive effect on end-of-year scores. It is even more interesting to look at where that effect is most evident. To address this point, we next grouped students in different ways, by level of programming expertise and by maths background.
Perhaps surprisingly, there appeared to be no significant relationship between the benefit of using MiraCalc, as measured by FP1 scores, and level of programming expertise.
Looking at the relationship between MiraCalc use, maths background and FP1 scores, however, was very enlightening. Here we split students into two groups: those with a mathematical background, i.e. who had studied maths at A level, and those without one. Within each of these groups, we then looked at whether or not students had used MiraCalc (figure five).

    Student group                 Mean score (FP1)   Mean rank   Cases
    Maths, not used MiraCalc      55.11              10.0        9
    Maths, used MiraCalc          78.75              17.2        20
    Total                                                        29
    (two-tailed) p < .03

Figure Five: Comparing advanced maths students
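The comparisons above rest on rank-based statistics, which is why each table reports a mean rank alongside the mean score. The following Python sketch shows how these quantities are computed:
- pooled ranks, with tied values given their average rank;
- the Mann-Whitney U statistic;
- Spearman’s rank correlation coefficient.
It uses small made-up score lists, not the project’s data. It also stops at the test statistics. Turning U or rho into p-values like those quoted above needs exact tables or a normal approximation, which the sketch omits.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a, plus both groups' mean ranks in the pooled ranking."""
    ranks = average_ranks(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, rank_sum_a / n_a, sum(ranks[n_a:]) / n_b


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    dx = [r - mean_x for r in rx]
    dy = [r - mean_y for r in ry]
    covariance = sum(p * q for p, q in zip(dx, dy))
    return covariance / sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))


# Made-up end-of-year scores, for illustration only.
used = [62, 75, 81, 90]
not_used = [20, 35, 62]
u, mean_rank_used, mean_rank_not_used = mann_whitney_u(used, not_used)
print(u)  # 11.5
```

One consequence can be checked against the tables: the ranks of N pooled cases always sum to N(N + 1)/2. In figure five, 9 × 10.0 + 20 × 17.2 = 434. That matches 29 × 30 / 2 = 435 up to the rounding of the reported mean ranks.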

The results showed an interesting contrast. Among students who had studied A level maths, there is a significant difference in FP1 end-of-year scores between those who had used the software tool and those who hadn’t. This difference is, however, far more pronounced among students who had not studied A level maths (figure six).

    Student group                    Mean score (FP1)   Mean rank   Cases
    No maths, not used MiraCalc      21.143             17.07       14
    No maths, used MiraCalc          57.36              36.82       50
    Total                                                           64
    (two-tailed) p < .001

Figure Six: Comparing non-advanced maths students

The indications, then, are that using MiraCalc had a positive effect on learning outcomes, and that this effect is seen most strongly in the results of students who had not studied A level maths. One of the principal motivations for developing and introducing these software tools was to address the difficulties that ‘non’ mathematical students experience. This is therefore a most encouraging result.
A final observation, based purely on anecdotal evidence from teaching staff, was this: students treat a ‘theoretical’ course like Introduction to Logic more seriously when it is supported with labs, just as the programming courses were.
What appears to be happening is that the Department is influencing the students’ perceptions of the relevance of logic. Such courses are often taught without labs. The students then see them as more peripheral to Computer Science, and hence less important, so they treat them less seriously. Typically this shows up in two ways. Students leave the logic coursework until last when other courses compete for their time, and they are less willing to devote time to independent study of the course. (It also has to be said that some teaching staff reinforce this problem, by what they say about such courses and by how they schedule them in tutorials.)
In the students’ eyes, the Department’s commitment of resources to labs for the logic course is significant, and their attitude towards the course improved. According to some measure of relevance and importance, the students seemed more likely to view the logic course as equal to the programming courses.

6: Further work

Since this project finished, various of its members have been considering how to continue work in this area, building on the achievements outlined above. Firstly, use of the calculators is continuing at QMW. They are also now being used at Waikato, where the developments mentioned below are taking place.
In one main development, work is now underway to improve MiraCalc. The Mac version (as shown in figure two above) does not allow use of all of Miranda, though it supports enough of the language for a first course to use it successfully. This is because of the relatively short time we had to develop the calculator between the start of the project and its use in the functional programming course.
During the final part of the project, after teaching had finished, we were able to extend MiraCalc to cover the whole language. This took the form of a Unix-based version with an X interface. We chose this route so that labs based on X workstations could also use MiraCalc, and Mac-based labs could use the full system via MacX.
Even though MiraCalc implements all of Miranda, it is not truly lazy: although it uses normal-order evaluation, it does not use sharing. In future work we intend to consider how to include sharing in the calculator. This will pose some problems for the step-by-step feature, since it will require us to show how a graph-reduction evaluation mechanism works. That is likely to be much harder to show without confusing students too much.
Using many of the lessons learnt on the project reported here, another calculator is being developed to support program derivation within constructive type theory [10]. This is a single language (syntax, semantics and proof theory). Within it, formal specifications (very expressive types) can be written and functional programs can be derived according to the proof rules, so that the programs necessarily and provably meet their specifications. This work is aimed at graduate-level students. The calculator as so far constructed (PICTCalc) has been used by several classes of M.Sc. students enrolled on the course “Constructive Mathematics and Programming” at QMW.

7: Acknowledgments

As ever with educational research, we have to thank the students, who suffered all of the questionnaires, interviews and novel software associated with this work with excellent humour. We also thank our colleagues on the teaching staff at QMW. Without the help and goodwill of both these groups the work reported here would not have been possible.

References

[1] Fung, P., O’Shea, T., Goldson, D., Reeves, S. and Bornat, R. (1992) Computer Science Students’ Perceptions of Learning Formal Reasoning Methods. International Journal of Mathematical Education in Science and Technology, pp. 749-760.

[2] Fung, P., O’Shea, T., Goldson, D., Reeves, S. and Bornat, R. (1994) Understanding why computer science students find formal reasoning frightening. Journal of Computer Assisted Learning, appearing in the December ’94 issue.

[3] Goldson, D., Reeves, S. and Bornat, R. (1993) A Review of several systems for the Support of Logics. The Computer Journal, vol. 36, no. 4, pp. 373-386.

[4] Barwise, J. and Etchemendy, J. (1991) The Language of First-Order Logic. CSLI, Stanford University.

[5] Goldson, D. (1994) A Symbolic Calculator for Non-Strict Functional Programs. The Computer Journal, vol. 37, no. 3, pp. 177-187.

[6] Bird, R. and Wadler, P. (1988) Introduction to Functional Programming. Prentice-Hall.

[7] Goldson, D., Hopkins, M. and Reeves, S. (1994) MiraCalc: The Miranda Calculator, The Unix Version. Working paper 94/5, Department of Computer Science, University of Waikato, New Zealand.

[8] Fung, P. and O’Shea, T. (1994) Using Software Tools to Learn Formal Reasoning: A first assessment. CITE Technical Report no. 197, Open University, Milton Keynes, UK.

[9] Fung, P. and O’Shea, T. (1993) Learning to reason formally about programs: an observational study of computer science students. CITE Technical Report no. 168, Open University, Milton Keynes, UK.

[10] Reeves, S. (1994) Computer support for students’ work in a formal system: MacPICT. To appear in International Journal of Mathematical Education in Science and Technology.
