Professional Documents
Culture Documents
Compiler Construction As An Effective Application To Teach Object-Oriented Programming
Compiler Construction As An Effective Application To Teach Object-Oriented Programming
Compiler Construction As An Effective Application To Teach Object-Oriented Programming
1
Program Parse Bind Type Rename
Dereify
Bounds
IR Translate Desugar Inline
Checking
A compiler typically features three parts: the front-end analyzes the input program, the middle-end
is independent of the source and target languages, and the back-end generates the output program
in the target language. This figure focuses on the front-end.
First, the source is parsed, resulting in an Abstract Syntax Tree (AST). The AST is the main data
structure of the front-end; it is analyzed/annotated/modified by the various modules. Then the bind
task implements scoped name analysis: it matches identifier uses with their declaration. The types
are checked. Renaming all the identifiers so that they are unique simplifies later transformations.
Object Oriented (OO) structures are translated into object-less equivalent constructs at the dereify
stage. Function inlining is an optimization, desugaring translates high-level constructs into more
primitive ones, and bound-checking instruments the code to catch out-of-bounds array accesses.
Finally, the front-end translates into an Intermediate Representation (IR), passed to the middle-end.
Figure 1: Crash course on the Tiger compiler front end.
2
2.1.2 Translation To Intermediate Rep-
Visitor
resentation
The final stage of the front end, Translator,
translates the AST into some IR used by the PrettyPrinter DefaultVisitor ...
middle end. It exhibits an interesting problem
which is very cleanly handled by OOP. The
translation of some constructs depends on their
Binder Typer Translator ...
use. For instance a < b used as a statement can
be simply discarded (think of ‘(void) a < b;’
in C), used as an expression it should be nor-
malized to 0 or 1, and used as a conditional in OverloadTyper
an if statement, it naturally translates into a
conditional branching instruction. Traditional Figure 3: Visitor hierarchy.
traversals of the AST cannot cope nicely with
the way a node is used : it requires that the pro- All the stages in the front end apply on the
cessing of a child knows the type of its parent AST. Instead of continuously editing the AST
(and in some cases, of its ancestors!). classes, the Visitor design pattern is used.
Appel [1] suggests a beautiful OOP im- This pattern is a convenient implementation
plementation: translation nutshells. Instead of traversals that decouples the algorithm (the
of returning the completed translation of a traversal) from the data (the AST). The rele-
node, convert each AST construct into a proto- vance of this pattern is perfectly well understood
translation, a nutshell whose type depends on its by the students because they deliver milestones
actual nature. There are three proto-translation of their compiler: at each new stage we provide
types: Expression (values involved in com- new components (e.g., visitors with gaps) which
putations), Statement (expressions whose val- they simply add to their project. Using classi-
ues are discarded, loops, etc.), and Comparison cal virtual methods instead or Visitors would
(expressions well-suited for branching). Then prevent modularity, since they require students
a parent completes the translation of a child to alter the base classes, that they may already
by calling a method that depends on its have adjusted to their needs, to add new behav-
need: as statement, as expression, and iors.
as condition. Common parts in the visitors, starting with
2.1.3 Language Extensions the traversal itself, provide a nice impetus for
more OOP: introduce a visitor hierarchy to fac-
We submit challenging optional tasks to keep tor common parts (see Figure 3).
skillful students interested. Language exten- The whole compiler is “customizable”, sup-
sions (overloading support, additional primi- porting 60 modules and a large variety of dif-
tive types, etc.) are a good topic, but they re- ferent configurations, including optional assign-
quire that base stages be extended to support ments (see Figure 1). The explicit sequencing
the new constructs. This is cleanly handled between the different tasks would be cumber-
by OOP: sub-class the visitors (e.g., derive an some to write, a maintenance nightmare. The
OverloadTyper from the Typer to compute the different stages of the compiler are therefore ex-
types of overloaded routines). pressed as Task objects that declare their de-
2.2 Architecture and Design Pat- pendencies (dashed arrows in Figure 1) to the
terns scheduler which computes the effective sequence
(solid arrows). This is an instance of the Com-
Although compilers were amongst the most mand design pattern.
complex pieces of software 50 years ago, today Other design patterns fit compilers, too
the bird-eye view of their architecture is usually many to enumerate. Of course Singletons
surprisingly simple: a sequence of transforma- are. . . numerous (e.g., memory pool). The Fly-
tions2 (or tasks). weight pattern factors occurrences of an iden-
2 These transformations can be complex and in the tifier to a single one. Not only does it save space,
case of optimizing compilers their scheduling is complex; but it also dramatically speeds up the handling
this is not the case in a pedagogic compiler. of identifiers: string operations (hashing, equal-
3
ity, order) become single CPU instructions on why support from the compiler, rather than a li-
pointers. brary, is a better option.
2.3 Object-Oriented Source Lan- 2.4 Object-Oriented Implementa-
guage tion Language
Students get special insight with OOP when im- Finally, as a piece of software, the compiler must
plementing an OOL. be written in some programming language. Ob-
viously, since there are many opportunities for
2.3.1 Understanding the Semantics object-oriented design in a compiler (see Sec-
In our experience, most students quickly under- tion 2.1 and 2.2), an OOL is well-suited. But
stand what OOP is: encapsulation, data hiding, which one? We chose C
++ over more recent lan-
inheritance and polymorphism. They also are guages like Java or C# for a number of reasons.
quickly able to make profit of object-orientation
in most situations. Yet, most lack a precise and Rigorous and Demanding As Stroustrup
formal view of what an OOL is. Students who [11] puts it himself “C makes it easy to shoot
read language standards are rare (and should be yourself in the foot; C++ makes it harder, but
hired immediately), usually they content them- when you do it blows your whole leg off”. As
selves from what the courses and the course any sharp tool, C++ can provide excellent re-
book provide: more of a tutorial than a refer- sults when mastered, or return itself against a
ence guide. lazy programmer. This is not the case of some
Studying a simple language, with well de- “too comfortable languages” such as Java [7].
fined semantics, allows the teacher to focus on
the key components of OOL, to emphasize the Multi-paradigm C++ supports a wide spec-
dark corners (such as covariance vs. contravari- trum of programming styles. For instance,
ance issues), or even to highlight design choices we teach different memory management tech-
made invisible by the syntax (the dot notation, niques: Resource Acquisition Is Initialization
‘obj.fun(arg)’, hides the fact that polymor- (RAII) (a C++ idiom), flyweight pool, refer-
phism is usually restrained to the first argument, ence counting. This is impossible in garbage-
‘obj’). collected languages. Generic Programming
Believing that we understand specifications is (GP) is heavily used: generic containers and
one thing, implementing them is another. generic algorithms are numerous. Parts use
Functional Programming (FP), and even meta-
2.3.2 Understanding the Dynamics
programming: programs run at compile-time [4,
The best means to feel the “Ah ah!” effect on Goals Sections].
OOP is to implement it.
Sticking to its guideline, the Tiger project Existing Libraries C++ often looks poor
makes sure that students’ attention is not dis- compared to other OOL when it comes to its
rupted from the topic, OOP, by devoting a well standard library, STL [10]. For instance it does
defined and isolated task to it: “de-reification”. not even provide hash tables! But using stock
This task transforms an object-oriented Tiger C++ today would be weird: knowing and us-
program into a plain Tiger project. Because it ing Boost [12] is part of the C++ curriculum.
works on the high-level language instead of some Boost libraries are excellent material for teach-
low-level machine code, students of every skill ing OOP, GP, and FP: these peer-reviewed
can understand it. Because no other process- libraries enlighten good C++ practices, design
ing is performed, the attention is fully devoted patterns, code factoring and reusing, as well as
to the mechanism of OOP. In addition, it is portability.
the opportunity to present different implemen-
tation of OOP, based of vtables as in C++, or 3 Course and Project
on dispatching functions [13].
Thanks to this insight students can bring Managing such a project, spanning three to five
OOP into non-object-oriented languages (when months, with more than two hundred and fifty
they are not given the choice on the implementa- students per class, i.e., from about four hun-
tion language). Yet they are still able to explain dred compiler deliveries (the project is made in
4
groups of 4 students, with 6 deliveries), is a chal- the students packages, to run the test suites, to
lenge in itself. generate the documentation samples, to gener-
ate the AST, to produce the code delivered to
Organization The Tiger team is composed of student etc. Many of these tools are free soft-
(i) one teacher who manages the project, writes ware, available from the project’s Web site [9].
the assignments and examinations, maintains
the reference compiler, (ii) some co-maintainers 4 Discussion
who contribute mandatory or optional parts
of the compiler, (iii) about twenty assistants, We conducted a survey at the end of the year
students of the previous class willing to assist among 204 students, to ask them which no-
younger students. They are also in charge of tions they thought were best taught by the com-
oral examinations. piler assignment. The item with the best mark
We use Web sites to publish information, (3.58/5) was “understanding how programming
mailing lists to coordinate ourselves, and pub- languages work”, followed by “object-oriented
lic newsgroups for asynchronous discussion with modeling and design patterns” (3.18/5) and
the students. A 200+ page document [4] details “C++” (3.07/5), which all three are actually
all the assignments and makes explicit what stu- the main goals aimed at by the project. How-
dents are expected to learn. It also proposes ever, we were surprised to find that program-
ideas of optional extensions for the most enthu- ming languages were felt as better taught than
siastic students. C++, as we put the emphasis on the latter item
rather than on the former during courses and
The project starts after students have com-
the project. We think that using a more com-
pleted a “Languages, Automata, Grammars,
plex OO source language to be compiled ac-
Parsing” course. To learn the language they
tually helped the students to apply what they
first have to write a significant program in Tiger.
had learned, and better understand how object-
They will use this assignment, together with
oriented languages work. This idea is indeed
other provided big programs, to exercise their
confirmed by the fact that students seemed to
compilers. Almost concurrently they start writ-
have overcome most of the difficulties intro-
ing the scanner and parser. At the same period
duced by the OO constructs in the front-end:
C++ lectures start.
according to the survey, the parts considered as
the less difficult ones were “name binding” and
One Milestone Assignment and support “type-checking” – the ones that were actually
code for a stage are typically published on a the most affected by the OO extension.
Monday, and the delivery is two to three weeks Over the years the project became easier, and
afterward. In the meanwhile, the assistants are less ambitious [3]. Infrastructure is also pro-
available at almost any time. On the Friday at vided for highly skilled students. As an unex-
noon, students must have uploaded their pack- pected result, proficient students had lost some
age on the Web site. It launches the automated interest in the project: they felt the pleasure of
testing procedure, which evaluates every stage writing a compiler was stolen from them. To
of the compiler. It computes a global correction meet their expectations, each stage now offers a
grade, giving higher penalties to errors in ear- set of options. They range from easy to ambi-
lier phases. The result is published to the mem- tious and provide more insight into CC (which is
bers of the group who may decide to “re-upload” not the purpose of the core assignment). Thanks
within two days, which causes a 10% penalty. to this infrastructure, new refinements were pro-
The following week the oral examination is or- posed, and more students implemented options.
ganized. One assistant spends about 45mn with As the project is long and teaches many sub-
a group, asking questions, evaluating answers, jects, it is usually perceived by students as hav-
browsing the code, and helping students to un- ing a strong impact on their schooling. During
derstand and overcome their problems. yearly debriefings, many of them report that
the project was demanding. However we have
Tools In such a large environment, automa- received good reports from alumni each year
tion is salvation. Many auxiliary tools were about the project, sometimes several years af-
written [6] to ensure via a daily build that the ter their graduation! Frequent remarks include
reference compiler suffers no regression, to check the actual effectiveness of the project at teach-
5
ing design patterns, and that they are currently [4] Akim Demaille and Roland Levillain.
using them in their daily work. Many of them The Tiger Compiler Project Assignment.
report that they have gained a status of expert EPITA Research and Development Lab-
w.r.t. C++, design, development process and oratory (LRDE), 14-16 rue Voltaire, FR-
tools among their team. And sometimes they 94270 Le Kremlin-Bicêtre, France, 2007.
http://www.lrde.epita.fr/~akim/ccmp/
even admit they reused a technique from the
assignments.pdf.
Tiger Compiler in their current project, almost
as-is! Though informal, we rate these “success [5] Akim Demaille and Roland Levillain. The
stories” as a good indicators of the relevance of Tiger Compiler Reference Manual. EPITA Re-
the project at teaching modern and accurate OO search and Development Laboratory (LRDE),
and C++ design and programming. 14-16 rue Voltaire, FR-94270 Le Kremlin-
Bicêtre, France, 2007. http://www.lrde.
5 Conclusion epita.fr/~akim/ccmp/tiger.pdf.