
The Python Compiler for CMU Common Lisp

Robert A. MacLachlan
Carnegie Mellon University

Abstract

The Python compiler for CMU Common Lisp has been under development for over five years, and now forms the core of a production quality public domain Lisp implementation. Python synthesizes the good ideas from Lisp compilers and source transformation systems with mainstream optimization and retargetability techniques. Novel features include strict type checking and source-level debugging of compiled code. Unusual attention has been paid to the compiler's user interface.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
1992 ACM LISP & F.P.-6/92/CA
© 1992 ACM 0-89791-483-X/92/0006-0235...$1.50

1 Design Overview

CMU Common Lisp is a public domain implementation of Common Lisp. The intended application of this implementation is as a high-end Lisp development environment. Rather than maximum peak performance or low memory use, the main design goals are the ease of attaining reasonable performance, of debugging, and of developing programs at a high level of abstraction.

CMU Common Lisp is portable, but does not sacrifice efficiency or functionality for ease of retargeting. It currently runs on three processors: MIPS R3000, SPARC, and IBM RT PC. Python has 50,000 lines of machine-independent code. Each back-end is 8,000 lines of code (usually derived from an existing back-end, not written from scratch.) Porting takes 2–4 wizard-months.

An unusual amount of effort was devoted to user-interface aspects of compilation, even in comparison to commercial products. One of the theses of this paper is that many of Lisp's problems are best seen as user-interface problems.

In a large language like Common Lisp, efficient compiler algorithms and clever implementation can still be defeated in the details. In 1984, Brooks and Gabriel[5] wrote of Common Lisp that:

    Too many costs of the language were dismissed with the admonition that "any good compiler" can take care of them. No one has yet written nor is likely without tremendous effort a compiler that does a fraction of the tricks expected of it. At present, the "hope" for all Common Lisp implementors is that an outstandingly portable Common Lisp compiler will be written and widely used.

Python has a lot of operation-specific knowledge: there are over a thousand special-case rules for 640 different operations, 14 flavors of function call and three representations for multiple values. The + function has 6 transforms, 8 code generators and a type inference method. This functionality is not without a cost; Python is bigger and slower than commercial CL compilers.

2 Type Checking

Python brings strong typing to Common Lisp. Type declarations are optional in Common Lisp, but Python's checking of any type assertions that do appear is both precise and eager:

Precise -- All type assertions are tested using the exact type asserted. All type assertions (both implicit and declared) are treated equally.

Eager -- Run-time type checks are done at the location of the first type assertion, rather than being delayed until the value is actually used.

Explicit type declarations are neither trusted nor ignored; declarations simply provide additional assertions which the type checker verifies (with a run-time test if necessary) and then exploits. This eager type checking allows the compiler to use specialized variable representations in safe code.

Because declared types are precisely checked, declarations become more than an efficiency hack. They provide a mechanism for introducing moderately complex consistency assertions into code. Type declarations are more powerful than assert and check-type because they make it easier for the compiler to propagate and verify assertions.

2.1 Run-Time Errors

Debuggers for other languages have caught up to Lisp, but Lisp programs often remain easier to debug because run-time errors are signalled sooner after the inconsistency arises and contain more useful information.

Debugger capabilities are important, but the time that the debugger is invoked is also very important. There is only so much a debugger can do after a program has already crashed and burned. Strict type checking improves the quality and timeliness of run-time error messages.

The efficiency penalty of this run-time type checking would be prohibitive if Python did not both minimize run-time checks through compile-time type inference and apply optimizations to minimize the cost of the residual type checks.

2.2 Compile-Time Type Errors

In addition to proving that a type check is unnecessary, type inference may also prove that the asserted and derived types are inconsistent. In the latter case, the code fragment can never be executed without a type error, so a compile-time warning is appropriate. Compiling this example:

    (defmacro addf (x n)
      `(+ ,x (the single-float ,n)))

    (defun type-error (x)
      (addf x 3))

Gives this compiler warning:

    In: defun type-error
      (addf x 3)
    --> +
    ==>
      (the single-float 3)
    Warning: This is not a single-float:
      3

In addition to detecting static type errors, flow analysis detects simple dynamic type errors like:

    (defun foo (x arg)
      (ecase x
        (:register-number
         (and (integerp arg) (>= x 0)))
        ...))

    In: defun foo
      (>= x 0)
    --> if <
    ==>
      x
    Warning: This is not a (or float rational):
      :register-number

Johnson[8] shows both the desirability and feasibility of compile-time type checking as a way to reduce the brittleness (or increase the robustness) of Lisp programs developed using an exploratory methodology.

Python does less interprocedural type inference, so compile-time type checking serves mainly to reduce the number of compile-debug cycles spent fixing trivial type errors. Not all (or even most) programs can be statically checked, so run-time errors are still possible. Some errors can be detected without testing, but testing is always necessary.

Once the basic subtypep operation has been implemented, it can be used to detect any inconsistencies during the process of type inference, minimizing the difficulty of implementing compile-time type error checking. The importance of static type checking may be a subject of debate, but many consider it a requirement for developing reliable software. Perhaps compile-time type errors can be justified on these unquantifiable grounds alone.

2.3 Style Implications

Precise, eager type checking lends itself to a new coding style, where:

●  Declarations are specified to be as precise as possible: (integer 3 7) instead of fixnum, (or node null) instead of no declaration.

●  Declarations are written during initial coding rather than being postponed until tuning, since declarations now enhance debuggability rather than hurting it.

●  Declarations are used with more confidence, since their correctness is tested during debugging.

3 Accountability to Users

A Lisp implementation hides a great deal of complexity from the programmer. This makes it easier to write programs, but harder to attain good efficiency.[7] Unlike in simpler languages such as C, there is not a straightforward efficiency model which programmers can use to anticipate program performance.

In Common Lisp, the cost of generic arithmetic and other conceptually simple operations varies widely depending on both the actual operand types and on the amount of compile-time type information. Any accurate efficiency model would be so complex as to place unrealistic demands on programmers -- the workings of the compiler are (rightly) totally mysterious to most programmers.

A solution to this problem is to make the compiler accountable for its actions[15] by establishing a dialog with the programmer. This can be thought of as providing an automated efficiency model. In a compiler, accountability means that efficiency notes:

●  are needed whenever an inefficient implementation is chosen for non-obvious reasons,

●  should convey the seriousness of the situation,

●  must explain precisely what part of the program is being referred to, and

●  must explain why the inefficient choice was made (and how to enable efficient compilation.)

3.1 Efficiency Note Example

Consider this example of "not good enough" declarations:

    (defun eff-note (s i x y)
      (declare (string s) (fixnum i x y))
      (aref s (+ i x y)))

which gives these efficiency notes:

    In: defun eff-note
      (+ i x y)
    ==>
      (+ (+ i x) y)
    Note:
    Forced to do inline (signed-byte 32) arithmetic (cost 3).
    Unable to do inline fixnum arithmetic (cost 2) because:
    The first argument is a (integer -1073741824 1073741822),
    not a fixnum.

      (aref s (+ i x y))
    --> let*
    ==>
      (kernel:data-vector-ref array kernel:index)
    Note:
    Forced to do full call.
    Unable to do inline array access (cost 5) because:
    The first argument is a base-string,
    not a simple-base-string.

4 Source-Level Debugging

In addition to compiler messages, the other major user interface that Lisp implementations offer to the programmer is the debugger. Historically, debugging was done in interpreted code, and Lisp debuggers offered excellent functionality when compared to other programming environments. This was more due to the use of a dynamic-binding interpreter than to any great complexity of the debuggers themselves.

There has been a trend toward compiling Lisp programs during development and debugging. The first-order reason for this is that in modern implementations, compiled code is so much faster than the interpreter that interpreting complex programs has become impractical. Unfortunately, compiled debugging makes interpreter-based debugging tools like *evalhook* steppers much less available. It is usually not even possible to determine a more precise error location than "somewhere in this function."

One solution is to provide source-level debugging of compiled code. Although the idea has been around for a while[18], source level debugging is not yet available in the popular stock hardware implementations. To provide useful source-level debugging, CMU Common Lisp had to overcome these implementation difficulties:

●  Excessive space and compile-time penalty for dumping debug information.

●  Excessive run-time performance penalty because debuggability inhibits important optimizations like register allocation and untagged representation.

●  Poor correctness guarantees. Incorrect information is worse than none, and even very rare incorrectness destroys user confidence in the tool.

The Python symbolic debugger information is compact, allows debugging of optimized code and provides precise semantics. Source-based debugging is supported by a bidirectional mapping between locations in source code and object code. Due to careful encoding, the space increase for source-level debugging is less than 35%. Precise variable liveness information is recorded, so potentially incorrect values are not displayed, giving what Zellweger[22] calls truthful behavior.

4.1 Debugger Example

This program demonstrates some simple source-level debugging features:

    (defstruct my-struct
      (slot nil :type (or integer null)))

    (defun source-demo (structs)
      (dolist (struct structs)
        (when (oddp (my-struct-slot struct))
          (return struct))))

    (source-demo
     (list (make-my-struct :slot 0)
           (make-my-struct :slot 2)
           (pathname "foo")))

Resulting in a run-time error and debugger session:

    Type-error in source-demo:
      #p"foo" is not of type my-struct

    Debug (type H for help)

    ; The usual call frame printing,
    (source-demo (#s(my-struct slot 0)
                  #s(my-struct slot 2)
                  #p"foo"))

    ; but also local variable display by name,
    6] l
    struct = #p"foo"
    structs = (#s(my-struct slot 0)
               #s(my-struct slot 2)
               #p"foo")

    ; evaluation of local variables in expressions, and
    6] (my-struct-slot (cadr structs))
    2

    ; display of the error location,
    6] source 1
    (oddp (#:***here*** (my-struct-slot struct)))

    ; and you can jump to editing the exact error location.
    6] edit

4.2 The Debugger Toolkit

The implementation details of stack parsing, debug info and code representation are encapsulated by an interface exported from the debug-internals package. This makes it easy to layer multiple debugger user interfaces onto CMU CL. The interface is somewhat similar to Zurawski[23]. Unlike SmallTalk, Common Lisp does not define a standard representation of activation records, etc., so the toolkit must be defined as a language extension.

5 Efficient Abstraction

Powerful high-level abstractions help make program maintenance and incremental design/development easier[20, 21]; this is why Common Lisp has such a rich variety of abstraction tools:

●  Global and local functions, first-class functional values,

●  Dynamic typing (ad-hoc polymorphism), generic functions and inheritance,

●  Macros, embedded languages, and other language extensions.

Unfortunately, inefficiency often forces programmers to avoid appropriate abstraction. When the difficulty of efficient abstraction leads programmers to adopt a low-level programming style, they have forfeited the advantages of using a higher-level language like Lisp. In order to produce a high-quality Lisp development environment, it is not sufficient for the C-equivalent subset of Lisp to be as efficient as C -- the higher level programming tools characteristic of Lisp must also be efficiently implemented.

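As a small sketch of the coding style this argument encourages (hypothetical code written for illustration; the function names clamp and scale-sample are not from the paper), an abstract helper can be declaimed inline and given precise declarations that the compiler both checks eagerly and exploits:

    ;; Hypothetical example: a small abstraction that a compiler
    ;; like Python can make as cheap as hand-written code.
    (declaim (inline clamp))
    (defun clamp (x lo hi)
      (declare (type real x lo hi))
      (max lo (min hi x)))

    ;; Precise subrange declarations, in the style of section 2.3:
    ;; the compiler verifies the assertion (at run time if necessary)
    ;; and uses it to open-code the arithmetic.
    (defun scale-sample (s)
      (declare (type (integer 0 255) s))
      (clamp (* s 2) 0 255))

After inline expansion and partial evaluation of the call to clamp, no general function call or generic arithmetic should remain.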
Python increases the efficiency of abstraction in several ways:

●  Extensive partial evaluation at the Lisp semantics level,

●  Efficient compilation of the full range of Common Lisp features: local functions, closures, multiple values, arrays, numbers, structures, and

●  Block compilation and inline expansion (synergistic with partial evaluation and untagged representations.)

5.1 Partial Evaluation

Partial evaluation is a generalization of constant folding[9]. Instead of only replacing completely constant expressions with the compile-time result of the expression, partial evaluation transforms arbitrary code using information on parts of the program that are constant. Here are a few sample transformations:

●  Operation is constant:
     (funcall #'eql x y) ==> (eql x y)

●  Variables whose bindings are invariant:
     (let ((x (+ a b))) x) ==> (+ a b)

●  Unused expression deletion:
     (+ (progn (* a b) c) d) ==> (+ c d)

●  Conditionals that are constant:
     (if t x y) ==> x
     (if (consp a-cons) x y) ==> x
     (if p (if p x y) z) ==> (if p x z)

The canonicalization of control flow and environment during ICR Conversion aids partial evaluation by making the value flow apparent.

Source transformation systems have demonstrated that partial evaluation can make abstraction efficient by eliminating unnecessary generality[15, 4], but only a few non-experimental compilers (such as Orbit[9]) have made comprehensive use of this technique. Partial evaluation is especially effective in combination with inline expansion[19, 13].

5.2 Partial Evaluation Example

Consider this analog of find which searches in a linked list threaded by a specified successor function:

    (declaim (inline find-in))
    (defun find-in (next element list &key (key #'identity)
                         (test #'eql test-p)
                         (test-not nil not-p))
      (when (and test-p not-p)
        (error "Both :Test and :Test-Not supplied."))
      (if not-p
          (do ((current list (funcall next current)))
              ((null current) nil)
            (unless (funcall test-not (funcall key current)
                             element)
              (return current)))
          (do ((current list (funcall next current)))
              ((null current) nil)
            (when (funcall test (funcall key current) element)
              (return current)))))

When inline expanded and partially evaluated, this transformation results:

    (find-in #'foo-next 3 foo-list
             :key #'foo-offset :test #'<)
    ==>
    (do ((current foo-list (foo-next current)))
        ((null current) nil)
      (when (< (foo-offset current) 3)
        (return current)))

5.3 Meta-Programming and Partial Evaluation

Meta-programming is writing programs by developing domain-specific extensions to the language. It is an extremely powerful abstraction mechanism that Lisp makes comparatively easy to use. However, macro writing is still fairly difficult to master, and efficient macro writing is tedious because the expansion must be hand-optimized.

Partial evaluation makes meta-programming easier by making inline functions efficient enough so that macros can be reserved for cases where functional abstraction is insufficient. Such optimization also makes it simpler to write macros because unnecessary conditionals and variable bindings can be introduced into the expansion without hurting efficiency.

5.4 Block Compilation

Block compilation assumes that global functions will not be redefined, allowing the compiler to see across function boundaries[18].

In CMU Common Lisp, full source-level debugging is still supported in block-compiled functions. The

only effect that block compilation has on debuggability is that the entire block must be recompiled when any function in it is redefined. Often block compilation is first used during tuning, so full-block recompilation is rarely necessary during development.

Block compilation reduces the basic call overhead because the control transfer is a direct jump and argument syntax checking is safely eliminated. Python also does keyword and optional argument parsing at compile-time.

Another advantage of block compilation is that locally optimized calling conventions can be used. In particular, numeric arguments and return values are passed in non-pointer registers.

In addition to reductions in the call overhead itself, block compilation is important because it extends type inference and partial evaluation across function boundaries. Block compilation allows some of the benefits of whole-program compilation[3] without excessive compile-time overhead or complete loss of incremental redefinition.

Python provides an extension which declares that only certain functions in a block compilation are entry points that can be called from outside the compilation unit. This is a syntactic convenience that effectively converts the non-entry defuns into an enclosing labels form. Whichever syntax is used, the entry point information allows greatly improved optimization because the compiler can statically locate all calls to block-local functions.

6 Numeric Efficiency

Efficient compilation of Lisp numeric operations has long been a concern[16, 6]. Even when type inference is sufficient to show that a value is of an appropriate numeric type, efficient implementation is still difficult because the use of tagged pointers to represent all objects is incompatible with efficient numeric representations.

Python offers a combination of several features which make efficient numeric code much easier to attain:

●  Untagged numbers can be allocated both in registers and on the stack.

●  Type inference and the wide palette of code generation strategies reduce the number of declarations needed to get efficient code.

●  Block compilation increases the range over which efficient numeric representations can be used.

●  Efficiency notes provide reliable feedback about numeric operations that were not compiled efficiently.

In addition to open-coding operations on single and double floats and fixnums, Python also implements full-word integer arithmetic on signed and unsigned 32 bit operands. Word-integer arithmetic is used to implement bignum operations, resulting in unusually good bignum performance.

7 Implementation

Python's structure is broadly characterized by the compilation phases and the data structures (or intermediate representations) that they manipulate. Two major intermediate representations are used:

●  The Implicit Continuation Representation (ICR) represents the lisp-level semantics of the source code during the initial phases. ICR is roughly equivalent to a subset of Common Lisp, but is represented as a flow-graph rather than a syntax tree. The main ICR data structures are nodes, continuations and blocks. Partial evaluation and semantic analysis are done on this representation. Phases which only manipulate ICR comprise the "front end".

●  The Virtual Machine Representation (VMR) represents the implementation of the source code on a virtual machine. The virtual machine may vary depending on the target hardware, but VMR is sufficiently stylized that most of the phases which manipulate it are portable. All values are stored in Temporary Names (TNs). All operations are Virtual Operations (VOPs), and their operands are either TNs or compile-time constants.

When compared to the intermediate representations used by most Lisp compilers, ICR is lower level than an abstract syntax tree front-end representation, and VMR is higher level than a macro-assembler back-end representation.

8 Compilation Phases

Major phases are briefly described here. The phases from "local call analysis" to "type check generation" all interact; they are generally repeated until nothing new is discovered.

ICR conversion  Convert the source into ICR, doing macroexpansion and simple source-to-source

transformation. All names are resolved at this time, so later source transformations can't introduce spurious name conflicts. See section 9.

Local call analysis  Find calls to local functions and convert them to local calls to the correct entry point, doing keyword parsing, etc. Recognize once-called functions as lets. Create entry stubs for functions that are subject to general function call.

Find components  Find the flow graph components and compute a depth-first ordering. Delete unreachable code. Separate top-level code from run-time code, and determine which components are top-level components.

ICR optimize  A grab-bag of all the non-flow ICR optimization. Fold constant functions, propagate types and eliminate code that computes unused values. Source transform calls to some known global functions by replacing the function with a computed inline expansion. Merge blocks and eliminate if-if constructs. Substitute let variables. See section 10.

Constant propagation  This phase uses global flow analysis to propagate information about lexical variable values (and their types), eliminating unnecessary type checks and tests.

Type check generation  Emit explicit ICR code for any necessary type checks that are too complex to be easily generated on the fly by the back end. Print compile-time type warnings.

Event driven operations  Some ICR attributes are incrementally recomputed, either eagerly on modification of ICR, or lazily, when the relevant information is needed.

Environment analysis  Determine which distinct environments need to be allocated, and what context needs to be closed over by each environment. We detect non-local exits and mark locations where dynamic state must be restored. This is the last pure ICR pass.

TN allocation  Iterate over all defined functions, determining calling conventions and assigning TNs to local variables. Use type and policy information to determine which VMR translation to use for known functions, and then create TNs for expression evaluation temporaries. Print efficiency notes when a poor translation is chosen.

Control analysis  Linearize the flow graph in a way that minimizes the number of branches. The block-level structure of the flow graph is frozen at this point (see section 12.3.)

Stack analysis  Discard stack multiple values that are unused due to a local exit making the values receiver unreachable. The phase is only run when TN allocation actually uses the stack values representation.

VMR conversion  Convert the ICR into VMR by translating nodes into VOPs. Emit simple type checks. See section 12.

Copy propagation  Use flow analysis to eliminate unnecessary copying of TN values.

Representation selection  Look at all references to each TN to determine which representation has the lowest cost. Emit appropriate VOPs for coercions to and from that representation. See section 13.

Register allocation  Use flow analysis to find the live set at each call site and to build the conflict graph. Find a legal register allocation, attempting to minimize unnecessary moves. Registers are saved across calls according to precise lifetime information, avoiding unnecessary memory references. See section 14.

Code generation  Convert the VMR to machine instructions by calling the VOP code generators (see section 15.1.)

Instruction-level optimization  Some optimizations (such as instruction scheduling) must be done at the instruction level. This is not a general peephole optimizer.

Assembly and dumping  Resolve branches and convert to an in-core function or an object file with load-time fixup information.

9 The Implicit Continuation Representation (ICR)

ICR is a flow graph representation that directly supports flow analysis. The conversion from source into ICR throws away large amounts of syntactic information, eliminating the need to represent most environment manipulation special forms. Control special forms such as block and go are directly represented by the flow graph, rather than appearing

as pseudo-expressions that obstruct the value flow. This elimination of syntactic and control information removes the need for most "beta transformation" optimizations[17].

The set of special forms recognized by ICR conversion is exactly that specified in the Common Lisp standard; all macros are really implemented using macros. The full Common Lisp lambda is implemented with a simple fixed-arg lambda, greatly simplifying later compiler phases.

9.1 Continuations

A continuation represents a place in the code, or alternatively, the destination of an expression result and a transfer of control[1]. This is the Implicit Continuation Representation because the environment is not directly represented (it is later recovered using flow analysis.) To make flow analysis more tractable, continuations are not first-class -- they are an internal compiler data structure whose representation is quite different from the function representation.

9.2 Node Types

In ICR, computation is represented by these node types:

if  Represents all conditionals.

set  Represents a setq.

ref  Represents a constant or variable reference.

combination  Represents a normal function call.

mv-combination  Represents a multiple-value-call. This is used to implement all multiple value receiving forms except for multiple-value-prog1, which is implicit.

bind  This represents the allocation and initialization of the variables in a lambda.

return  This collects the return value from a lambda and represents the control transfer on return.

entry  Marks the start of a dynamic extent that can have non-local exits to it. Dynamic state can be saved at this point for restoration on re-entry.

exit  Marks a potentially non-local exit. This node is interposed between the non-local uses of a continuation and the value receiver so that code to do a non-local exit can be inserted if necessary.

Some slots are shared between all node types (via defstruct inheritance.) This information held in common between all nodes often makes it possible to avoid special-casing nodes on the basis of type. The common information primarily concerns the order of evaluation and the destinations and properties of results. This control and value flow is indicated in the node by references to continuations.

9.3 Blocks

The block structure represents a basic block, in the control flow sense. Control transfers other than simple sequencing are represented by information in the block structure. The continuation for the last node in a block represents only the destination for the result.

9.4 Source Tracking

Source location information is recorded in each node. In addition to being required for source-level debugging, source information is also needed for compiler error messages, since it is very difficult to reconstruct anything resembling the original source from ICR.

10 Properties of ICR

ICR is a flow-graph representation of de-sugared Lisp. It combines most of the desirable properties of CPS with direct support for data flow analysis.

The ICR transformations are effectively source-to-source, but because ICR directly reflects the control semantics of Lisp (such as order of evaluation), it has different properties than an abstract syntax tree:

●  Syntactically different expressions with the same control flow are canonicalized:

     (tagbody (go 1)
            2 y
              (go 3)
            1 x
              (go 2)
            3 z)
     ==>
     (progn x y z nil)

●  Use/definition information is available, making substitutions easy.

●  All lexical and syntactic information is frozen during ICR conversion (alphatization[17]), so transformations can be made with no concern for variable name conflicts, scopes of declarations, etc.

●  During ICR conversion this syntactic scoping information is discarded and the environment is made implicit, causing further canonicalization:

     (let ((fun (let ((val init))
                  #'(lambda () ...val...))))
       ... (funcall fun) ...)
     ==>
     (let ((val init))
       (let ((fun #'(lambda () ...val...)))
         ... (funcall fun) ...))

●  All the expressions that compute a value (uses of a continuation) are directly connected to the receiver of that value, so syntactic close delimiters don't hide optimization opportunities. As far as optimization of + is concerned, these forms are identical:

     (+ 3 (progn (gork) 42))
     (+ 3 42)
     (+ 3 (unwind-protect 42 (gork)))

As the last example shows, this holds true even when the value position is not tail-recursive.

11 Type Inference

Type inference is important for efficiently compiling generic arithmetic and other operations whose definition is based on ad-hoc polymorphism. These operations are generally only open-coded when the operand types are known at compile-time; insufficient type information can cause programs to run thirty times slower. Almost all prior work on Lisp type inference[4, 14, 9, 17] has concentrated on reducing the number of type declarations required to attain acceptable performance (especially for generic arithmetic operations.)

Type inference is also important because it allows compile-time type error messages and reduces the cost of run-time type checking.

11.1 Type Inference Techniques

In order to support type inferences involving complex types, type specifiers are parsed into an internal representation. This representation supports subtypep, type simplification, and the union, intersection and difference of arbitrary numeric subranges, member and or types. function types are used to represent function argument signatures. Baker[2] describes an efficient and relatively simple decision procedure for subtypep, but it doesn't support accountable type inference, since inferred types can't be inverted into intelligible type specifiers for reporting back to the user.

One important technique is to separately represent the results of forward and backward type inference[4, 8]. This separates provable type inferences from assertions, allowing type checks to be emitted only when needed.

Dynamic type inference is done using data flow analysis to solve a variant of the constant propagation problem. Wegman[19] offers an example of how constant propagation can be used to do Lisp type inference.

12 The Virtual Machine Representation (VMR)

VMR preserves the block structure of ICR, but annotates the nodes with a target dependent Virtual Machine (VM) Representation. VMR is a generalized tuple representation derived from PQCC[11]. There are only two major objects in VMR: VOPs (Virtual Operations) and TNs (Temporary Names). All instructions are generated by VOPs and all run-time values are stored in TNs.

12.1 Values and Types

A TN holds a single value. When the number of values is known, multiple TNs are used to hold multiple values; otherwise the values are forced onto the stack. TNs are used to represent user variables as well as expression evaluation temporaries (and other implicit values.)

A primitive type is a type meaningful at the implementation level; it is an abstraction of the precise Common Lisp type. Examples are fixnum, base-char and single-float. During VMR conversion the primitive type of a value is used to determine both where the value can be stored and which type-specific implementations of an operation can be applied to the value. A primitive type may have multiple possible representations (see Representation Selection).

VMR conversion creates as many TNs as necessary, annotating them only with their primitive type. This keeps VMR conversion from needing any information about the number or kind of registers, etc. It is the responsibility of Register Allocation to efficiently map the allocated TNs onto finite hardware resources.

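To make the internal type representation of Section 11.1 concrete, the interval arithmetic behind its numeric-subrange operations can be sketched as follows. This is an illustrative model only, with invented names (Subrange, intersection, a subtypep method); it is not CMU CL's actual code.

```python
# Illustrative sketch (not CMU CL's implementation): numeric subranges
# modeled as closed intervals over the extended reals, supporting the
# intersection and subtypep-style containment queries of Section 11.1.
from dataclasses import dataclass

NEG_INF, POS_INF = float("-inf"), float("inf")

@dataclass(frozen=True)
class Subrange:
    low: float = NEG_INF
    high: float = POS_INF

    def intersection(self, other):
        """Exact intersection of two subranges; None denotes the empty type."""
        lo, hi = max(self.low, other.low), min(self.high, other.high)
        return Subrange(lo, hi) if lo <= hi else None

    def subtypep(self, other):
        """True iff every value in self is also in other (interval containment)."""
        return other.low <= self.low and self.high <= other.high

# e.g. the analogue of intersecting (integer 0 10) with (integer 5 *):
narrowed = Subrange(0, 10).intersection(Subrange(5, POS_INF))
```

A union of subranges generally needs a list of disjoint intervals (together with member and or types), which is why the full representation in the compiler is more involved than this single-interval sketch.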
12.2 Virtual Operations

The PQCC VM technology differs from a conventional virtual machine in that it is not fixed. The set of VOPs varies depending on the target hardware. The default calling mechanisms and a few primitives are implemented using standard VOPs that must be implemented by each VM. In PQCC, it was a rule of thumb that a VOP should translate into about one instruction. VMR uses a number of VOPs that are much more complex (e.g. function call) in order to hide implementation details from VMR conversion.

Calls to primitive functions such as + and car are translated to VOP equivalents using declarative information in the particular VM definition; VMR conversion makes no assumptions about which operations are primitive or what operand types are worth special-casing.

12.3 Control flow

In ICR, control transfers are implicit in the structure of the flow graph. Ultimately, a linear instruction sequence must be generated. The Control Analysis phase decides what order to emit blocks in. In effect, this amounts to deciding which arm of each conditional is favored by allowing it to drop through; the right decision gives optimizations such as loop rotation. A poor ordering would result in unnecessary unconditional branches.

In VMR all control transfers are explicit. Conditionals are represented using conditional branch VOPs that take a single target label and a not-p flag indicating whether the sense of the test is negated. An unconditional branch VOP is emitted afterward if the other path isn't a drop-through.

13 Representation selection

Some types of object (such as single-float) have multiple possible representations [6]. Multiple representations are useful mainly when there is a particularly efficient untagged representation. In this case, the compiler must choose between the normal tagged pointer representation and an alternate untagged representation. Representation selection has two sub-phases:

● TN representations are selected by examining all the TN references and choosing the representation with the lowest cost.

● Representation conversion code is inserted where the representation of a value is changed.

This phase is in effect a pre-pass to register allocation. The main reason for its separate existence is that representation conversions may be fairly complex (e.g. themselves requiring register allocation), and thus must be discovered before register allocation.

14 Register allocation

Register allocation consists of two passes: one which does an initial register allocation, and a post-pass that inserts spilling code to satisfy violated register allocation constraints.

The interface to the register allocator is derived from the extremely general storage description model developed for the PQCC project [12]. The two major concepts are:

storage base (SB): A storage resource such as a register file or stack frame. It is composed of same-size units which are jointly allocated to create larger values. Storage bases may be fixed-size (for registers) or variable in size (for memory.)

storage class (SC): A subset of the locations in an underlying SB, with an associated element size. Examples are descriptor-reg (based on the register SB), and double-float-stack (based on the number-stack SB, element size 2, offsets dual-word aligned.)

In addition to allowing accurate description of any conceivable set of hardware storage resources, the SC/SB model is synergistic with representation selection. Representations are SCs. The set of legal representations for a primitive type is just a list of SCs. This is above and beyond the advantages noted in [6] of having a general-purpose register allocator as in the TNBind/Pack model [10].

15 Virtual Machine Definition

The VM description for an architecture has three major parts:

● The SB and SC definitions for the storage resources, with their associated primitive types.

● The instruction set definition used by the assembler, assembly optimizer and disassembler.

● The definitions of the VOPs.
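The SB/SC storage description of Section 14, which these definitions instantiate, can be modeled roughly as follows. The class and instance names here are invented for exposition; this is not CMU CL's actual VM-definition interface.

```python
# Illustrative sketch of the PQCC-style SB/SC storage model (invented
# names; not the real CMU CL VM-definition macros).
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StorageBase:
    name: str
    size: Optional[int]  # None means variable-size (e.g. a memory stack)

@dataclass(frozen=True)
class StorageClass:
    name: str
    base: StorageBase
    element_size: int = 1  # SB units jointly allocated per value
    alignment: int = 1

# A fixed-size register file and a variable-size number stack.
registers = StorageBase("registers", size=32)
number_stack = StorageBase("number-stack", size=None)

# The SC examples from Section 14.
descriptor_reg = StorageClass("descriptor-reg", registers)
double_float_stack = StorageClass("double-float-stack", number_stack,
                                  element_size=2, alignment=2)

# A primitive type's legal representations are just a list of SCs.
legal_scs = {"single-float": [StorageClass("single-reg", registers)],
             "t": [descriptor_reg]}
```

In this model, representation selection reduces to choosing one SC from a primitive type's list, and register allocation to packing TNs into the finite units of each SB.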
15.1 VOP definition example

VOPs are defined using the define-vop macro. Since classes of operations (memory references, ALU operations, etc.) often have very similar implementations, define-vop provides two mechanisms for sharing the implementation of similar VOPs: inheritance and VOP variants. It is common to define an "abstract VOP" which is then inherited by multiple real VOPs. In this example, single-float-op is an abstract VOP which is used to define all dyadic single-float operations on the MIPS R3000 (only the + definition is shown.)

    (define-vop (single-float-op)
      (:args (x :scs (single-reg))
             (y :scs (single-reg)))
      (:results (r :scs (single-reg)))
      (:variant-vars operation)
      (:policy :fast-safe)
      (:note "inline float arithmetic")
      (:vop-var vop)
      (:save-p :compute-only)
      (:generator 2
        (note-this-location vop :internal-error)
        (inst float-op operation :single r x y)))

    (define-vop (+/single-float single-float-op)
      (:variant '+))

16 Evaluation

For typical benchmarks, CMU CL run-times are modestly better than commercial implementations. When average speeds of the non-I/O, non-system Gabriel benchmarks were compared to a commercial implementation, CMU unsafe code was 1.2 times faster, and safe code was 1.8 times faster. Applications that make heavy use of Python's unique features (such as partial evaluation and numeric representation) can be 5x or more faster than commercial implementations.

Generally, poorly tuned programs show greater performance improvements under CMU CL than highly tweaked benchmarks, since Python makes a greater effort to consistently implement all language features efficiently and to provide efficiency diagnostics.

The non-efficiency advantages of CMU CL are at least as important, but harder to quantify. Commercial implementations do weaker type checking than CMU CL, and do not support source-level debugging of optimized code. It is also easier to write programs when the programmer can assume a high level of optimization.

The main cost of this additional functionality is in compile time: compilations using Python may be as much as 3x longer than they are for the quickest commercial Common Lisp compilers. However, on modern workstations with at least 24 megabytes of memory, incremental recompilation times are still quite acceptable. A substantial part of the slowdown results from poor memory system performance caused by Python's large code size (3.4 megabytes) and large data structures (0.3-3 megabytes.)

17 Summary

● Strict type checking and source-level debugging make development easier.

● It is both possible and useful to give compile-time warnings about type errors in Common Lisp programs.

● Optimization can reduce the overhead of type checking on conventional hardware.

● The difficulty of producing efficient numeric and array code in Common Lisp is largely due to the compiler's opacity to the user. The lack of a simple efficiency model can be compensated for by making the compiler accountable to the user for its optimization decisions.

● Partial evaluation is a practical programming tool when combined with inline expansion and block compilation. Lisp-level optimization is important because it allows programs to be written at a higher level without sacrificing efficiency.

● It is possible to get most of the advantages of continuation passing style without the disadvantages.

● The clear ICR/VMR division helps portability. It is probably a mistake to convert a Lisp-based representation directly to assembly language.

● A very general model of storage allocation (such as the one used in PQCC) is extremely helpful when defining untagged representations for Lisp values. This is above and beyond the utility of a general register allocator noted in [6].

18 Acknowledgments

Scott Fahlman's infinite patience and William Lott's machine-level expertise made this project possible.
This research was sponsored by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title "Research on Parallel Computing", ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA972-90-C-0035. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

References

[1] APPEL, A. W., AND JIM, T. Continuation-passing, closure-passing style. In ACM Conference on the Principles of Programming Languages (1989), ACM, pp. 293-302.

[2] BAKER, H. G. A decision procedure for common lisp's subtypep predicate. To appear in Lisp and Symbolic Computation, Sept. 1990.

[3] BAKER, H. G. The nimble project -- an applications compiler for real-time common lisp. In Proceedings of the InfoJapan '90 Conference (Oct. 1990), Information Processing Society of Japan.

[4] BEER, R. The compile-time type inference and type checking of common lisp programs: a technical summary. Tech. Rep. TR-88-116, Case Western Reserve University, 1988.

[5] BROOKS, R. A., AND GABRIEL, R. P. A critique of common lisp. In ACM Conference on Lisp and Functional Programming (1984), pp. 1-8.

[6] BROOKS, R. A., GABRIEL, R. P., AND STEELE JR., G. L. An optimizing compiler for lexically scoped lisp. In ACM Symposium on Compiler Construction (1982), ACM, pp. 261-275.

[7] GABRIEL, R. P. Lisp: Good news, bad news, how to win big. To appear, 1991.

[8] JOHNSON, P. M. Type Flow Analysis for Exploratory Software Development. PhD thesis, University of Massachusetts, Sept. 1990.

[9] KRANZ, D., KELSEY, R., REES, J., ET AL. Orbit: An optimizing compiler for scheme. In ACM Symposium on Compiler Construction (1986), ACM, pp. 219-233.

[10] LEVERETT, B. W. Register Allocation in Optimizing Compilers. UMI Research Press, 1983.

[11] LEVERETT, B. W., CATTELL, R. G. G., HOBBS, S. O., NEWCOMER, J. M., REINER, A. H., SCHATZ, B. R., AND WULF, W. A. An overview of the production quality compiler-compiler project. Computer 13, 8 (Aug. 1980), 38-49.

[12] REINER, A. Cost minimization in register assignment for retargetable compilers. Tech. Rep. CMU-CS-84-137, Carnegie Mellon University School of Computer Science, June 1984.

[13] SCHEIFLER, R. W. An analysis of inline substitution for a structured programming language. Communications of the ACM 20, 9 (Sept. 1977), 647-654.

[14] SHIVERS, O. Control-Flow Analysis in Higher-Order Languages. PhD thesis, Carnegie Mellon University School of Computer Science, May 1991.

[15] STEELE, B. S. K. An accountable source-to-source transformation system. Tech. Rep. AI-TR-636, MIT AI Lab, June 1981.

[16] STEELE JR., G. L. Fast arithmetic in maclisp. In Proceedings of the 1977 MACSYMA User's Conference (1977), NASA.

[17] STEELE JR., G. L. Rabbit: A compiler for scheme. Tech. Rep. AI-TR-474, MIT AI Lab, May 1978.

[18] TEITELMAN, W., ET AL. Interlisp Reference Manual. Xerox Palo Alto Research Center, 1978.

[19] WEGMAN, M. N., AND ZADECK, K. Constant propagation with conditional branches. Tech. Rep. CS-89-36, Brown University, Department of Computer Science, 1989.

[20] WINOGRAD, T. Breaking the complexity barrier again. In ACM SIGPLAN-SIGIR Interface Meeting (1973), ACM, pp. 13-22.

[21] WINOGRAD, T. Beyond programming languages. Communications of the ACM 22, 7 (Aug. 1979), 391-401.

[22] ZELLWEGER, P. T. Interactive source-level debugging of optimized programs. Tech. Rep. CSL-84-5, Xerox Palo Alto Research Center, May 1984.

[23] ZURAWSKI, L. W., AND JOHNSON, R. E. Debugging optimized code with expected behavior. To appear, Apr. 1991.