
Using linearity to allow heap-recycling in Haskell

Chris Nicholls
May 24, 2010

Abstract
This project investigates destructive updates and heap-space recycling in Haskell through the use of linear types. I provide a semantics for an extension to the STG language, an intermediate language used in the Glasgow Haskell Compiler (GHC), that allows arbitrary data types to be updated destructively. A type system based on uniqueness typing is also introduced that allows the use of the new semantics without breaking referential transparency. The type system aims to be simple and syntactically light, allowing a programmer to introduce destructive updates with minimal changes to source code. I have implemented this semantic extension both in an interpreter for the STG language and in the GHC backend. Finally, I have written a type checker for this system that works over a subset of Haskell.

Contents
Abstract

1 Introduction
    The Problem With Persistence

2 Uniqueness

3 Uniqueness in Type Systems
    Linear Logic
    Clean
    Monads
    Hage & Holdermans: Heap Recycling for Lazy Languages
    Uniqueness in Imperative Languages
    A Simpler Type System for Unique Values

4 Implementation
    The STG Language
    Operational Semantics of STG
    Closure Representation
    Adding an Overwrite construct
    Ministg
    GHC
    Garbage Collection

5 Results

6 Conclusion

Chapter 1

Introduction
The Problem With Persistence
One striking feature of pure functional programming in languages such as Haskell is the lack of state. As all data structures are persistent, updating a value does not destroy it but instead creates a new copy. The advantages of this are well known [1][2], but so, conversely, are the disadvantages [3][5]. In particular, persistence can lead to excessive memory consumption when structures remain in memory long after they have ceased to be useful [6]. The reason Haskell does not allow state is to avoid side effects, and the reason side effects are avoided is that they can make understanding and reasoning about programs difficult. Indeed, from a theoretical point of view, side effects simply aren't required for computation. Yet undeniably, side effects are useful, particularly when implementing efficient data structures [4]. Whilst the lack of destructive update in Haskell is useful in accomplishing the goal of referential transparency, it is not strictly necessary: it is sometimes possible to allow destructive updates without introducing observable side effects.

Chapter 2

Uniqueness
Imagine a program that reads a list of integers from a file, sorts them and then continues to process the sorted list in some manner. In an imperative setting, we might expect this sorting to be done in-place, but in Haskell we must allocate the space for a new, sorted version of the list. However, if the original list is not referred to in the rest of the program, then any changes made to the data contained in the list will never be observed. Thus there is no need to maintain the original list. This means we could re-use the space occupied by the unsorted list, and since we know that sorting preserves length, we might begin to wonder if we can do the sorting in-place.

The reason we could not use destructive updates in the example above is that doing so may introduce side effects into our program. For instance, if we are able to sort a list in-place then the following code becomes problematic:

    foo :: [a] -> ([a], [a])
    foo xs = (xs, sort_in_place xs)

Does fst (foo [3,2,1]) refer to a sorted list or an unsorted list? With lazy evaluation we have no way of knowing. Notice however that modifying the original, unsorted list is only a problem if it is referred to again elsewhere in the program. If the list is not used anywhere else, then there can be no observable side effects of updating it in place, as any data that cannot be referenced again can have no semantic effect on the rest of the program. If this were the case then the compiler would be free to re-use the space previously taken up by the list, perhaps updating the data structure in-place, and referential transparency would not be broken. This condition, that there is only ever one reference to the list, is known as uniqueness: we say that the list is unique.

Consider an algorithm that inserts an element into a binary tree (fig 2.1). In an imperative language this would normally involve walking the tree until we find the correct place to insert the element and updating the node at that position.

Figure 2.1: Inserting an element into a binary tree. (a) A binary tree; an element is to be inserted in the outlined position. (b) After insertion: a new tree has been created from the old one.

However, in a functional language, we must instead copy all the nodes above the one to be updated and create a new binary tree. If the original tree was unique, that is, the only reference to a was passed to the function that inserted m, then there will no longer be any references to a. Consequently, there will no longer be any references to nodes c or g either. All three nodes will be wasting space in memory. If a larger number of nodes are inserted then it is possible that the space wasted will be many times greater than the space taken up by the tree itself. Clearly a lot of space can be wasted.

In general it is not possible to predict when an object in a Haskell program will become garbage, so garbage collection must be a dynamic run-time process. Because garbage collection happens at run-time, there is a performance penalty associated with it. Indeed, whilst garbage collection can be very efficient when large amounts of memory are available [8], it can often take up a non-trivial percentage of a program's execution time in memory-constrained environments. But when an object is known always to be unique, its lifetime can be determined statically and so the run-time cost of garbage collection can be avoided.
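For concreteness, here is the standard persistent insertion written out in full; it makes the path copying described above explicit (this code is included for illustration and is not part of the original discussion):

    data BinTree a = Empty | Node a (BinTree a) (BinTree a)

    -- Persistent insertion: every node on the path from the root to the
    -- insertion point is copied; subtrees off the path are shared, and
    -- the old tree is left intact.
    insert :: Ord a => a -> BinTree a -> BinTree a
    insert x Empty = Node x Empty Empty
    insert x t@(Node y l r)
      | x < y     = Node y (insert x l) r   -- copy this node, share r
      | x > y     = Node y l (insert x r)   -- copy this node, share l
      | otherwise = t                       -- element already present

If the original tree was unique, every node copied along the path leaves behind an unreachable original, which is exactly the waste illustrated in figure 2.1.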

Chapter 3

Uniqueness in Type Systems


Linear Logic
Linear Logic is a system of logic proposed by Jean-Yves Girard in which each assumption must be used exactly once. Wadler noticed that in the context of programming languages, linear logic corresponds to:

No Duplication. No value is shared, so, as we have seen, destructive update is permissible.

No Discarding. Each value is used exactly once. This use represents an explicit deallocation, so no garbage collection is required.

Wadler proposed a linear type system based directly on Girard's logic [10][11]. In this type system every value is labelled as being either linear or nonlinear, and functions are typed to accept either linear or nonlinear arguments. In [7], David Wakeling and Colin Runciman describe an implementation of a variant of Lazy ML that incorporates linear types. Their results are disappointing: the performance of programs using linear data structures is generally much worse than without. The cost of maintaining linearity easily outweighs the benefits of a reduced need for garbage collection. Along similar lines, Henry Baker provides an implementation of linear Lisp [17] that restricts every type to being linear. The result is an implementation of Lisp that requires no run-time memory management. This comes at a price, however: Baker found that much of the work must instead be done by the programmer and, as with linear Lazy ML, the large amounts of book-keeping and explicit copying mean that linear Lisp is slightly slower than its classical counterpart.

Clean
Clean [23] is a language very similar to Haskell that features a uniqueness type system based on linear logic. Clean allows users to specify particular variables as being unique. The type system exposed to the user is large, and often simple functions can have complex types. However, the de-facto implementation has proved to be very efficient. One particularly interesting feature of Clean is that the state of the world is explicit. Every Clean program passes around a unique object, the world, which represents the state of the system. The world is explicitly threaded throughout the program, and thus destructive updates to the world can be used to sequence IO operations. Unique objects cannot be duplicated, so no more than one world can exist at a time and hence there is no danger of referring to an old state by accident.
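To illustrate the idea in Haskell-like terms (a sketch only: World, getLine', and putLine' are hypothetical names, not Clean's actual library), explicit world passing looks like this:

    data World = World                     -- abstract token for the state of the system

    getLine' :: World -> (String, World)   -- consumes a world, returns a new one
    getLine' = undefined                   -- hypothetical primitive

    putLine' :: String -> World -> World
    putLine' = undefined                   -- hypothetical primitive

    echo :: World -> World
    echo w0 = let (s, w1) = getLine' w0    -- each step uses the latest world
              in putLine' s w1             -- reusing w0 here would violate uniqueness

Uniqueness typing turns the discipline in the final comment into a static guarantee: the type checker rejects any program that uses a world more than once.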

Monads
Haskell takes a different approach towards IO. Monads, as presented by Wadler and Peyton Jones [27], can do much of the work of uniqueness typing by the use of encapsulation. Indeed, they are much simpler in terms of both syntax and type system. However, monads do not solve every problem as elegantly. Suppose we have a program that makes use of a binary tree:

    data BinTree a = Empty | Node a (BinTree a) (BinTree a)

    insert    :: a -> BinTree a -> BinTree a
    removeMin :: BinTree a -> (a, BinTree a)
    isEmpty   :: BinTree a -> Bool

If we want to allow the tree to be updated destructively we can employ the ST monad, replacing each branch by a mutable reference, an STRef. However, as STRefs require a state parameter, we must also add a type parameter to our binary trees.

    data BinTree s a = Empty | Node a (STRef s (BinTree s a)) (STRef s (BinTree s a))

Unfortunately, none of the code we have written to work over binary trees will work any more! Not only are the type signatures incorrect, but the whole implementation must be re-written to work within the state monad.

    insert    :: a -> BinTree s a -> ST s (BinTree s a)
    removeMin :: BinTree s a -> ST s (a, BinTree s a)
    ...
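To make the cost of this rewrite concrete, here is a sketch of what the monadic insert might look like (illustrative only; this code does not appear in the original discussion):

    import Control.Monad.ST (ST)
    import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

    data BinTree s a = Empty | Node a (STRef s (BinTree s a)) (STRef s (BinTree s a))

    -- Destructive insertion: walk the tree and overwrite the reference at
    -- the insertion point, rather than copying the path from the root.
    insert :: Ord a => a -> BinTree s a -> ST s (BinTree s a)
    insert x Empty = do
        l <- newSTRef Empty
        r <- newSTRef Empty
        return (Node x l r)
    insert x t@(Node y l r) = do
        let ref = if x < y then l else r   -- choose the branch to descend into
        child  <- readSTRef ref
        child' <- insert x child
        writeSTRef ref child'              -- in-place update of the branch
        return t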

Monadic code can often differ significantly in style from idiomatic functional code, so this may end up affecting large portions of our code. This can clearly cause problems if we were trying to optimise a large program in which the binary tree implementation had been identified as a bottleneck.

Hage & Holdermans: Heap Recycling for Lazy Languages


As a way of avoiding this monad creep, Hage and Holdermans present a language construct to allow destructive updates for unique values in non-strict, pure functional languages [19]. Their solution makes use of an embedded destructive assignment operator and user-controlled annotations to indicate which closures can be re-used. They describe a type system that restricts the use of this operator and prove that referential transparency is maintained. Hage and Holdermans do not provide an implementation of either the type system or the destructive assignment operator. They also express concern about the complexity of the type system exposed to the user, despite it being simpler than the system used in Clean. This is the issue addressed in the next section of this paper.

Uniqueness in Imperative Languages


The initial motivation for this project came not from linear logic but from imagining an imperative language that maintained a form of referential transparency. This language has two kinds of variables, consumable and immutable. Each function then accepts two sets of variables: one set is the set of variables that the function consumes, the other is the set of variables that it views. During execution, a function f is said to own a variable x if and only if:

- the variable x was created inside the body of f (either from a closed term or a literal), or x was passed to f as a consumable variable; and
- f has not passed x as a consumable variable to any other function.

Each function is restricted so that the only variables it can modify or return are the variables it owns. One further restriction is that when a variable is passed into a function as a consumable variable, it is removed from the current scope (this means it cannot be used as another argument to the same function). Thus, any variable passed into a function as a viewed argument will not change during the execution of that function, and any variable passed in as a consumed argument cannot be referred to again, so destructively updating the variable will not cause side effects.

As an example, here is an implementation of quicksort in this theoretical language:

    qsort (consumed xs :: [Int]) -> [Int] = {
        return sort (xs, Nil)
    }

    sort (consumed xs :: [Int], end :: [Int]) -> [Int] = {
        case xs of
            Nil -> return end;
            Cons (x, xs') -> {
                ys, zs := split (x, xs');
                zs' := sort (zs, end);
                return sort (ys, Cons (x, zs'));
            }
    }

    split (viewed p :: Int; consumed xs :: [Int]) -> ([Int], [Int]) = {
        case xs of
            Nil -> return ([], [])
            Cons (x, xs') -> {
                ys, zs := split (p, xs');
                if x > p then:
                    return (ys, Cons (x, zs));
                else:
                    return (Cons (x, ys), zs);
            }
    }

In the body of sort, xs will be out of scope after the case expression, and after the line ys, zs := split (x, xs'), the variable xs' will be out of scope but x will remain in scope, since split consumes its second argument but only views its first. These rules ensure that at any point in the program's execution, if x is consumable in the current environment then there is no more than a single reference to it. Conversely, if there is more than one reference to x then x must be immutable. A sufficiently smart compiler would be able to tell that in each case expression, the list under scrutiny is never referred to again; only its elements are. Thus, in the case that the list was a Cons cell, the cell can be re-used when a Cons cell is created later on. In this way, the function sort can avoid allocating any new cells and instead operate in-place.

A Simpler Type System for Unique Values


We can translate this idea of consumable variables into Haskell. Below is the code for quicksort written in a version of Haskell extended to include this idea.

    qsort :: [Int] ⊸ [Int]
    qsort xs = sort xs []

    sort :: [Int] ⊸ [Int] ⊸ [Int]
    sort [] end = end
    sort (x : xs) end = sort ys (x : sort zs end)
        where (ys, zs) = split x xs

    split :: Int -> [Int] ⊸ ([Int], [Int])
    split p [] = ([], [])
    split p (x : xs) = case p > x of
            True  -> (x : ys, zs)
            False -> (ys, x : zs)
        where (ys, zs) = split p xs

This is deliberately very close to standard Haskell, with one addition: a new form of arrow, written ⊸, has been introduced to the syntax of types. The intended meaning of

    f :: a ⊸ b
    g :: a -> b

is that f consumes a variable of type a and produces a b; thus the body of f is free to modify its argument. By comparison, g is a standard Haskell function that only views its argument. Intuitively, the new arrow form obeys the following rules:

- Only unique variables and closed terms may be used as an argument to a function expecting a unique value.
- The result of applying a function of type (a ⊸ b) to a unique value of type a will be a unique value of type b.
- A unique variable may be used at most once in the body of a function.
- Data structures are unique all the way down, i.e. a function f :: [a] ⊸ [a] works over a unique list whose elements are also unique.


    map :: (a ⊸ b) -> [a] ⊸ [b]
    map f []       = []
    map f (x : xs) = f x : map f xs
    -- map takes a unique list and updates it in place.
    -- Notice the function f is not unique itself, as it is used twice
    -- on the right-hand side.

    id :: x ⊸ x
    id x = x

    compose :: (b ⊸ c) -> (a ⊸ b) -> a ⊸ c
    compose f g x = f (g x)

    double1 :: a ⊸ (a, a)
    double1 a = (a, a)
    -- error: unique variable a is used twice.

    double2 :: a -> (a, a)
    double2 a = (a, a)

    apply1 :: (a -> b) -> a ⊸ b
    apply1 f x = f x
    -- error: result of applying f to x will not be unique.

    apply2 :: (a ⊸ b) -> a -> b
    apply2 f x = f x
    -- error: f expects a unique argument, x is not unique.

    twice :: (a ⊸ a) -> a ⊸ a
    twice f = compose f f

    fold :: (b ⊸ a ⊸ a) -> a ⊸ List b ⊸ a
    fold f e []       = e
    fold f e (x : xs) = fold f (f x e) xs

    f1 :: a ⊸ (a -> b) -> b
    f1 x g = g x

    f2 :: a -> (a ⊸ b) -> b
    f2 x g = g x
    -- error: g expects a unique argument, x is not unique.

    f3 :: a -> (a -> b) ⊸ b
    f3 x g = g x
    -- error: the result of applying g to x will not be unique.

    -- A unique variable may be passed to an argument expecting a
    -- non-unique variable, but not the other way round.
    -- Note that in f1, the type signature is implicitly bracketed like this:
    --   f1 :: a ⊸ ((a -> b) -> b)
    -- so the result of a partial application would be a function that is
    -- itself unique.

Figure 3.1: Some examples of functions with possible type signatures and type errors.


Semantically, this can be viewed in terms of the system proposed by Hage and Holdermans, equivalent to

    f :: a^1 -> b^1
    g :: a^ω -> b^ω

Many functions can be converted to use this type system without needing to alter their definition at all. For instance, a function that reverses a list in-place can be constructed simply by altering the type signature of the standard Haskell function reverse.

    reverse :: [a] ⊸ [a]
    reverse = rev []
        where
            rev :: [a] ⊸ [a] ⊸ [a]
            rev xs []       = xs
            rev xs (y : ys) = rev (y : xs) ys

There is a significant drawback to this system in the fact that there is more than one possible way to assign a type to some fragments of code. If we want to use both in-place reverse and regular reverse, then we must create two separate functions that differ only by name and type signature.

I have implemented a typechecker for this system over a subset of Haskell. Due to time constraints and the complexity of GHC's type system, resulting from the vast number of type system extensions already present, the new type system has not been integrated into GHC. Despite this, the backend mechanisms to allow closure-recycling are fully functional: the example above will compile and run, sorting the list in-place, although it will not be typechecked by GHC.


Chapter 4

Implementation
I have implemented the backend mechanisms for dealing with overwriting as an extension to the Glasgow Haskell Compiler. This section includes just enough detail about the inner workings of the compiler to explain this extension. There are several main stages in the compilation pipeline:

- The front end, containing the parser and the type checker.
- The desugarer, which converts from the abstract syntax of Haskell into the small intermediate Core language.
- A set of Core-to-Core optimisations and other transformations.
- Translation into the STG language.
- Code generation.

This chapter deals with the details of the final two phases.


The STG Language


The STG language is a small, non-strict functional language used internally by GHC as an intermediate language before imperative code is output. Along with a formal denotational semantics [26], the STG language also has a full operational semantics with a clear and simple meaning for each language construct:

    Construct                   Operational meaning
    Function application        Tail call
    Let expression              Heap allocation
    Case expression             Evaluation
    Constructor application     Return to continuation

There are also several properties of STG code that are of interest:

- Every argument to a function or data constructor is a simple variable or constant. Operationally, this means that arguments to functions are prepared (either by evaluating them or constructing a closure) prior to the call, as shown in the sketch below.
- All constructors and built-in operations are saturated. This cannot be guaranteed for every function, since Haskell is a higher-order language and the arity of functions is not necessarily known, but it simplifies the operational semantics. Functions of known arity can be eta-expanded to ensure saturation.
- Pattern matching and evaluation is only ever performed via case expressions, and each case expression matches one-level patterns.
- Each closure has an associated update flag. More is explained about these further down.
- Bindings in the STG language carry with them a list of free variables. This has no semantic effect but is useful for code generation.
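To illustrate the first property, a nested application is translated by naming its argument first. In plain Haskell (a sketch of the shape of the translation, not GHC's actual output):

    -- Source form: the argument (g x) is a compound expression.
    apply :: (Int -> Int) -> (Int -> Int) -> Int -> Int
    apply f g x = f (g x)

    -- STG-like form: the argument is first bound by a let, which
    -- operationally allocates a thunk on the heap; the call to f then
    -- passes only the simple variable y.
    applySTG :: (Int -> Int) -> (Int -> Int) -> Int -> Int
    applySTG f g x = let y = g x in f y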


Program         prog    -> binds

Bindings        binds   -> var1 = lf1; ...; varn = lfn        (n ≥ 1)

Lambda-forms    lf      -> vars_f \π vars_a -> expr

Update flag     π       -> u                                  (Updatable)
                         | n                                  (Not updatable)

Expression      expr    -> let binds in expr                  (Local definition)
                         | letrec binds in expr               (Local recursion)
                         | case expr of alts                  (Case statements)
                         | var atoms                          (Application)
                         | constr atoms                       (Saturated constructor)
                         | prim atoms                         (Saturated built-in op)
                         | literal

Alternatives    alts    -> aalt1; ...; aaltn; default         (n ≥ 0, Algebraic)
                         | palt1; ...; paltn; default         (n ≥ 0, Primitive)

Algebraic alt   aalt    -> constr vars -> expr
Primitive alt   palt    -> literal -> expr
Default alt     default -> var -> expr

Literals        literal -> 0# | 1# | ...                      (Primitive integers)
Primitive ops   prim    -> +# | -# | *# | /# | ...            (Primitive integer ops)

Variable lists  vars    -> {var1, ..., varn}                  (n ≥ 0)
Atom lists      atoms   -> {atom1, ..., atomn}                (n ≥ 0)
                atom    -> var | literal

Figure 4.1: Syntax of the STG language


(LET)      let x = bind in e; s; H              ⟶  e[x'/x]; s; H[x' ↦ bind]      (x' fresh)

(CASECON)  case v of alts; s; H[v ↦ C a1...an]  ⟶  e[a1/x1 ... an/xn]; s; H
           where alts = {...; C x1...xn -> e; ...}

(CASEANY)  case v of {...; x -> e; ...}; s; H   ⟶  e[v/x]; s; H
           where v is a literal or H[v] is in HNF, and v does not match
           any other case alternative

(CASE)     case e of alts; s; H                 ⟶  e; (case • of alts : s); H

(RET)      v; (case • of alts : s); H           ⟶  case v of alts; s; H
           where v is a literal or H[v] is a value

(THUNK)    x; s; H[x ↦ e]                       ⟶  e; (Upd x : s); H             (e is a thunk)

(UPDATE)   y; (Upd x : s); H                    ⟶  y; s; H[x ↦ H[y]]             (H[y] is a value)

Figure 4.2: The evaluation rules

Operational Semantics of STG


The semantics of the STG language are described in [15] and [26]. An outline of the relevant rules is presented here, with some details left out. In particular, the details of both recursion and function application are missing, as neither has much effect on the ideas presented here. The semantics of the STG language is given in terms of three components:

- the code e, the expression under evaluation;
- the stack s of continuations;
- the heap H, a finite mapping from variables to closures.

The continuations κ on the stack take the following forms:

    κ ::= case • of alts    Scrutinise the returned value in the case statement
        | Upd t             Update the thunk t with the returned value
        | (• a1 ... an)     Apply the returned function to a1 ... an
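These components map directly onto the state of a small-step interpreter; a minimal Haskell sketch follows (the names echo the Ministg outline in Figure 4.4, but Ministg's actual definitions differ in detail):

    import qualified Data.Map as M

    type Var = String
    data Atom = Variable Var | Literal Int

    data Exp                            -- STG expressions (other forms omitted)
        = Atom Atom
        | Case Exp [Alt]
        | Let Var Object Exp

    data Alt = Alt String [Var] Exp     -- one-level constructor pattern

    data Object                         -- heap objects (closures)
        = Thunk Exp
        | Con String [Atom]
        | BlackHole

    data Continuation
        = CaseCont [Alt]                -- case (hole) of alts
        | UpdateCont Var                -- Upd x
        | ApplyCont [Atom]              -- apply the returned function to atoms

    type Stack = [Continuation]
    type Heap  = M.Map Var Object       -- finite map from variables to closures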

The first rule, LET, states that to evaluate a let-expression, the heap H is extended to map a fresh variable to the right-hand side bind of the

expression. The fresh variable corresponds to allocating a new address in memory. After allocation, we enter the code for e with x' substituted for x. Here is the Haskell code for the function reverse, taken from the standard prelude, and the corresponding STG code:

    reverse = rev []
        where
            rev xs []       = xs
            rev xs (y : ys) = rev (y : xs) ys

    reverse = {} \n {} -> rev {Nil}

    rev = {} \n {xs ys} ->
        case ys of
            Nil {}       -> xs
            Cons {z, zs} -> let rs = {z, xs} \n Cons {z, xs}
                            in rev {rs, zs}

which should be read in the following way:

- First, bind reverse to a function closure whose code pushes onto the stack a continuation that applies the returned function to the value Nil, then evaluates the code for rev.
- Bind rev to a function closure that expects two arguments, xs and ys. The code for this closure should force evaluation of ys and examine the result: if it matches Nil, then evaluate the code for xs; if it matches Cons z zs, then allocate a Cons cell with arguments z and xs, load rs and zs onto the stack and enter the code for rev.

Update flags

One feature of lazy evaluation is that each closure should be replaced by its (head) normal form upon evaluation, so that the same closure is never evaluated more than once. The update flag attached to each closure specifies whether this update should take place: if the flag is set to u then the closure will be updated, and if it is set to n then no update will be performed. Naïvely, every flag can be set to u, but this is not always necessary. For instance, if a closure is already in head normal form, then updating is not required. Much more detail about this is given in Simon Peyton Jones's paper Implementing functional languages on stock hardware: the Spineless Tagless G-Machine [26].


Closure Representation
Every heap object in GHC is in one of three forms: a head normal form (a value), a thunk, which represents an unevaluated expression, or an indirection to another object. A value can be either a function value or a data value formed by a saturated constructor application. The term closure is used to refer to any of these objects. A distinctive feature of GHC is that all closures are represented, and indeed handled, in a uniform way.

[Diagram: a closure consists of a pointer to an info table (containing code) followed by its free variables.] All closures are in this form, with a pointer to an info table containing code and other details about the closure, and a list of variables that the closure needs access to. For example, a closure for a function application will store the code for the function in the info table and the arguments in the free-variable list. When the closure is evaluated, the arguments can be reached via a known offset from the start of the closure. For a data constructor, the code will return to the continuation of the case statement that forced evaluation, providing the arguments of the constructor application. These arguments are, again, simply stored at an offset from the start of the closure.
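As a rough model of this uniform layout (a sketch only; the field names and types here are illustrative, not the RTS's actual definitions):

    -- A toy model of GHC's uniform closure representation.
    data FieldKind = Ptr | NonPtr            -- used by the GC to trace the payload

    data InfoTable = InfoTable
        { closureName :: String              -- e.g. "Cons" or "reverse"
        , fieldKinds  :: [FieldKind]         -- layout of the payload
        }                                    -- (the real table also holds entry code)

    data Closure = Closure
        { infoTable :: InfoTable             -- one static table per closure kind
        , payload   :: [Int]                 -- free variables / constructor args
        }

    -- Fields live at fixed offsets from the start of the closure; e.g. the
    -- second field of a Cons closure (its tail) is always payload offset 1.
    consTail :: Closure -> Int
    consTail c = payload c !! 1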


Adding an Overwrite construct


In this section, a new construct, overwrite, is added to the STG language. The syntax and semantics are given below. The idea is that overwrite x with e1 in e2 will behave in a similar manner to let x = e1 in e2, but rather than storing e1 as a new heap-allocated closure and binding it to x, the closure already bound to x will be overwritten with the closure for e1. Now, care must be taken to ensure that x really is bound to a closure, not an unboxed value, and that e1 will produce the same type of closure. However, no checking is done at this stage, as we assume this (as well as uniqueness checking) has been taken care of elsewhere in the compiler. This highlights another difference between let and overwrite, namely that the variable bound in the let-construct may be any variable, free or bound, whereas in the overwrite-construct it must be a bound variable. We can add this construct to the example reverse from above:

    reverse :: [a] ⊸ [a]

    reverse = {} \n {} -> rev {Nil}

    rev = {} \n {xs ys} ->
        case ys of
            Nil {}       -> xs
            Cons {z, zs} -> overwrite ys with Cons {z, xs}
                            in rev {ys, zs}

Since there are no longer any let-constructs in this code, it doesn't allocate any space on the heap! The function runs using a constant amount of space, although in the case that the list is a previously unevaluated thunk, forcing the evaluation of reverse will also force the evaluation, and therefore the allocation, of the list it operates on. Note that we know it is safe to overwrite ys with a Cons cell because we know it to be unique from the type signature¹ and we know ys to be a Cons cell already, since it was matched in a case expression. In general, it is safe to overwrite a closure x with a constructor application C a1 ... an exactly when these two conditions hold:

- x is known to be unique. This information is provided by the type system.
- The closure bound to x was built with constructor C and is in normal form. This happens when x has been matched in a case expression, inside the guard for constructor C.
¹ The STG language is untyped, but this information is available during the translation phase.


Expression   expr -> ...
                   | overwrite x with expr in expr

(OVERWRITE)  overwrite x with e1 in e2; s; H  ⟶  e2; s; H[x ↦ e1]

Figure 4.3: The overwrite construct

Ministg
Ministg [29] is an interpreter for the STG language that implements the operational semantics as given above. It offers a good place to investigate the new semantics. An outline of the relevant code is given in Figure 4.4. The code dealing with overwrite-expressions is largely similar to the code for let-expressions, and is usually simpler. For instance, no fresh variable need be generated, unlike in the let-expression, and no substitution need be performed. Performing substitutions over overwrite-expressions is also simpler than over the corresponding let-expression, as there is no variable capture to be avoided. The final difference is in calculating free variables: in a let-expression the variable appearing on the left-hand side is not free, but it is free in an overwrite-expression.

GHC
At an operational level, these are the only differences between let-expressions and overwrite-expressions. When it comes to implementing the STG language in GHC, however, there are a few more hurdles to overcome. Unsurprisingly, much of the code remains the same as for let-expressions, but the translation is not as direct as in the Ministg interpreter. Firstly, overwriting variables interacts badly with the generational garbage collector employed in GHC; more detail about this is provided in the next section. Secondly, whereas in the Ministg interpreter variable locations are stored in a data structure representing a finite mapping, in GHC variable locations are stored as pointers kept in registers or as offsets from the current closure. In the case that the location of a variable is stored at an offset from a closure that is to be overwritten, we must make sure to save this location before performing the update; otherwise the location will be lost and we will no longer be able to access the variable. In the example below, the addresses for x and xs will be located at an

    smallStep :: Exp -> Stack -> Heap -> Eval (Maybe (Exp, Stack, Heap))

    -- LET
    smallStep (Let var object exp) stack heap = do
        newVar <- freshVar
        let newHeap = updateHeap newVar object heap
        let newExp  = subs (mkSub var (Variable newVar)) exp
        return $ Just (newExp, stack, newHeap)

    -- OVERWRITE
    smallStep (Overwrite var object exp) stack heap = do
        let newHeap = updateHeap var object heap
        return $ Just (exp, stack, newHeap)

    -- CASECON
    smallStep (Case (Atom (Variable v)) alts) stack heap
        | Con constructor args <- lookupHeap v heap
        , Just (vars, exp) <- exactPatternMatch constructor alts =
            return $ Just (subs (mkSubList (zip vars args)) exp, stack, heap)

    -- CASEANY
    smallStep (Case (Atom v) alts) stack heap
        | isLiteral v || isValue (lookupHeapAtom v heap)
        , Just (x, exp) <- defaultPatternMatch alts =
            return $ Just (subs (mkSub x v) exp, stack, heap)

    -- CASE
    smallStep (Case exp alts) stack heap =
        return $ Just (exp, CaseCont alts callStack : stack, heap)

    -- RET
    smallStep exp@(Atom atom) (CaseCont alts : stackRest) heap
        | isLiteral atom || isValue (lookupHeapAtom atom heap) =
            return $ Just (Case exp alts, stackRest, heap)

    -- THUNK
    smallStep (Atom (Variable x)) stack heap
        | Thunk exp <- lookupHeap x heap = do
            let newHeap = updateHeap x BlackHole heap
            return $ Just (exp, UpdateCont x : stack, newHeap)

    -- UPDATE
    smallStep atom@(Atom (Variable y)) (UpdateCont x : stackRest) heap
        | object <- lookupHeap y heap
        , isValue object =
            return $ Just (atom, stackRest, updateHeap x object heap)

Figure 4.4: Outline of the Ministg implementation of the evaluation rules given in Figure 4.2, plus the new overwrite expression.

offset from the closure for ls. When that closure is overwritten, we lose these addresses, so we must take care to save them in temporary variables first.

    ... case ls of
            Cons x xs -> ...
                overwrite ls with Cons y ys
                in ... x ... xs ...

[Diagram: (a) a Cons closure whose info table records x at offset 1 and xs at offset 2; (b) the same closure after being overwritten with Cons y ys, so that y and ys now occupy those offsets.]

Figure 4.5: Overwriting a Cons cell. Any references that pointed to xs in (a) will point to ys after the update (b) and similarly for x and y.


Let us now consider another example, map. Intuitively, map seems like a good candidate for in-place updates: we scan across the list, updating each element with a function application. But there is a problem. Looking at the code for map and the corresponding STG binding, we see that map does not allocate any Cons cells! At least not directly:

    map :: (a -> b) -> [a] -> [b]
    map f []       = []
    map f (x : xs) = f x : map f xs

    map = {} \n {f, xs} ->
        case xs of
            Nil {}       -> Nil {}
            Cons {y, ys} -> let fy = {f, y} \u f {y}
                            in let mfys = {f, ys} \u map {f, ys}
                            in Cons {fy, mfys}

The two closures allocated in the body of map are both thunks allocated on the heap, whereas the Cons cell is placed on the return stack. In a strict language, the recursive call to map would allocate the rest of the list, but in a lazy language a thunk representing the suspended computation is allocated instead. This thunk will later be updated with its normal form (either a Cons cell or Nil) if examined in a case statement. In general, the closure to be overwritten and the thunk will not be the same size, so we cannot blindly overwrite the former with the latter.

One can imagine a mechanism whereby, upon seeing a unique value in a case statement, the code generator searches the rest of the code for the closure that fits best. If a closure of the same type is built, then we select that. Otherwise we try to reuse as much space as possible by selecting the largest closure that will take up no more space than the closure we wish to overwrite. There is also the possibility of reusing the thunk allocated for the recursive call itself, since once evaluated, it is no longer needed. I have not been able to try implementing this feature, but it would be an interesting improvement to make.

There is one more optimisation that could potentially be included. When a variable x known to be unique goes out of scope, we know that it has become garbage, whether or not x appears in a case statement. The compiler would then be free to overwrite x with a new variable y without making any assumptions about the uniqueness of y. There is some difficulty here, as we do not know whether x refers to a value or an unevaluated thunk. If x has not been evaluated then in general we can infer nothing about the size of the thunk it refers to, as it may have been formed from an arbitrary expression.

Garbage Collection
GHC uses an n-stage generational garbage collector. A copying collector partitions the heap into two spaces: the from-space and the to-space. Initially, objects are only allocated in the from-space. Once this space is full, the live objects in the from-space are copied into the to-space. Live objects are objects that are reachable from the current code. Any unreachable object (garbage) is never copied, so it will not take up space in the new heap area; thus the new heap will be smaller than the old heap (provided there was unreachable data in the heap). Now the from-space becomes the to-space and vice versa, and the program continues to run. If no space was reclaimed, then the size of the two spaces must be increased, if this is possible. This can be generalised to more than two spaces, so that there are many heap-spaces, any one of which may be acting as the to-space at a given time. (A sketch of the copying step is given below.)

This process clearly cannot be employed in a language that allows pointer arithmetic, for example, since closures are frequently being relocated in memory and pointers would be left dangling, or pointing to nonsense. But are things any better in a functional language? Ignoring for the moment lazy evaluation and overwriting, Haskell has the property that any new data value will only point to old data, never the other way round, since values are immutable. This means the references in memory form a directed acyclic graph, with older values at the leaves and newer values nearer to the root.

The idea behind generations is that since structures are immutable, old structures don't usually point to structures created more recently. Because of this, it is possible to partition the heap into generations where old generations do not reference new generations. In this way, the garbage collector can re-arrange the new generations without affecting the old generations. It has been observed that in functional programming, old data tends to stay around for much longer than new data [reference], so most unreachable data is newly created. This means that a large proportion of the garbage to be collected usually lies in the youngest generation, so by collecting this we can reclaim decent amounts of space without having to traverse the whole heap. Occasionally, however, garbage collecting a young generation will not free up enough space, in which case older generations must also be collected. A generational collector will also use a method to age objects from younger generations into older generations, if they have been around long enough. The usual way of doing this is by recording how many collections an object survives in a particular generation; once this number exceeds some threshold, the object is moved up into the older generation. By default, GHC uses two generations. This scheme leads to frequent, small collections with occasional, much larger collections of the entire heap.

Up until this point, we have been considering only garbage collection in directed acyclic graphs. Things become much less neat when we allow closures to be overwritten, as a closure in an old generation may well be updated to reference a newer closure in a younger generation.
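As a rough illustration of the copying step described above (a toy model, not GHC's actual collector; a real collector also assigns new addresses and leaves forwarding pointers, which is omitted here):

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Addr = Int
    type Heap = M.Map Addr [Addr]  -- each object just lists the addresses it points to

    -- Copy everything reachable from the roots into a fresh to-space.
    -- Unreachable objects are never copied, so garbage disappears for free.
    collect :: [Addr] -> Heap -> Heap
    collect roots fromSpace = go roots S.empty M.empty
      where
        go []       _    toSpace = toSpace
        go (a : as) seen toSpace
            | a `S.member` seen = go as seen toSpace
            | otherwise =
                let fields = M.findWithDefault [] a fromSpace
                in  go (fields ++ as) (S.insert a seen) (M.insert a fields toSpace)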

Figure 4.6: Generational garbage collection. (a) A block of memory split into two generations; the grey blocks are garbage. (b) After garbage collecting the youngest generation.

When a garbage collection takes place, the younger closure will be moved to a new location and the reference inside the older closure will no longer point to the correct location. It is worth noting that this can happen even without the overwrite construct, owing to lazy evaluation. For example, the following code can be used to create a cyclic list:

    cyclic :: [a] -> [a]
    cyclic xs = rs
        where rs = link xs rs

    link :: [a] -> [a] -> [a]
    link [] ys       = ys
    link (x : xs) ys = x : link xs ys

To work around this, GHC keeps track, for each generation, of the set of closures that contain pointers to younger generations, called the remembered set. During a garbage collection, the pointers to younger generations are maintained so as to keep pointing to the correct locations. This remembered set must be updated whenever a closure is overwritten; this is known as the write barrier. This means that whenever a closure is overwritten, we must check for any old-to-new pointers being created by considering the generation

of the closure being overwritten. Unfortunately, this does incur a significant performance penalty for the overwrite expression. This will be discussed later on in more detail.
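The check itself is simple; here is a toy Haskell model of it (illustrative names only: GHC's actual write barrier lives in the C run-time system):

    import qualified Data.Set as S

    type Addr = Int
    data Gen = Young | Old deriving (Eq, Ord, Show)

    -- When the closure at address dst is overwritten so that it now points
    -- at src, record dst in the remembered set if an old-to-young pointer
    -- has just been created.
    writeBarrier :: (Addr -> Gen) -> Addr -> Addr -> S.Set Addr -> S.Set Addr
    writeBarrier genOf dst src remembered
        | genOf dst > genOf src = S.insert dst remembered  -- old now points to young
        | otherwise             = remembered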


Chapter 5

Results
Due to the write-barrier overhead, performance gains from this new optimisation are slight. In a benchmark program that sorts a large list of integers, we find that although the time spent in garbage collection drops by around 10%, the extra cost of the write barrier and of saving free variables almost exactly counteracts the benefits. By comparison, if we restrict GHC to use a single-space copying collector, thus avoiding the problems associated with older generations, we see a much bigger improvement from in-place updates. However, the overall execution time is then worse than when using the generational collector, so there is little point in doing so.

For larger, more realistic programs the gain is usually even smaller. Typically, there is little difference between an optimised program and one without closure-overwriting. It is interesting to note, though, that in no case has it been observed that the extension causes a program to run noticeably slower.

However, this is only the case in conditions where the run-time system is allowed access to much more heap space than is needed. When the amount of heap space available is restricted to be close to the amount of live data, very different results can be seen. Fig 5.2 shows how the performance of the sorting program varies with the size of the heap. For small heap sizes, including destructive update makes a big difference to the speed of the program. Without destructive update, reducing the size of the heap dramatically increases the amount of time spent garbage collecting: for a heap size of 8MB, garbage collection accounts for approximately 50% of execution time, and for a heap size of 2MB this increases to around 75%. By contrast, with destructive overwriting turned on, reducing the size of the heap has little effect on the program. Indeed, the program actually runs slightly faster with a smaller heap! This may be due to the improved data locality of a smaller heap and fewer cache misses.


                            none            -G1             -M2m
    With optimisation       6.43s / 46%     8.38s / 56%     6.11s / 44.8%
    Without optimisation    6.41s / 57%     9.52s / 60%     11.18s / 76%

    (Each entry gives the total run time and the percentage of time spent in GC.)

Figure 5.1: Results of running a sorting algorithm with various options affecting the run-time system. The code under analysis is exactly the quicksort example given earlier, used to sort a list of 20000 integers, taking the minimum time over three runs.

Figure 5.2: How the performance of the sorting program varies with the size of the heap.


Chapter 6

Conclusion
As the run-time system of GHC has been highly optimised for persistent data structures, overwriting closures provides little benefit under typical conditions. Despite this, the technique appears promising for environments where a large amount of excess heap space is not available. A number of possibilities for further optimisation remain open that may improve the impact of this technique. It is likely that being more aggressive in deciding which closures can be overwritten would lead to better results. In particular, allowing the closures allocated for function calls to be updated is likely to be useful for optimising recursive functions that are not tail recursive.


Bibliography
[1] P. Hudak. Conception, evolution, and application of functional programming languages. ACM Computing Surveys, 1989.
[2] J. Hughes. Why Functional Programming Matters.
[3] David B. MacQueen. Reflections on Standard ML. Lecture Notes in Computer Science, volume 693/1993, pages 32-46.
[4] Sylvain Conchon and Jean-Christophe Filliâtre. A Persistent Union-Find Data Structure.
[5] P. Wadler. Functional Programming: Why no one uses functional languages.
[6] Niklas Röjemo and Colin Runciman. Lag, Drag and Void: heap-profiling and space-efficient compilation revisited. Department of Computer Science, University of York.
[7] David Wakeling and Colin Runciman. Linearity and Laziness.
[8] Andrew W. Appel. Garbage Collection Can Be Faster Than Stack Allocation. Department of Computer Science, Princeton University.
[9] Philip Wadler. The marriage of effects and monads. Bell Laboratories.
[10] Philip Wadler. Is there a use for linear logic? Bell Laboratories.
[11] Philip Wadler. A taste of linear logic. Bell Laboratories.
[12] Philip Wadler. Comprehending Monads. University of Glasgow.
[13] David N. Turner and Philip Wadler. Operational Interpretations of Linear Logic.


[14] Simon Peyton Jones. Implementing functional languages on stock hardware: the Spineless Tagless G-machine, version 2.5. University of Glasgow.
[15] Simon Peyton Jones. Making a Fast Curry: Push/Enter vs Eval/Apply for Higher-order Languages.
[16] Antony L. Hosking. Memory Management for Persistence. University of Massachusetts.
[17] Henry G. Baker. Lively Linear Lisp: Look Ma, No Garbage!
[18] Edsko de Vries, Rinus Plasmeijer and David M. Abrahamson. Uniqueness Typing Redefined.
[19] Jurriaan Hage and Stefan Holdermans. Heap Recycling for Lazy Languages. Department of Information and Computing Sciences, Utrecht University.
[20] Jon Mountjoy. The Spineless Tagless G-machine, naturally. Department of Computer Science, University of Amsterdam.
[21] François Pottier. Wandering through linear types, capabilities, and regions.
[22] A.M. Cheadle, A.J. Field, S. Marlow, S.L. Peyton Jones and R.L. While. Exploring the Barrier to Entry: Incremental Generational Garbage Collection for Haskell.
[23] E.G.J.M.H. Nöcker, J.E.W. Smetsers, M.C.J.D. van Eekelen and M.J. Plasmeijer. Concurrent Clean.
[24] Simon Peyton Jones and Simon Marlow. The STG Runtime System (revised).
[25] Simon Peyton Jones and Simon Marlow. The New GHC/Hugs Runtime System.
[26] Simon Peyton Jones. Implementing Functional Languages on Stock Hardware: the Spineless Tagless G-Machine.
[27] Simon Peyton Jones and Philip Wadler. Imperative Functional Programming.
[28] J. Launchbury and S. Peyton Jones. State in Haskell. In Lisp and Symbolic Computation, volume 8, pages 293-342.
[29] The Mini STG Language: http://www.haskell.org/haskellwiki/Ministg

