Download as pdf
Download as pdf
You are on page 1of 54
Run-time Environment and Code Generation Syllabus vironments — ie Runtime a en ps eke language issues — Storage organization — Storage Allocation Strategies: $ » Stack and Heap allocation - Parameter Passing-Symbol Tables - Dynamic Storage erator ~ Basic Blocks and Flow gr. aphs - Design of a "" Expressions Dynamic Programming Code Generation. Contents Part | : Runtime Environments 4.1 Introduction 4.2 Source Language Issues 43° Storage Organization May-07,12, Dec.-14,17, -. . 44 Storage Allocation Strategies... May-07,10,11,13,14,16,17,19, ~ - Dec.-05,06,07,09,10,11,13,18,22, Marks 16 ++ Marks 16 45 Storage Allocation Space 4.6 Parameter Passing .. May-18, - 4.7 Symbol Tables 7 Marks 6 sr eeeess.. May-12,18, Dec.-05,06,07,10, Marks 15 48 Dynamic Storage Allocation Part Il : Code Generation 4.9 Concept of Code Generation 4.10 Issues in the Design of a Code Generator May-04,05,06,07,11,12,14,16,18, . Dec.-04,10,11,17,18,22, Marks 16 . - - Marks 2 4.11 Target Machine Description .... ae . i fa 2, 1,18, 41 imple Code Generator. . . May-10,11,12,13,15, lait Dec.-06,07,09,11,14,22, Marks 16 4.13 Optimal Code Generation for Expressions Dynamic Programming Code Generation 4.15 Two Marks Questions with Answers 4.14 4-1) 42 Run-time Envionment and Code ¢ Compiler Design | |Pare 1: Runeime Environments | Introduction tion of the input source program some data objects (those Which an ose aeunee source text are important factors to th, by variables) from input so — represented by varia generation. ' a In the code generation these data objects are referred by their addresses * In the code ge * The addresses to these data objects are dependent upon the Organization memory. And the organization of memory is driven by the source |, features ANBuage EE] source Language Issues There are various language features that affect the or, organization of data is determined by answering the foll language issues are - Banization of memory. The lowing questions. The source Does the source language allow recursion ? stances is determined by run time How the parameters are passed to the procedure ? There are two methods of parameter Passing: call by value and call by referee The allocation strategies for each of these methods are different. Some languages support passing of procedures of the procedure itself could be a procedure. Does the procedure re Any procedure has a a itself as parameter or a return sat fer nonlocal names? How ? Cc language dynamically 2 — oe TECHNICAL PUBLICATIONS® - an up-thrust for knowledg? sign compiler DOSS 4-3 Run-time Environment and Code Generation je Organizati [I storage Organization « The compiler demands for a block of memory to operating system. The compiler utilizes this block of memory for running (executing) the compiled program. This block of memory is called run time storage. « The run time storage is subdivided to hold code and data such as : i) The generated target code ii) Data objects iii) Information which keeps track of procedure activations. * The size of generated code is fixed. Hence the target code occupies the statically determined area of the memory. Compiler places the target code at the lower end of the memory. Bottom The amount of memory required by the data objects is known at the compiled time and hence data objects also can be placed at the statically determined area of the memory. Compiler prefers to place the data objects in the statically determined area because these data objects then can be compiled into target code. For example in FORTRAN all the data objects are allocated statically. Hence the static data area is on the top Top of code area. The subdivision of run time memory Fig. gg ee as shown by following figure * The counterpart of control stack is used to manage the active procedures. Managing of active procedures means that when a call occurs then execution of activation is interrupted and information about status of the stack is saved on the stack. When the control returns from the call this suspended activation is resumed after storing the values of relevant registers. Also the program counter is set to the call. This information is stored in the stack area of run hich are contained in this activation can be point immediately after the time storage. Some data objects w! 3 allocated on the stack along with the relevant information. ea of run time storage in which the other information is * The heap area is the ar a items is allocated under the program stored. For example memory for some dati c ; control. Memory required for these data items is obtained from this heap area. Memory for some activation is also allocated from heap area. + ‘The size of stack and heap is not fixed it may grow or shrink interchangeably during the program execution. * Pascal and C need the run time stack. - an up-thrust for knowledge TECHNICAL PUBLICATIONS Compiler Design 4-4 Run-time Environment and Code Generation 1. Explain in detail about the storage organization. COR Aa) reas ? Explain ? 2. How to subdivide a run-time memory into code and data a’ OS Cee Explain - runtime environment with suitable example. we |. Explain about runtime storage management. | 4.4 | Storage Allocation Strategies Cea As we have already discussed that the run time storage is divided into : 1. Code area 2. Static data area 3, Stack area 4. Heap area There are three different storage allocation strategies based on this division of run time storage. The strategies are - 1. Static allocation - The static allocation is for all the data objects at compile time. 2. Stack allocation - In the stack allocation a stack is used to manage the run time storage. 3, Heap allocation - In heap allocation the heap is used to manage the dynamic memory allocation. Let us discuss each of these strategies in detail. EZSI static Allocation a. The size of data objects is known at compile time. The names of these objects are bound to storage at compile time only and such an allocation of data objects is done by static allocation. . b. The binding of name with the amount of storage allocated do not change at run time. Hence the name of this allocation is static allocation. c. In static allocation the compiler can determine the amount of storage required by each data object. And therefore it becomes easy for a compiler to find the addresses of these data in the activation record. d. At compile time compiler can fill the addresses at which the target code can find the data it operates on. e. FORTRAN uses the static allocation strategy. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge | compiler 09597 _—_ 4-5 Run-time Environment and Code Generation ee — at and Code SO — imitations of static allocation 1, The static allocation can be done only if the size of data object is known at compile time. 2, The data structures can not be created dynamically. In the sense that, the static allocation can not manage the allocation of memory at run time. 3, Recursive procedures are not supported by this type of allocation. pa Stack Allocation a. Stack allocation strategy is a strategy in which the storage is organized as stack. This stack is also called control stack. : pb. As activation begins the activation records are pushed onto the stack and on completion of this activation the corresponding activation records can be popped The locals are stored in the each activation record. Hence locals are bound to corresponding activation record on each fresh activation. d. The data structures can be created dynamically for stack allocation. Limitations of stack allocation The memory addressing can be done using pointers and index registers. Hence this type of allocation is slower than static allocation. Heap Allocation a. If the values of non local variables must be retained even after the activation record then such a retaining is not possible by stack allocation. This limitation of stack allocation is because of its Last In First Out nature. For retaining of such local variables heap allocation strategy is used. >. The heap allocation allocates the continuous block of memory when required for her data object. This allocated memory can be storage of activation records or off This deallocated (free) space can be further deallocated when activation ends. reused by heap manager. c. The efficient heap management can be done by + the free blocks and when any memory is deallocated i) Creating a linked list for ded in the linked list. that block of memory is appen ii) Allocate the most suitable block of memory from the linked list. ie. use best fit technique for allocation of block. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge 4-6 Run-time Environment and Code Generation EEE] Comparison between Static, Stack and Heap Allocation Sr.No. Static allocation Stack allocation Heap allocation —_| 1 Static allocation is done In stack allocation, stack is In heap allocation, heap is | used to manage dynamic for all data objects at used to manage runtime | compile time. storage. memory allocation. | 1 2 Data structures can not Data structures and data Data structures and data | be created dynamically objects can be created objects can be created | because in static dynamically. dynamically. | allocation compiler can determine the amount of storage required by each | data object . | Memory allocation : A Memory allocation : The | Memory allocation : Using names of data objects are Last In First Out (LIFO) contiguous block of memory bound to storage at activation records and data __ from heap is allocated for compile time. objects are pushed onto the _activation record or data stack. The memory object. A linked list is addressing can be done maintained for free blocks using index and registers. _ 4. Merits and limitations; Merits and limitations : It Merits and limitations : This allocation strategy is supports dynamic memory _Efficient memory | simple to implement but allocation but it is slower management is done using supports static allocation than static allocation linked list. The deallocated | only. Similarly recursive strategy. Supports recursive space can be reused. But procedures are not procedures but references to since memory block is supported by static non local variables after allocated using best fit, holes allocation strategy. activation record can not be may get introduced in the retained. mee) 1. Discuss in detail about the run time management. OR 2. What are different storage allocation strategies ? Explain. May-07, Marks 10; May-10, 16, Dec.-11, Marks 16; -13, Marks 8, May-17, Marks 16; May-19, Marks 13 3. Discuss the runtimes storage management of a code generator. | 4. What are the limitations of static allocation ? | Describe in detail about the stack allocation in memory management. Explain the sequence of tack allocation processes fora function call, OTE SCRTETD | Discuss in detail about ; i) Storage allocation strategies Elaborate on the storage allocation strategies CE a 2 NS TECHNICAL PUBLICATIONS® - an up-thrust for knowledge M ment and Code Generation \ ier DESI 7 torage Allocation Space s 7 os Activation Tree The activation tree is a graphical representation which describes the flow of control over the procedure call-chains. ‘Activation tree shows the way control enters and leaves activations for one run of a program. gach call is represented by a node. The root of activation tree represents the activation of main procedure. if function A calls function B then node A can be represented as parent of node B. ows that control flows from A to B. This s Node A is to the left of node B if a terminates before B. For example : In the program of quick sort, the procedures ReadArray, quicksrt and partition are the subroutines which are called. The activation tree for them Main =— | Main procedure | that calls the other routines Read array quicksrt (1.9) | oo... Recursive partition (1,9) quicksrt (1,3) quicksrt (5,9) L_call | _—— — |o-™ Result is returned to partion (1.3) quicksrt (1,0) quicksrt (2.3) partion (5,9) _quicksrt (5.5) _quicksrt (7,9) paraniecall a | te on oy est (9,9) partition (2,3) quicksrt (2.1) quicksrt (3,3) partition (7,9) quicksrt (7,7) Fig. 4.5.1 Control stack * The control stack procedure activations. * When the lifetime of a procedur it at the end of its lifetime. For example : is a data structure used at run-time to keep track of live e begins then push a node onto the stack and POP Main ‘quicksrt (1.3) ‘quicksrt (2,3) partition (2,3) Fig. 4.5.2 stack top TECHNICAL PUBLICATIONS® - an up-thrust for knowledge —~ Compiler Design 4-8 Run-time Environment and Code ¢ es Se Rods Beneray ~ one rating EER Activation Record * The activation record is a block of memory used for managing information needed by a single Retum value Actual parameters execution of a procedure. Control link » (Dynamic link) * FORTRAN uses the static data area to store the activation record where as in PASCAL and C the activation record is situated in stack area. The contents of activation record are as shown in the Fig, 4.5.3. ‘Access link (Static link) ‘Saved machine status Local variables * Various fields of activation record are as follows - 1. Temporary values - The temporary variables are needed during the evaluation of expressions. Such variables are stored in the temporary field of activation record. 2. Local variables - The local data is a data that is local to the execution of procedure is stored in this field of activation record. 3. . Saved machine registers - This field holds the information regarding the status of machine just before the procedure is called. This field contains the machine registers and program counter. 4. Control link - This field is optional. It points to the activation record of the calling procedure. This link is also called dynamic link. 5. Access link - This field is also optional. It refers to the non local data in other activation record. This field is also called static link field. Temporaries Fig. 4.5.3 Model of ie 9. lodel of activation 6. Actual parameters - This field holds the information about the actual parameters These actual parameters are passed to the called procedure. 7. Return values - This field is used to store the result of a function call. * The size of each field of activation record is determined at the time when @ procedure is called. DS rh ee odd By taking the example of factorial program explain how activation record will look like for every recursive call in case of factorial (3). Solution : main() { int f; f = factorial (3); TECHNICAL PUBLICATIONS® - an up-thrust for knowledge 4-9 cone! Design int factorial (int n) 1) retum 1; if (n= use retum (n*factorial (n-1)); Run-time Environment and Code Generation Return value To Act ------- _ | calling Act. pat rocedure record parameter] 36) Pr for factorial) | Dynamictink | ,_| First call Act record for main Lied Local t factorial Act. eT | record for Return value Cae Parameter | 3 iy Dynamic fink factorial rma Return value record for Parameter factorial - Dynamic link (2) TECHNICA! 1 PUBLICATIONS” Second call = an up-thrust for knowledge oo Compiler: Design 4-10 Run-time Environment and Code Generay ton Act |. record for rman main Local 7 factorial + Act . record for Return value factorial (3) ameter | 3 Dynamic link | —}- factorial 4 record for Retun vane | 2 factorial (2) Parameter | 2 Dynamic link | — factorial ao Retum value | 1 tecord for Parameter 1 factorial (1) ‘Dynamic link Third call Calling Sequences iv Calling sequences are the series of calls to the procedures. These procedures has ~ the code which is used to allocate activation record on the stack. The information of the procedure is entered in the fields of the activation record. The return sequence is used to restore the state of the machine so that the calling procedure can continue its execution after the call. The caller is a procedure that calls another procedure and callee is the procedure which is getting called by the caller. While designing calling sequences following principles are used. The values passed between caller and callee. are generally placed at the beginning of the callee's activation record. Due to this these values become closer to the caller's activation record. Fixed length values are generally placed at the middle of the activation record. These items include-control link, access link and saved machine status field. If the machine status information is standardized then the task of debuggets become easy and efficient. Items whose size in not yet determined by the compiler are placed at the end of the activation record, There are many variables whose s the compiler by examining their type. Similarly, the size of dynamic artays e is determined by TECHNICAL PUBLICATIONS® - an up-thrust for knowledge sig! wa" aay Run-time Environment and Code Generation gormined by the callee’s paramete deter parameter. Hence such things are placed at the end at the activation record top of stack pointer must be dete yse point rhe mined judic g ; judiciously, For this purp« ty the end of fixed length fields in the activation record. ‘The fixed length data By be accessed by fixed offsets, relative to top of stack pointer. Similarly we can obtain the variable length data just above the top of stack pointer rhe calling sequen 1 The caller evaluates the aetual parameters s during function call are as given below - >, The caller then stores the return address and old top of stack pointer into calle’s activation record. 4, The caller increments the top of stack pointer. 4, The callee saves the register values and other machine status information. 5, The callee initi es the local data and starts executing, The return sequence is as given below - : 1, The callee places the return values next to the parameter. 2, The callee restores the top of stack pointer using the information stored in machine status field. s to the return address which is placed by the caller in the 3, The callee jump: status field. 4. The caller makes use of the return value. EEX] variable Length Data uence can be implemented by call dure which calls another procedure whereas the callee is rocedure. For a procedure call seq ler and callee procedures. The caller procedure is a proce a procedure which gets called by the another p how the variable length data is handled by run time storage For understanding, consider following scenario. rocedure A has two arrays x and y. Is the procedure B. Pi part of activation record of A. In the activation ginning of x ani ompile time. B (callee procedure) begins after the ble length arrays p and q. Then after the Suppose a procedure A call The storage to these arrays is not the tecord of A only the pointers to the be; the relative addresses of these arrays at ¢ The activation record for the procedure arrays of A. Suppose the procedure B has varia activation record of B the arrays for procedure B can be placed. dy are appearing. We can obtain p_sp to keep track of these records. The Two pointers can be maintained top and to) ints to the end of some field in top poi OP points to the actual top of the stack and top_sp P° TECHNICAL PUBLICATIONS® - an up-thrust for knowledge a Compiler Design Run-time Environment and Code Gr _ 9 eet § Environ a !eneration the activation record, By this the length of the activation record of calle procedures jg known to caller at compile time. oe 4 Control ink | het { Pointer to x record for A Pointer to y Arrays of A Act {| Control link record for B top_sp—e === Place variable length ; arrays of B Top! Fig. 4.5.4 Storing variable length data Review Questions 1. What is an activation record ? Explain the purpose of each item in the activation record with | example. | What is activation record ? Explain its possible structure. | | 3, Explain the runtime storage scheme for C++ language. Give the structure of activation record | N and explain with suitable example. 4. What do you mean by calling sequence ? Explain the action performed during i) Function call | and ii) Return EMG Parameter Passing ro There are two types of parameters. i) Formal parameters ii) Actual parameters TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Pe conve De eee ak Run-time Environment and Code Generation and based on these parameters there are various parameter passing methods, the jst common methods are, ow 1. Call by value : This is the simplest method of parameter passing. The actual parameters are evaluated and their r-values are passed to called procedure, The operations on formal parameters do not changes the values of actual parameter. Example : Languages like C, C++ use actual parameter passing method. In PASCAL the non-var parameters. 2. Call by reference : This method is also called as call by address or call by location. The L-value, the address of actual parameter is passed to the called routines activation record. « The values of actual parameters can be changed. The actual parameters should have an L-value. Example : Reference parameters in C++, PASCAL'S var parameters. Copy restore : This method is a hybrid between call by value and call by reference. This method is also known as copy-in-copy-out or values result. The calling procedure calculates the value of actual parameter and it is then copied to activation record for the called procedure. » During execution of called procedure, the actual parameters value is not affected. If the actual parameter has L-value then at return the value of formal parameter is copied to actual parameter. Example : In Ada this parameter passing method is used. 4. Call by name : This is less popular method of parameter passing. Procedure is treated like macro. The procedure body is substituted for call in caller with actual parameters substituted for formals. The actual parameters can be surrounded by parenthesis to preserve their integrity. The locals names of called procedure and names of calling procedure are distinct Example : ALGOL uses call by name method. For example : Consider the following, piece of code along with the output. Procedure exchange (m,n:integer); Var t : integer; begin t m a; TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Compiler Design 4-14 Runstime Environment and Code Generation end; ijt ali] := 50 { a:array [1....10] of integer} print (iali}); exchange(i,alil); print (i,a{1}); The output for all the above discussed method is Call by value Call by Copy restore Call by name reference 1 50 1 50 1 50 1 50 50 1 Error The error occurs because t 1 i 50 as afi] = 50 ali] a[50] = t = 1 then index is out of bounds. What is the output of the following C program, if the compiler uses dynamic scope ? Briefly justify your answer. intr; void write (void) { printf’ sea’, 1) } void Display (void) { int r = 37.24; write (); void main (void) { r= 11.34 write (); Display ( ); } Solution : |The output with dynamic scope will be 11.34 37.24 TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Pe in copier Des! 4-15 Run-time Environment and Code Generation behaves as a global variable and hence value 11.34 will be printed. In the nitially © ‘ope and its value gets changed to 37.24, And it ppapay function variable has a tocal re printed in Display function itself, Hence 11.34 97.24 is the output Write the output of the following C program using following parameter passing methods. i) Call by value. ii) Call by reference. iti) Call by value-result and iv) Call by name. #include int i = 0; int j = 0; void p(int x, int y) { } void swap(int x, int y) { xa x+y; oa xK=XY; } main() int a[2] = {1, 1}; int b[3] = {1, 2, 0}; (ali, ali); printf("%d, %d\n", a[0], a[1}); swap(, ali); printf(%d,%d, %d\n", b{O], b/1), bI2I); return 0; . Solution: i) Call by value When P(afil, afi] ) is called then P(1,1) td P(xy) In the function body P, x and y will be increment. Hence x = 2,i=2,y = 2. But on returning to main, the x and y values will not be preserved. * printf (" %d %d", a0], a[1] ) will result as 11 TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Compiler Design 4-16 Rumtime Environment and Code Generation When call to swap ( j, alj] ) given it will be vad swap (x, y) 44 ‘swap (0, 1) In the function body of x xa x+y ie x = 0+1=1 yax-y _y =1-1=0 x =x-y x= 1-0 x= x =1,y =0. The x and y values get swapped. But on returning to main, the and y values do not get preserved. The printf statement is for printing b[0] and b[2] values. Therefore we will get 12 0. Hence output will be ii) Call by reference In call by reference, the formal parameters get associated with actual parameter. void p(int x, int y) & ¥a=1 /* x becomes 2, y becomes 2 */ : it=1 Pinky “-1 /* y becomes 3 */ } When control returns to main a[0] = x = y = 3 and af1] remains as it is ie. 1. After calling swap function the printf statement is printing b[0], b[1] and b[2] ie. 1, 2,0. Hence output will be, 3.1 120 TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ye . snpier DOS” : 4-97 Run-time Environment and Code Generation jg Call by value result iil i pl yaw result the actual parameter applied by the caller is copied into aallee’s formal parameter the function executes and then formal parameter is copied back 2 aller. No alias or reference is created between formal and actual parameters. Hence oid paint x, int y) { alo] >¥ ra x+=1 x=1+1=2 yt+=l yoiti ie. i 2 ie. ' dn returning to main value of x gets copied to a[0] -.a[0] = 2. The all] remains 1. After swap function the printf prints b[0], b[1] and b[2]. Hence output will be, 21 720 iv) Call by name In this method the actual parameters will be substituted for all the occurrences of formal parameter Hence, plint x, int y) P(a(O}, aft] ) { becom: } } «. On returning from P we get 2, 2 as an output of first printf. After calling swap, the printf prints b[0], b[1] and b[2]- Hence output will be, 22 120 ew Question er in a function with example. 1. Explain about various ways to pass a paramet Ue ed TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Rumetime Environment and Code Generatigy Compiler Design 4-18 ddtabults mu Symbol Tables rE SEC Definition compiler to keep track of semantics of Symbol table is a data structure used by ¢ information about scope and binding variable. That means symbol table stores the information about names. * Symbol table is built in lexical and syntax analysis phases. The symbol table is used by various phases as follows - Semantic analysis phase refers symbol table for type conflict issue. Code generation refers symbol table knowing how much run-time space is allocated ? What type of run-time space i allocated ? \-value and r-value The | and r prefixes come from left and right side assignment. For example : ao:s itt t of I-value —r- value Symbol-Table Entries * To achieve compile time efficiency compiler makes use of symbol table. It associates lexical names with their attributes. The items to be stored in symbol to table are : i) Variable names ii) Constants iii) Procedure names v) Literal constants and strings vii) Labels in source languages © Compiler uses following types of information from symbol table. i) Data type ii) Name iii) Declaring procedures iv) Offset in storage If structure or record then pointer to structure table iv) Function names vi) Compiler generated temporaries v) vi) For parameters, whether parameter passing is by value or reference ? vii) Number and type of arguments passed to the function viii) Base address ‘i ® TECHNICAL PUBLICATIONS” - an up-thrust for knowledge foe Run-time Environment and Code Generation How to Store Names in Symbol Table ? There are two types of name representation. ' rixed-length name ed space for each name is allocated in symbol table. In this type of storage if A fix ame is too small then there is wastage of space. for example © : Name :, Attribute Geoael ¢ u Ta’ stage a ee ‘The name can be referred by pointer to symbol table entry. 2. Variable length name The amount of space required by string is used to sto The name can be stored with the help of starting index and lengt re the names. th of each name. For example : Name Starting Length attcatee index 0 10 3 = oo Be 10 4 a s . 14 2 _ ie “aes 16 S Otra uegeriget a a6 7 YB , WNnNRB WS 6 C6 heh ame Go ee Ga Symbol Table Management Requirements for symbol table management : ') For quick insertion of identifier and related information ') For quick searching of identifier TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Compiler Design 4-20 Rumtime Environment and Code Generation Following are commonly used data structures for symbol table construction. 1. List data structure for symbol table . Linear list is a simplest kind of mechanism to implement the symbol table. In this method an array is used to store names and associated information. New names can be added in the order as they arrive. The pointer ‘available’ is maintained at the end of all stored records. The Fig. 4.7.1 for list data structure using arrays is as given below. To retrieve the information about some name we start from beginning of array : and go on searching up to available gq cyaeny sioh pointer. If we reach at pointer available Fig. 4.7.1 Symbol table without finding a name we get an error “use of undeclared name". While inserting a new name we should ensure that it should not be already there. If it is there another error occurs i.e. "Multiple defined Name". The advantage of list organization is that it takes minimum amount of space. 2. Self organizing list This symbol table implementation is using linked list. A link field is added to each record. We search the records in the order pointed by the link of link field. A_ pointer maintained to point to first record of the symbol table. The reference to these names can be Name 3, Name 1, Name 4, Name 2. Fig. 4.7.2 Symbol table When the name is referenced or created it is moved to the front of the list. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge 4 21 Run-time Environment and Code Generation one most frequently refer ‘d names will tend to be a ie erred names will be the front of the list. Hence access > most frequently ret time ¢ least tables 3 Hest Hashing is an important technique used to search the records of symbol table. This method is superior to list organization. + In hashing, scheme two tables are maintained-a hash table and symbol table. + The hash table consists of k entries from 0, 1 to k-1. These entries are basically pointers to symbol table pointing to the names of symbol table. | « To determine whether the ‘Name’ is in symbol table, we use a hash function ‘h’ such that hiname) will result any integer between 0 to k-1. We can search any name by position = h(name) Using this position we can obtain the exact locations of name in symbol table. «+ The hash table and symbol table can be as shown below - ‘Symbol table Name Info Hash link Hash table Fig. 4.7.3 Hashing for symbol table The hash function should result in uniform distribution of names in symbol table. The hash function should be such that there will be minimum number of collision. Collision is such a situation where hash function results in same location for stonin, % the names. TECHNICAL PUBLICATIONS” - an up-thrust for knowledge ec CC SEEE'SSSCS Seo Compiler Design 4-22. Rumtime Environment and Code Generatign © Various collision resolution techniques are open addressing, chaining, rehashing © The advantage of hashing is quick search is possible and the disadvantage is that hashing is complicated to implement. Some extra space is required Obtaining ope of variables is very difficult Draw the symbol tables for each of the procedures in the following PASCAL, code (including main) and show their nesting relationship by linking them via a pointer reference in the structure (or record) used to implement them in memory. Include the arguments and any other information you find and location at run-time, entries or fields for the local variabl relevant for the purposes of code generation, such as its type 01: procedure main 02: integer a, b, ¢; 03: procedure fl (a, b); 04: integer a, b; 05: call £2(b, a); 06 end; 07: producer f2(y, z); 08: integer y, 2; | 09: procedure £3(m, n); | 10: integer m, n; | 11: end; 12: procedure f4(m, n); 13: integer m, n; 14: end; 15: call £3(c, 2); 16: call f4(c, 2); 17: end; 18: 19: call fl(a, b); Lo eee 20: end; Solution : Representation of symbol tables and their nesting relationships by. linking them is as shown in Fig. 4.7.4 (See Fig. 4.7.4 on Next Page). TECHNICAL PUBLICATIONS® - an up-thrust for knowledge a compiler Desig” _ ae 4-23 Run-time Environment and Code Generation | parent @ ] type | size are integer] 4 f+ integer] 4 woger| 4 name: f1 parent & name: parent & name: {4 parent & ‘ind [symbol] type | size kind [kymbol| type [ size ind [symbol] type | size param] w_|integer| 4 var_[ a _finteger| 4 param] |integer| 4 parem| _x__|integer| 4 param| _y_|integer param| 2 integer] 4 name: (3 parent e—| kind [symboll type | size var |b |integer| 4 param| m_|integer| 4 param] integer] 4 Fig. 4.7.4 Review Questions 1. Define symbol table. 2. How names can be looked up in the symbol table ? Discuss. Suggest a suitable approach for computing hash function. Discuss various ways of implementing symbol table and compare merits and demerits. CUR re ee 5. Suggest a suitable representation for symbol tables of procedure oriented languages. (Ans. : Refer point 3. Hash tables ) Oe 6. How would-you map names to values ? COs Oe mH Dynamic Storage Allocation There are two techniques used in dynamic memory allocation and those are - * Explicit allocation © Implicit allocation Explicit memory allocation and deallocation is done with the help of some functions such as new and dispose. Whereas implicit memory allocation is done by compiler “sing run time support packages. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Rum-tin Environment and Code Generation Compiler Design am EERE Explicit Allocation | iable sized blocks The explicit allocation can be done for fixed size and variable sized bloc EARS Explicit Allocation for Fixed Size Blocks + This is the simplest technique of explicit allocation in which the size of the block tor which memory is allocated is fixed Free list is a set of free blocks. This list is * In this technique a free list is used. 1 7 e1 is i tec e] observed when we want to allocate memory. If some memory is de-allocated then the free list gets appended. The block are linked to each other in a list structure. The memory allocation can be done by pointing previous node to the newly allocated block. Similarly memory deallocation can be done by de referencing the previous link. . The pointer which points to first block of memory is called Available. Available fet ti Available Fig. 4.8.1 Deallocate block '30' * This memory allocation and deallocation is done using heap memory. * The explicit allocation consists of taking a block off the list and deallocation consist of putting the block back on the list. * The advantage of this technique is that there is no space overhead. Explicit Allocation of Variable Sized Blocks ¢ Due to frequent memory allocation and deallocation the heap memory becomes fragmented. That means heap may consist of some blocks that ate free and some that are allocated. * In Fig. 4.8.2 a fragmented heap memory is shown. Suppose a list of 7 blocks gets allocated and second, forth and sixth block is deallocated then fragmentation occurs. Thus we get variable sized blocks that are available free. Allocated block Fig. 4.8.2 Hi 'P Memory (Fragmented) —— eee ee > 8n up-thrust for knowledge TECHNICAL PUBLICATIONS® 4-25 Run-time Environment and Code Generation For allocating variable sized blocks some strategies such as first fit, worst fit and best re used. When a block of size $ is allocated and if we search for the free block in fit a is used then that the heap memory and the first available free block whose size > S rechnique is called first fit method. Sometimes all the free blocks are collected together to form a large ultimately avoids the problem of fragmentation. pe Implicit Allocation The implicit allocation is performed using user program and runtime packages. free block. This Block size (optional) The run time package is required to know Reference count (optional) when the storage block is not in use. Mark (optional) Pointers to blocks rs ae ‘User data The format of storage block is as shown in following Fig. 4.8.3. l] There are two problems that are faced in implicit memory allocation - First problem is recognizing block boundaries. If the block size is fixed then user data can be easily accessible. Second problem is to determine which block is in use. ‘There are two approaches used for implicit allocation. count is a special counter used during implicit d by some another block then its Fig. 4.8.3 Block format 1) Reference count - Reference location. If any block is referre: ne, That also means if the reference count of that means that block is not referenced one ence counts are best used when pointers memory all reference count incremented by ©) particular block drops down to 0 then, and hence it can be deallocated. Refer: between blocks never appear in cycle. This is an alternative approach to determine whether. the his method, the user program is suspended temporarily to mark the blocks that are in use. Sometime bitmaps k the blocks which are in use. A bit-table stores blocks which are in use currently. 2) Marking techniques - block is in use or not. In # and frozen pointers are used are used. Bitmaps are used to mar all the bits. This table indicates the laced in locks which are unused. These pointers are then p' he heap memory. Again we go through heap memory and mark those b technique it is possible to keep track of the blocks that are in Thus using marking use. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Compiler Design 4-26 Run-time Environment and Code Generation * There is one more technique called compaction in which all the used blocks are moved at the one end of heap memory, so that all the free blocks are available in ‘one large free block. Review Question 1. Explain about implicit and explicit storage requests. Part II : Code Generation GZ Concept of Code Generation As we know, code generation is the final phase in the process of compilation. It takes intermediate code as an input and generates target machine code as output. The position of code generator in compilation process is illustrated by following Fig. 49.1. Intermediate Front end Fig. 4.9.1 Position of code generator in compiler the Design of a Code Generator Cee Let us discuss some common issues in design of code generator. * Input to the code generator The code generation phase takes intermediate code as input. This intermediate code along with the symbol table information is used to determine the runtime addresses of the data objects. These data objects are denoted by the names in the intermediate representation. The intermediate code may be in any form such as three address code, quadruple, posix notation or it may be represented using graphical representations such as syntax trees or DAGs. The intermediate code generated by the front end should be such that the target machine can easily manipulate it, in order to generate appropriate target machine code. In the front end of compiler necessary type checking and type conversion needs to be done. The detection of the semantic errors should be done before TECHNICAL PUBLICATIONS® - an up-thrust for knowledge > yer Design 4-27 Run-time Environment and Code Generation ing the input to the code generator. The code generation phase requires the eubmitt : _ ror free intermediate code as input. complete er «Target programs The output of code generator is target code. Typically, the target code comes in three forms such as : absolute machine language, relocatable machine language and assembly languase The advantage of producing target code in absolute machine form is that it ¢ placed directly at the fixed memory location and then can be executed immediately. The benefit of such target code is that small programs can be quickly compiled. an be for example: The WATFIV and PL/C are the compilers which produce the absolute code as output. The advantage of producing the relocatable machine code as outpu subroutines can be compiled separately. Relocatable object modules cai together and loaded for execution by a linking loader. This offers great flexibility of compiling the subroutines separately. If the target machine can not handle the relocation automatically then the compiler must provide explicit relocation information to the loader in order to link the separately compiled subroutine segments. The advantage of producing assembly code as output makes the code generation process easier. The symbolic instructions and Macrg facilities of assembler can be used to generate the code. It is advantageous to have assembly language as output for the machines with small memory. it is that the in be linked However, out of these three forms of output target code it is always preferable to have relocatable machine code as target code. * Memory management Both the front end and code generator performs the task the data objects in run time memory. The names + to the entries in the symbol table. The type in a f storage (memory) needed to store the memory requirements of mapping the names in the source program to addresses to used in the three address code refer declaration statement determines the amount o! declared identifier. Thus using the symbol table information about code generator determines the addresses in the target code. tains the labels then those labels can be Similarly if the three address code co’ es. For instance if a reference to ‘goto j’ is converted into equivalent memory address encountered in three address code then ap by computing the memory address for label j. propriate jump instruction can be generated * Instruction selection The uniformity and completeness of instruction se 8enerator. The selection of instruction depends uy Machine. The speed of instruction and machine idioms are two important factors in t is an important factor for the code pon the instruction set of target TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Compiler Design 4-28 Run-time Environment and Code Generation selection of instructions. If we do not consider the efficiency of target code then the instruction selection becomes a straightforward task. For each type of three address code, the code skeleton can be prepared which ultimately gives the target code for the corresponding construct. For example x:=a+b then the code sequence that can be generated as MOV a,RO /* loads the a to register RO */ ADD b,RO_/* performs the addition of b to RO */ MOV ROx /* stores the contents of register RO to x */ In the above example the code is generated line by line. Such a line by line code generation process generates the poor code because the redundancies can be achieved by subsequent lines and those redundancies can not be considered in the process of line by line code generation. : For example The code for the above statements can be generated as follows : MOV y, RO ADD z, RO | MOV RO, x : | MOV x,RO | ADD t,RO { MOV RO,a The above generated code is a poor code because MOV RO,a is not used and statement MOV a,RO is redundant. Hence the efficient code can be MOV y, RO ADD z, RO ADD t,RO MOV RO,a The quality of generated code is decided by its speed and size. Simply line by lin translation of three address code into target code leads to correct code but it can generate unacceptably non efficient target code. «Register allocation If the instruction contains register operands then such a use becomes shorter and faster than that of using operands in the memory. Hence while generating a good os efficient utilization of register is an important factor. There are two important activities done while using registers. 1. Register allocation - During register allocation, select appropriate set of variables that will reside in registers. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge > 1 Design compile 4-20 Run-time Environmont and Gode Generation agister assignment - During re; >, Registe ie During register assignment, pick up the specific register in which corresponding variable will reside obtaining the optimal (minimum) assignment of registers to variable is difficult. Certain machines require register pairs such as even odd numbered registered for come operands and results, For example in IBM systems integer multiplication requires register pair Consider the three address code tysatb t= t"e t= tr/d The efficient machine code sequence will be MOV a,RO ADD b,RO MUL c,RO DIV d,RO MOV RO,t, « Choice of evaluation order The evaluation order is an important factor in generating an efficient target code. Some orders require less number of registers to hold the intermediate results than the others. Picking up the best order is one of the difficulty in code generation. Mostly, we can avoid this problem by referring the order in which the three address code is generated by semantic actions. * Approaches to code generation The most important factor for a code generation is that it should produce the correct code. With this approach of code generation various algorithms for generating code are designed 1. Briefly explain the issues in design of code generator TA en eC eens ES Ue Dec.-10, Marks 10, Dec.-18, Marks 13 2 What are the issues to be handled during code generation ? Explain how these issues are handled. Give suitable examples to support your explanation TM ee 10 design of code generator. ‘AU : May-18, Marks 6, Dec.-22, Marks 7 3. Summarize the issues arise during the TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Compiler Design 4-30 Run-time Environment and Code Generation Target Machine Description For designing the good code generator it is necessary to have prior knowledge of target machine and instruction set used for this target machine. In this chapter we will assume that the target machine code is a register machine like minicomputers. Specifically following assumptions are made for code generation. * We will assume that in the target computer addresses are given in bytes and four bytes form a word. There are n general purpose registers RO, R1,....Rn-1. * The two address instruction is of the form op source destination where op is an opcode and source and destination are data fields. For instance : MOV - Moves from source to destination. ADD - Add source to destination. SUB - Subtracts source from destination. The source and destination are specified by registers and memory locations. © The addressing modes used are as follows. Addressing mode Form Address Added cost absolute M M 1 register R R 0 indexed (R) c+ contents(R) 1 indirect register RK contents(R) 0 indirect indexed ¥c(R) __contents(c+contents(R)) 1 literal He c 1 If we have absolute or resister addressing mode we can use M or register for source or destination. For example, the instruction MOV RIM stores the contents of register R1 into memory location M. For the indexed addressing mode the address offset ¢ from the value of register RO can be written as, MOV 7(R1), M TECHNICAL PUBLICATIONS” - an up-thrust for knowledge 4-31 jer Design Run-time Environment and Code Generation goons it stores he value contents (7+-contents(R1) to the memory location M sre last v0 addressing modes represent the indirect addresses indicates by * e instruction MOV #7(R0),M For thi jp stores the value contents(contents (7+ contents(R0)) to the memory location M, inthe literal addressing mode the source becomes constant. For example, MOV #5,R0 By this instruction we can store the constant 5 into register RO. pel Cost of the Instruction The instruction cost can be computed as one plus cost associated with the source and destination addressing modes given by “added cost”. Instruction Cost Interpretation MOV ROR1 1 Cost of register mode +1=0+1=1 MOV RIM 2 Use of memory variable +1=1+1=2 SUB 5(RO)*10(R1) 3 Use of first constant +use of second constant +1 =3 ee ee Lars) Compute the cost of following set of instructions. MOV a,RO ADD b,RO MOV R0,c Solution : The cost of this set is 6 because - MOV a,RO 2 ADD b,RO 2 MOV RO,c 2 ee ea Total cost 6 URC UmS CO Coo RRIRREED compute the cost of following set of instructions MOV +R1,+RO ADD +*R2,*RO ee Oe TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Compier Design 4-32 Rumtime Environment and Code Generation Solution : The cost of this set is 2 because - MOV *R1,*RO 1 ADD *R2*RO 1 Total cost 2 Review Questions 1 What are various addressing modes used during the code generation ? uction ? AU : Dec.-04, 10, May-05, Marks 2 | Design of a Simple Code Generator AU : May-10,11,12,13,15,16,18, Dec.-06,07,09,11,14,22, Marks 16 2. How do vou calculate the cost of an ins In this section we will discuss the method of generating target code from three address statement. ¢ In this method computed results can be kept in registers as long as possible. For example X= a+b; The corresponding target code is ADD bR; Here R; holds value of a Here cost =2 OR MOV bR; Here Rg holds value of a ADD Ry,Ro Here cost =2 * The code generator algorithm uses descriptors to keep and addresses for names. 1. A register descriptor is used to keep track of what is currently The register descriptors show that initially all the registers code generation for the block progresses computations. track of register conten in each register, are empty. As the registers will hold the values nN The address descriptor stores the location where the current value of the nam can be found at run time. The information about locations can be symbol table and is used to access the variables. + The modes of operand addressability are as given below. is used to indicate value of operand in storage. stored in t R is used to indicate value of operand in register. TECHNICAL PUBLICATiON<® Me sign onpior DOS! 4-33 Run-time Environment and Goda Generation Attributes: ghee) Storage mode Location / Register SS a ~~ Fig. 4.12.1 Address descriptor is indicates that the address of operand is stored in storage i.e. indirect accessing ndicates that the address of operand is stored in register i.e. indirect accessing. IRi «The address descriptor has following fields. The attributes mean type of the operand. It generally refers to the name of temporary variables, The addressing mode indicates whether the addresses are of type The third field is location field which indicates whether the address i: jocation or in register. «Similarly the register descriptor can be shown as below. Operand Status descriptors oO ee TT Ld Fig. 4.12.2 Register descriptor of the registers which are currently is used to check whether the register tus field holds the value ‘True’ then he operand descriptor who is having 19//R'/1S‘/IR’- s in storage By using register descriptors we can keep track occupied. The status field is of Boolean type which is occupied with some data or not. When the stat operand descriptors fields contains the pointer to # the latest value in the register. Algorithm for code generation Read the expression in the form of Operator, operand1 and operand2 and generate code using following algorithm - {i coeloporanasioperandiepéran2) ee ='R’) if (operator = ‘+') Generate(‘ADD operand2,R0'); else if(operator = ‘-') Generate(‘SUB operand2,R0'); else if(operator = ** Generate(‘MUL operand2,RO0'); else if(operator = ‘/') Generate(‘DIV operand2,R0’); Se TECHNICAL PUBLICATIONS - an up-thrust for knowledge Compiler Design 4-34 Run-time Environment and Code Generation } { else if (operand2.addressmode = 'R') if (operator = ‘+') Generate(‘ADD operand! RO"); else if(operator = ‘—' Generate(‘SUB operand1,R0'); else if(operator = ‘*') Generate(‘MUL operand 1,R0'); else if(operator = ‘/") Generate(‘DIV operand1,R0’); } else { Generate(‘MOV operand2,R0); if (operator = ‘+!) Generate(‘ADD operand2,R0’); else iffoperator = '-') Generate(‘SUB operand2,R0’); else if(operator =''*’) Generate(‘MUL operand2,R0’); else if(operator = ‘/’) Generate(‘DIV operand2,R0'); } Example We will generate code for following expression = (a + b) * (c — d) + ((e/f) * (a + b)) The corresponding three address code can be given as t) = atb t= cd = e/f ty ety ty * ty tg = ty + ty Using the simple code generation algorithm the sequence target code can be generated as Three address code Target code Register Operand descriptor sequence descriptor ty = atb MOV a, Ro Empty i . . ADD b, Ry Ro contains t, ut x ea bred MOV c, Ry Ry contains ¢ »)RI® SUB d, Ry Ry contains ty : TECHNICAL PUBLICATIONS® ~ an Up-thrust for knowledge r - pier 00510" 4-35 Run-time Eruronment sod Oda Gora were | MOV e, R5 Ry contains ¢ 3 DIV f, R; Ry contains t, . MUL Ro, Ry Rg contains t, Ry contains ty ‘ Rj contains ty att MUL R2Ro Ry contains ty ts" ; 7 Ry contains ty 5 9 Rg contains t; +ts ADD R;.Rg Rj contains ty 7 Ro contains ts, ‘6 Rg contains ty Generate code for the following statements for target machine xv=x+1 2) x=atb+c 3) x=ay/(b-c)-d*(e+f) Ee Solution: 1) The three address code will be th: = x41 x:=t The code will be MOV x, Ro ADD #1, Ro MOV Ry x 2) The three address code will be th: = be this att x:=t The code will be - MOV a, Ro ADD b, Ro ADD c, Ro MOV Ry, x 3) The three address code will be ty: = boc ty: = ay/th =. = TECHNICAL PUBLICATIONS” - an up-thrust for knowledge Compiler Design 4-36 Run-time Environment and Code Generation b= h-d e+f Bit Xie ty The code will be - MOV b, Ry SUB c, Ry MOV a, R, DIV Ry Ry SUB d, R, MOV f, R, ADD e, Ry MUL R,, Ry MOV R,, x GEITIILELE) Generate a code for following statements a=btc; d=ate; Solution; The code will be MOV b, RO ADD c, RO MOV RO, a| dundant code, it can be removed MOV a, RO ADD e, RO MOV RO, d Then we get, MOV b, RO ADD c, RO ADD e, RO MOV RO, d Seem Une ud EOP REED Generate assembly language for W:= (A+B)+(A+O)+(A+C) Solution : The three address code can be written as - t}: = A+B A+C thr = th tty ty th +t, W: = ty The assembly code will be - MOV A, Ry ADD B, Ry /*A+B*/ TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ea compiler DS!9” — ___ 4-37 __ Run-time Environment and Code Generation MOV A, R, ADD G Re 7A ce), ADD RyRy / (A+B) +(A+C) and Ro contains result * / Now fan /* (A+B) + (A +0) + (A + C), result is in Ry * / PEELED For the following expression obtain optimal code using ;) only twvo registers ii) only one register (a+b) -~(c- (+e) First of all we will write a three address code for given expression a+b dte cnt ty ty -t3 ‘The generated code using simple code generation algorithm will be id MOV d, RO ADD e, RO MOV c, R1 SUB RO,R1 MOV a, RO ADD b, RO SUB R1, RO ii) MOV a, RO ADDe, RO MOV RO, t1 MOV c, RO SUB t1, RO MOV RO, t2 MOV a, RO ADD b, RO SUB t2, RO GRBEPREGR Generate code for the following C program using any code generation igor you known. Solution = nun main () { int j; int a[10); while (j <=10) ali] = 0; } Solution : Before writing the actual machine code for given fragment of code it is essential to write a three address code. 1. ifj < = 10 goto 3 2 goto 7 3 taped TECHNICAL PUBLICATIONS® - an up-thrust for knowledge 4-38 Run-time Environmont and Code 4. ty = addr (a) 5. ty I\h=o 6 goto | - stop, Now the machine code can be generated as follows L1: CMP j, 10 CJ < =12 L2: MOV # 4, RO MUL j, RO MOV #0, A(RO) JMP L1 UCR SLSY Generate optimal code for following assignment statements. X= atb+e X= (a*—b) + (c-(d +e) x = (a/b ~¢) fd 7 2ompiler Design = at (b+e / dre) /(f * g —h* i) Solution : i) x = a+b+c ii) iii) th:= atb t2:= tte The code will be MOV a, RO ADD b, RO ADD c, RO MOV RO, x t1 dte t2 c-t1 t3 -1*b t4 a*t3 tS t3+t2 The code will be MOV d, RO ADD e, RO <—dte MOV ¢, R1 RI contains c SUB RO, R1 RI contains c —(d+e) MOV b, RO MUL -1, RO MUL a, RO RO contains ax —b ADD RO, R1 MOV R1, x tl = a/b t2 tl-c t3 ta/d “ATIONS 1 : ECHNICAL PUBLIC, - an up-thrust for knowledeae = n compiler Desig! 4-39 Run-time Environment and Code Generation The code will be MOV b, RO DIV b, RO SUB c, RO MOV d, Ri DIV RO,R1 MOV RI, x at(b+c/dte) / (f * a —h * i) 1 iv) ee The code will be - MOV f, RO MUL g, RO

You might also like