Compiler QBank From CD
Chennai · Delhi
FM.indd 1
4/27/2014 6:01:29 PM
Semester-VI
Principles of Compiler Design
12/13/2012 5:14:18 PM
The aim of this publication is to supply information taken from sources believed to be valid and
reliable. This is not an attempt to render any type of professional advice or analysis, nor is it to
be treated as such. While much care has been taken to ensure the veracity and currency of the
information presented within, neither the publisher nor its authors bear any responsibility for
any damage arising from inadvertent omissions, negligence or inaccuracies (typographical or
factual) that may have found their way into this book.
PART B (5 × 16 = 80 marks)
11. (a) (i) What are the various phases of the compiler? Explain each phase in detail. (10)
(ii) Briefly explain the compiler construction tools. (6)
Or
(b) (i) What are the issues in lexical analysis? (4)
(ii) Elaborate in detail the recognition of tokens. (12)
12. (a) (i) Construct the predictive parser for the following grammar: (10)
S → (L) | a
L → L, S | S
(ii) Describe the conflicts that may occur during shift-reduce parsing. (6)
Or
(b) (i) Explain in detail the specification of a simple type checker. (10)
(ii) How is runtime memory subdivided into code and data areas? Explain. (6)
13. (a) (i) Describe the various types of three address statements.
(8)
(ii) How can names be looked up in the symbol table? Discuss. (8)
Or
(b) (i) Discuss the different methods for translating Boolean expressions in detail.
(12)
(ii) Explain the following grammar for a simple procedure call statement: S → call id ( Elist )
(4)
14. (a) (i) Explain in detail the various issues in the design of a code generator. (10)
(ii) Write an algorithm to partition a sequence of three address statements into basic blocks.
(6)
Or
(b) (i) Explain the code generation algorithm in detail. (8)
(ii) Construct the DAG for the following basic block: (8)
d := b*c
e := a+b
b := b*c
a := e-d
15. (a) (i) Explain the principal sources of optimization in detail. (8)
(ii) Discuss the various peephole optimizations in detail. (8)
Or
(b) (i) How to trace the data-flow analysis of a structured program? Discuss. (6)
(ii) Explain common-subexpression elimination, copy propagation and transformations for moving loop-invariant computations in detail. (10)
Solutions
PART A
1. Cousins of the compiler means the context in which the compiler typically operates. Such contexts are basically programs such as the preprocessor, assembler, loader and link editor.
2. a.
b.
c.
d.
3. A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.
Example: Given grammar G: E → E+E | E*E | (E) | -E | id
The sentence id+id*id has the following two distinct leftmost derivations:
Derivation 1: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Derivation 2: E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
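The two derivations can be encoded as parse trees and evaluated to confirm that the ambiguity matters semantically. A small Python sketch (the numbers 2, 3, 4 stand in for the three ids and are illustrative only):

```python
def evaluate(tree):
    """Evaluate a parse tree given as ('op', left, right) or a number."""
    if isinstance(tree, tuple):
        op, left, right = tree
        l, r = evaluate(left), evaluate(right)
        return l + r if op == '+' else l * r
    return tree

# Derivation 1 groups as id + (id * id)
tree1 = ('+', 2, ('*', 3, 4))
# Derivation 2 groups as (id + id) * id
tree2 = ('*', ('+', 2, 3), 4)

print(evaluate(tree1))  # 14
print(evaluate(tree2))  # 20
```

The two trees yield different values, so an ambiguous grammar leaves the meaning of the sentence undefined.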
4. If a heap variable is destroyed, any remaining pointer variable or object reference that still refers to it is said to contain a dangling reference. Unlike lower-level languages such as C, dereferencing a dangling reference will not crash or corrupt your IDL session; it will, however, fail with an error message.
For example:
; Create a new heap variable.
A = PTR_NEW(23)
; Print A and the value of the heap variable A points to.
PRINT, A, *A
IDL prints:
<PtrHeapVar13> 23
5. In the quadruple representation, temporaries are given explicit names, and entries for those temporaries are made in the symbol table. The advantage of the quadruple representation is that the value of a temporary can be accessed quickly through the symbol table; however, the use of temporaries introduces a level of indirection through the symbol table. In the triple representation, by contrast, pointers (positions of earlier triples) are used instead of temporary names, so a result can be referred to directly.
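A minimal sketch of the two representations for the statement a := b * -c + b * -c, assuming the conventional (op, arg1, arg2, result) field layout; the names are illustrative, not from the text:

```python
# Quadruples name every intermediate result with an explicit temporary.
quadruples = [
    ('uminus', 'c',  None, 't1'),
    ('*',      'b',  't1', 't2'),
    ('uminus', 'c',  None, 't3'),
    ('*',      'b',  't3', 't4'),
    ('+',      't2', 't4', 't5'),
    (':=',     't5', None, 'a'),
]

# Triples drop the result field: later instructions refer to earlier
# ones by their position, removing the symbol-table indirection.
triples = [
    ('uminus', 'c', None),   # (0)
    ('*',      'b', 0),      # (1)
    ('uminus', 'c', None),   # (2)
    ('*',      'b', 2),      # (3)
    ('+',      1,   3),      # (4)
    (':=',     'a', 4),      # (5)
]
```

Note how triple (4) refers to positions 1 and 3 directly, where the quadruples had to go through t2 and t4.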
6. To overcome the problem of processing incomplete information in a single pass, the backpatching technique is used.
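Backpatching can be sketched as a list of emitted quadruples whose jump targets are temporarily a placeholder; the list-of-indices bookkeeping below is an illustrative simplification of the usual makelist/merge/backpatch primitives:

```python
quads = []

def emit(instr):
    """Emit a quadruple and return its index (a one-entry 'makelist')."""
    quads.append(instr)
    return len(quads) - 1

def backpatch(index_list, label):
    """Fill in the target label of every listed incomplete jump."""
    for i in index_list:
        quads[i] = quads[i].replace('_', label)

i1 = emit('if a < b goto _')   # true target not yet known
i2 = emit('goto _')            # false target not yet known
backpatch([i1], 'L2')          # targets become known later in the pass
backpatch([i2], 'L3')
print(quads)   # ['if a < b goto L2', 'goto L3']
```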
7. A flow graph is a directed graph in which flow-of-control information is added to the basic blocks.
(i) The nodes of the flow graph are the basic blocks.
(ii) The block whose leader is the first statement is called the initial block.
(iii) There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the given sequence; we say that B1 is a predecessor of B2.
8. Consider two loops, where L1 is the outer loop and L2 is the inner loop, and the allocation of variable a to some register is to be decided. The approximate scenario is as given below:

Loop L1
  ...
  Loop L2
    ...      } extent of L2
  ...        } extent of L1
PART B
11. (a) (i) Phases of Compiler
A Compiler operates in phases, each of which transforms the
source program from one representation into another. The
following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling
Lexical Analysis:
It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.
It reads the characters one by one, starting from left to right, and forms the tokens. A token represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols, etc.
Example: position := initial + rate*60
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The multiplication sign
7. The constant number 60
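The token list above can be reproduced by a small scanner sketch; the token names and the regular expressions are assumptions for illustration, not the book's:

```python
import re

TOKEN_SPEC = [
    ('ASSIGN', r':='),                  # must precede single-char rules
    ('ID',     r'[A-Za-z][A-Za-z0-9]*'),
    ('NUM',    r'\d+'),
    ('PLUS',   r'\+'),
    ('TIMES',  r'\*'),
    ('SKIP',   r'\s+'),                 # whitespace is discarded
]

def tokenize(source):
    pattern = '|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC)
    return [(m.lastgroup, m.group())
            for m in re.finditer(pattern, source)
            if m.lastgroup != 'SKIP']

print(tokenize('position := initial + rate * 60'))
# [('ID', 'position'), ('ASSIGN', ':='), ('ID', 'initial'),
#  ('PLUS', '+'), ('ID', 'rate'), ('TIMES', '*'), ('NUM', '60')]
```

The seven tokens produced match the seven items listed above.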
Syntax Analysis:
Syntax analysis is the second phase of the compiler. It is also known as parsing. It gets the token stream as input from the lexical analyzer of the compiler and generates the syntax tree as the output.
Syntax tree:
It is a tree in which interior nodes are operators and exterior
nodes are operands.
Example: For position := initial + rate*60, the syntax tree is

             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate     60
Semantic Analysis:
Semantic Analysis is the third phase of the compiler. It gets input
from the syntax analysis as parse tree and checks whether the
given syntax is correct or not.
It performs type conversion of all the data types into real data
types.
             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate   inttofloat
                        |
                        60
Error Handling:
Each phase can encounter errors. After detecting an error, a phase
must handle the error so that compilation can proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
In code generation, it shows error when code is missing etc.
The phases and their interactions:

source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimization → Code Generation → Object Program

Symbol Table Management and Error Detection and Handling interact with every phase.
1. Scanner Generators:
These generate lexical analyzers, normally from a specification based on regular expressions.
The basic organization of the generated lexical analyzer is a finite automaton.
2. Parser Generators:
These produce syntax analyzers, normally from input that is
based on a context-free grammar.
It consumes a large fraction of the running time of a compiler.
Example-YACC (Yet Another Compiler-Compiler).
3. Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its neighbor nodes in the tree.
4. Automatic Code Generators:
These take a collection of rules to translate intermediate language into machine language. The rules must include sufficient detail to handle the different possible access methods for data.
5. Data-Flow Engines:
These do code optimization using data-flow analysis, that is, the gathering of information about how values are transmitted from one part of a program to each other part.
(b) (i) There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
1. Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens; specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler.
3. Compiler portability is enhanced. Input-alphabet peculiarities and other device-specific anomalies can be restricted to the
lexical analyzer. The representation of special or non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical analyzer.
(ii) Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if, then, else, relop, id and num generate sets of strings given by the following regular definitions:
if    → if
then  → then
else  → else
relop → < | <= | = | <> | > | >=
id    → letter ( letter | digit )*
num   → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
For this language fragment the lexical analyzer will recognize the keywords if, then, else, as well as the lexemes denoted by relop, id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as identifiers.
Transition diagrams
A transition diagram is a diagrammatic representation of the action that takes place when the lexical analyzer is called by the parser to get the next token. It is used to keep track of information about the characters that are seen as the forward pointer scans the input.

Transition diagram for identifiers and keywords:

start --letter--> (10) --other--> (11)  return(gettoken(), install_id())
                   ^   |
                   +---+ letter or digit

(On reaching state 11, the forward pointer is retracted one position.)
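The diagram can be simulated directly. A hedged Python sketch, where states 10 and 11 are modeled by the loop and the retract step, and the book's gettoken()/install_id() are replaced by a simple reserved-keyword check:

```python
KEYWORDS = {'if', 'then', 'else'}

def next_token(buf, pos):
    """Run the identifier/keyword transition diagram starting at pos."""
    if pos >= len(buf) or not buf[pos].isalpha():
        return None, pos                   # diagram does not apply here
    start = pos
    while pos < len(buf) and buf[pos].isalnum():   # state 10: loop
        pos += 1
    lexeme = buf[start:pos]                # state 11: 'other' seen, retract
    kind = lexeme if lexeme in KEYWORDS else 'id'
    return (kind, lexeme), pos

tok, pos = next_token('then x1', 0)
print(tok)                                 # ('then', 'then')
print(next_token('then x1', pos + 1)[0])   # ('id', 'x1')
```

Because keywords are reserved, the same diagram serves both keywords and identifiers; only the final classification differs.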
12. (a) (i) For the grammar, after eliminating left recursion:
S → (L) | a
L → S L′
L′ → , S L′ | ε
the predictive parse of the input (a,a) proceeds as follows:

Input      Action
(a,a)$     S → (L)
a,a)$      L → S L′
a,a)$      S → a
,a)$       L′ → , S L′
a)$        S → a
)$         L′ → ε
$          Accept
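The same parse can be carried out by a recursive-descent sketch of the predictive parser, one procedure per nonterminal (illustrative only; a table-driven parser would behave identically):

```python
class Parser:
    def __init__(self, text):
        self.toks = list(text) + ['$']
        self.i = 0

    def look(self):
        return self.toks[self.i]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError('expected %r, got %r' % (t, self.look()))
        self.i += 1

    def S(self):                        # S -> (L) | a
        if self.look() == '(':
            self.match('('); self.L(); self.match(')')
        else:
            self.match('a')

    def L(self):                        # L -> S L'
        self.S(); self.Lp()

    def Lp(self):                       # L' -> , S L' | epsilon
        if self.look() == ',':
            self.match(','); self.S(); self.Lp()

    def parse(self):
        self.S(); self.match('$')
        return True

print(Parser('(a,a)').parse())  # True
```

An erroneous input such as (a,) raises a SyntaxError at the point where the lookahead fails to match.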
(ii) 1. Shift-reduce conflict:
Consider the grammar E → E+E | E*E | id and input id+id*id. With the stack $E+E and remaining input *id$, the parser can either shift or reduce by E → E+E, so there is a shift-reduce conflict.

Case 1 (shift first):
Stack      Input    Action
$E+E       *id$     Shift
$E+E*      id$      Shift
$E+E*id    $        Reduce by E → id
$E+E*E     $        Reduce by E → E*E
$E+E       $        Reduce by E → E+E
$E         $        Accept

Case 2 (reduce first):
Stack      Input    Action
$E+E       *id$     Reduce by E → E+E
$E         *id$     Shift
$E*        id$      Shift
$E*id      $        Reduce by E → id
$E*E       $        Reduce by E → E*E
$E         $        Accept
2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c. With $R+c on the stack and the input exhausted, the parser can reduce either by R → c or by M → R+c, so there is a reduce-reduce conflict.

Case 1 (reduce by R → c):
Stack      Input    Action
$          c+c$     Shift
$c         +c$      Reduce by R → c
$R         +c$      Shift
$R+        c$       Shift
$R+c       $        Reduce by R → c
$R+R       $        Reduce by M → R+R
$M         $        Accept

Case 2 (reduce by M → R+c):
Stack      Input    Action
$          c+c$     Shift
$c         +c$      Reduce by R → c
$R         +c$      Shift
$R+        c$       Shift
$R+c       $        Reduce by M → R+c
$M         $        Accept
12. (b) (i) The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. An identifier must be declared before it is used. The type checker can handle arrays, pointers, statements and functions.
A Simple Language
Consider the following grammar:
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑T
E → literal | num | id | E mod E | E [ E ] | E↑

Translation scheme:
P → D ; E
D → D ; D
D → id : T                { addtype(id.entry, T.type) }
T → char                  { T.type := char }
T → integer               { T.type := integer }
T → ↑T1                   { T.type := pointer(T1.type) }
T → array [ num ] of T1   { T.type := array(1..num.val, T1.type) }

In the above language,
there are two basic types: char and integer;
type_error is used to signal errors;
the prefix operator ↑ builds a pointer type; for example, ↑integer leads to the type expression pointer(integer).
Type checking of expressions
In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E.
1. E → literal   { E.type := char }
   E → num       { E.type := integer }
Here, constants represented by the tokens literal and num have type char and integer respectively.
2. E → id        { E.type := lookup(id.entry) }
lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.
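The synthesized-attribute rules above can be sketched in Python; the symbol-table contents, the tuple encoding of type expressions, and the `deref` operator name are assumptions for illustration:

```python
# lookup(id.entry) is modeled by a plain dictionary.
symtab = {'i': 'integer', 'p': ('pointer', 'integer')}

def typeof(e):
    """Synthesize the type of an expression bottom-up."""
    if isinstance(e, str) and e.isdigit():
        return 'integer'                     # E -> num
    if isinstance(e, str):
        return symtab.get(e, 'type_error')   # E -> id
    op, *args = e
    if op == 'mod':                          # E -> E1 mod E2
        t1, t2 = typeof(args[0]), typeof(args[1])
        return 'integer' if t1 == t2 == 'integer' else 'type_error'
    if op == 'deref':                        # E -> E1 (pointer dereference)
        t = typeof(args[0])
        return t[1] if isinstance(t, tuple) and t[0] == 'pointer' else 'type_error'
    return 'type_error'

print(typeof(('mod', 'i', '3')))   # integer
print(typeof(('deref', 'p')))      # integer
print(typeof(('mod', 'p', 'i')))   # type_error
```

Each rule either synthesizes a type from the subexpression types or signals type_error, exactly as in the scheme above.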
13. (a) (i) The common types of three-address statements are:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc. ) to x and y,
and executes the statement with label L next if x stands in
relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual
sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. For example,
param x1
param x2
...
param xn
call p, n
is generated as part of a call of the procedure p(x1, x2, ..., xn).
7. Indexed assignments of the form x : = y[i] and x[i] : = y.
8. Address and pointer assignments of the form x : = &y , x :
= *y, and *x : = y.
(a) (ii) There are two types of name representation:
1. Fixed-length names
2. Variable-length names

1. Fixed-length names
A fixed amount of space for each name is allocated in the symbol table. In this type of storage, if a name is too small, there is wastage of space.
For example (each Name field is 10 characters wide):

Name          Attribute
calculate
sum
a
b

2. Variable-length names
The names are stored in a separate character array, each terminated by an end-of-name marker $, and the symbol table records the starting index and length of each name.
For example:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c a l c u l a t e $ s  u  m  $  a  $  b  $

Starting index   Length   Attribute
0                10
10               4
14               2
16               2
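The variable-length scheme can be sketched as follows; the helper name `build` is hypothetical, but the resulting table matches the one above:

```python
def build(names):
    """Pack names into one string with '$' separators and record
    (starting index, length) for each, as in the variable-length scheme."""
    text, table = '', []
    for name in names:
        table.append((len(text), len(name) + 1))   # +1 for the '$'
        text += name + '$'
    return text, table

text, table = build(['calculate', 'sum', 'a', 'b'])
print(text)    # calculate$sum$a$b$
print(table)   # [(0, 10), (10, 4), (14, 2), (16, 2)]
```

No space is wasted on short names like a and b, at the cost of the extra indirection through the (start, length) pair.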
13. (b) (i) Boolean expressions have two primary purposes. They are used to compute logical values, but more often they are used as conditional expressions in statements that alter the flow of control, such as if-then-else or while-do statements.
Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.
Here we consider boolean expressions generated by the following grammar:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
Methods of Translating Boolean Expressions:
There are two principal methods of representing the value of a
boolean expression. They are :
i. To encode true and false numerically and to evaluate a boolean expression analogously to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
ii. To implement boolean expressions by flow of control, that is, representing the value of a boolean expression by a position reached in a program. This method is particularly convenient in implementing boolean expressions in flow-of-control statements, such as the if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions will be
evaluated completely from left to right, in a manner similar to
arithmetic expressions.
For example:
The translation for a or b and not c is the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
The code layouts for the flow-of-control statements are:

(a) if-then:
         E.code        (jumps to E.true / E.false)
E.true:  S1.code
E.false: ...

(b) if-then-else:
         E.code        (jumps to E.true / E.false)
E.true:  S1.code
         goto S.next
E.false: S2.code
S.next:  ...
(c) while-do:
S.begin: E.code        (jumps to E.true / E.false)
E.true:  S1.code
         goto S.begin
E.false: ...
SEMANTIC RULES
S → if E then S1:
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code

S → if E then S1 else S2:
    E.true := newlabel;
    E.false := newlabel;
    S1.next := S.next;
    S2.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code || gen('goto' S.next) || gen(E.false ':') || S2.code

S → while E do S1:
    S.begin := newlabel;
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.begin;
    S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin)
SEMANTIC RULES
E → E1 or E2:
    E1.true := E.true;
    E1.false := newlabel;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.false ':') || E2.code

E → E1 and E2:
    E1.true := newlabel;
    E1.false := E.false;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.true ':') || E2.code

E → not E1:
    E1.true := E.false;
    E1.false := E.true;
    E.code := E1.code

E → ( E1 ):
    E1.true := E.true;
    E1.false := E.false;
    E.code := E1.code

E → true:
    E.code := gen('goto' E.true)

E → false:
    E.code := gen('goto' E.false)
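A hedged sketch of how the rules for E → E1 and E2 and E → id1 relop id2 generate code: newlabel and gen are modeled by a counter and a code list, and subexpressions are passed as functions taking their inherited (true, false) labels. The closure-based plumbing is an illustrative choice, not the book's notation:

```python
code, counter = [], [0]

def newlabel():
    counter[0] += 1
    return 'L%d' % counter[0]

def gen(s):
    code.append(s)

def relop(a, op, b, true, false):       # E -> id1 relop id2
    gen('if %s %s %s goto %s' % (a, op, b, true))
    gen('goto %s' % false)

def and_(e1, e2, true, false):          # E -> E1 and E2
    e1_true = newlabel()                # E1.true := newlabel
    e1(e1_true, false)                  # E1.false := E.false
    gen('%s:' % e1_true)                # gen(E1.true ':')
    e2(true, false)                     # E2 inherits E.true and E.false

and_(lambda t, f: relop('a', '<', 'b', t, f),
     lambda t, f: relop('c', '<', 'd', t, f),
     'Ltrue', 'Lfalse')
print('\n'.join(code))
```

For a < b and c < d this emits a jump to Lfalse as soon as the first test fails, which is exactly the short-circuit behaviour the flow-of-control representation provides.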
(ii) The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that handle procedure argument passing, calls and returns
are part of the run-time support package.
Let us consider a grammar for a simple procedure call statement
1. S call id ( Elist )
2. Elist Elist , E
3. Elist E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure. The following are the actions that take place in a calling sequence:
1. When a procedure call occurs, space must be allocated for the
activation record of the called procedure.
3. Generate the instruction op z′, L, where z′ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live
on exit from the block, and are in registers, alter the register
descriptor to indicate that, after execution of x : = y op z ,
those registers will no longer contain y or z.
(ii) The DAG can be constructed in the following steps:
Step 1: for d := b*c, create a * node with children b and c, and attach the label d.
Step 2: for e := a+b, create a + node with children a and b, and attach the label e.
Step 3: for b := b*c, the node *(b, c) already exists (b has not been redefined before this point), so no new node is created; the label b is attached to the existing node, which now carries the labels d, b.
Step 4: for a := e-d, create a - node whose children are the + node (labeled e) and the * node (labeled d, b), and attach the label a.
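The sharing in step 3 can be demonstrated with a small sketch that keys DAG nodes by (operator, left child, right child); the node ids and labels table are illustrative:

```python
nodes = {}    # (op, left, right) -> node id
labels = {}   # node id -> identifiers currently attached to it

def leaf(name):
    return ('leaf', name, None)

def node(op, left, right, target):
    key = (op, left, right)
    nid = nodes.setdefault(key, len(nodes))   # reuse existing node if any
    labels.setdefault(nid, []).append(target)
    return nid

n_d = node('*', leaf('b'), leaf('c'), 'd')   # step 1
n_e = node('+', leaf('a'), leaf('b'), 'e')   # step 2
n_b = node('*', leaf('b'), leaf('c'), 'b')   # step 3: shared with step 1
n_a = node('-', n_e, n_d, 'a')               # step 4

print(n_d == n_b)      # True: b*c is computed only once
print(labels[n_d])     # ['d', 'b']
```

The dictionary lookup is what makes b := b*c attach a label to the existing * node instead of building a new one.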
15. (a) (i) A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a
program without changing the function it computes.
The transformations:
1. Common sub expression elimination,
2. Copy propagation,
3. Dead-code elimination, and
4. Constant folding
are common examples of such function-preserving transformations. The other transformations come up primarily when global
optimizations are performed.
Common Subexpression Elimination:
An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common-subexpression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4*i is eliminated, as its computation is already available in t1, and the value of i has not changed between its definition and this use.
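A minimal sketch of local common-subexpression elimination over quadruples like those above. It assumes, as in the example, that no operand is redefined inside the block; the tuple encoding is illustrative:

```python
def local_cse(block):
    """Remove quadruples whose (op, arg1, arg2) was already computed,
    rewriting later uses of the eliminated temporaries."""
    available, replaced, out = {}, {}, []
    for dst, op, a1, a2 in block:
        a1, a2 = replaced.get(a1, a1), replaced.get(a2, a2)
        key = (op, a1, a2)
        if key in available:
            replaced[dst] = available[key]   # reuse the earlier result
        else:
            available[key] = dst
            out.append((dst, op, a1, a2))
    return out

block = [('t1', '*', '4', 'i'),
         ('t2', '[]', 'a', 't1'),
         ('t3', '*', '4', 'j'),
         ('t4', '*', '4', 'i'),       # duplicate of t1
         ('t5', 'id', 'n', None),
         ('t6', 'b[]+', 't4', 't5')]  # will be rewritten to use t1

for q in local_cse(block):
    print(q)
```

The output drops the t4 quadruple and rewrites the last one to use t1, matching the optimized code shown above.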
Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g. Copy propagation means the use of one variable instead of another. This may not appear to be an improvement, but as we shall see, it often gives us an opportunity to eliminate the copy statement as dead code.
For example:
x = Pi;
A = x*r*r;
The optimization using copy propagation can be done as follows:
A = Pi*r*r;
Here the variable x is eliminated
Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is
dead or useless code, statements that compute values that never
get used. While the programmer is unlikely to introduce any dead
code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.
Example:
i = 0;
if (i == 1)
{
    a = b + 5;
}
Here, the if statement is dead code because the condition i == 1 will never be satisfied.
Constant folding:
We can eliminate both the test and printing from the object code.
More generally, deducing at compile time that the value of an
expression is a constant and using the constant instead is known
as constant folding.
One advantage of copy propagation is that it often turns the copy
statement into dead code.
For example,
a = 3.14157/2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
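Constant folding can be sketched as a bottom-up pass over an expression tree; the tuple encoding is an assumption for illustration:

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold(tree):
    """Evaluate any subtree whose operands are all constants."""
    if not isinstance(tree, tuple):
        return tree
    op, l, r = tree
    l, r = fold(l), fold(r)
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return OPS[op](l, r)          # deduce the value at compile time
    return (op, l, r)

print(fold(('/', 3.14157, 2)))               # 1.570785
print(fold(('*', 'r', ('/', 3.14157, 2))))   # ('*', 'r', 1.570785)
```

Subtrees containing variables like r are left alone; only fully constant subexpressions are replaced, as in the division example above.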
Loop Optimizations:
We now give a brief introduction to a very important place for
optimizations, namely loops, especially the inner loops where
programs tend to spend the bulk of their time. The running time
of a program may be improved if we decrease the number of
instructions in an inner loop, even if we increase the amount of
code outside that loop.
For example, the jump sequence
goto L1
...
L1: goto L2
can be replaced by
goto L2
...
L1: goto L2
Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine: x² → x*x.
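A small peephole sketch covering the two transformations just described, over a hypothetical (dst, op, a, b) instruction format (the format and operator symbols are assumptions for illustration):

```python
def peephole(code):
    """Drop x := x+0 and x := x*1, and rewrite x := y^2 as x := y*y."""
    out = []
    for dst, op, a, b in code:
        if dst == a and (op, b) in (('+', 0), ('*', 1)):
            continue                          # algebraic identity: a no-op
        if op == '^' and b == 2:
            out.append((dst, '*', a, a))      # reduction in strength
        else:
            out.append((dst, op, a, b))
    return out

print(peephole([('x', '+', 'x', 0), ('y', '^', 'x', 2)]))
# [('y', '*', 'x', 'x')]
```

A real peephole optimizer slides a small window over the target code and applies many such pattern rewrites repeatedly until no more apply.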
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i+1:
i := i + 1  →  i++
i := i - 1  →  i--

(b) (i) Flow graphs for control-flow constructs such as do-while statements have a useful property: there is a single beginning point at which control enters and a single end point that control leaves from when execution of the statement is over. We exploit this property when we talk of the definitions reaching the beginning and the end of statements with the following syntax:
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id
Expressions in this language are similar to those in the intermediate code, but the flow graphs for statements have restricted forms:
S1 ; S2: the block for S1 is followed directly by the block for S2.
if E then S1 else S2: E branches to S1 or S2, and both rejoin at a single exit point.
do S1 while E: S1 is followed by the test E, with a back edge (if E goto S1) to the top of S1.
Steps 2 and 3: the recomputed subexpressions are replaced by the earlier results:

m := 4*k                m := 4*k
t2 := a[t1]             t2 := a[t1]
t5 := 4*k    (12)  →    t5 := m
t6 := a[t5]  (15)       t6 := a[t5]

Step 4: now, if we assign a value to each common subexpression, then
(12) := 4*k
(15) := a[(12)]
t5 := (12)
t6 := (15)
Copy propagation:
An assignment of the form a := b is called a copy statement. The idea behind the copy-propagation transformation is to use b for a wherever possible after the copy statement a := b.

Algorithm: Copy propagation.
Input: a flow graph G, with ud-chains giving the definitions reaching block B.
Output: a graph after applying the copy-propagation transformation.
Method: for each copy s: x := y, do the following:
1. Determine those uses of x that are reached by this definition of x, namely s: x := y.
2. Determine whether, for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover, no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B], then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
Steps 1 and 2: x := t3 is a copy statement, and both later occurrences of x are uses reached by it:

x := t3        ← copy statement
a[t1] := t2
a[t4] := x     ← use
y := x + 3     ← use
a[t5] := y

Since the values of t3 and x are not altered along the path from the definition, we replace x by t3:

x := t3
a[t1] := t2
a[t4] := t3
y := t3 + 3
a[t5] := y

and then eliminate the copy statement:

a[t1] := t2
a[t4] := t3
y := t3 + 3
a[t5] := y
B1:
  i = i + 1
  t2 = 4*i
  t3 = a[t2]
  if t3 < v goto B2
B2:
  j = j - 1
  t4 = t4 - 4
  t5 = a[t4]
  if t5 > v goto B3
B3:
  if i >= j goto B6
B4:
  x = t3
  a[t2] = t5
  a[t4] = x
  goto B2
B5:
  x = t3
  t14 = a[t1]
  a[t2] = t14
  a[t1] = x
B6:
B1:
  i = m - 1
  j = n
  t1 = 4*n
  v = a[t1]
  t2 = 4*i
  t4 = 4*j
B2:
  t2 = t2 + 4
  t3 = a[t2]
  if t3 < v goto B2
B3:
  t4 = t4 - 4
  t5 = a[t4]
  if t5 > v goto B3
B4:
  if t2 > t4 goto B6
B5:
  a[t7] = t5
  a[t10] = t3
  goto B2
B6:
  t14 = a[t1]
  a[t2] = t14
  a[t1] = t3
PART B (5 × 16 = 80 marks)
11. (a) (i) Describe the various phases of the compiler and trace them with the program segment (position := initial + rate * 60). (10)
(ii) State the compiler construction tools. Explain them. (6)
Or
(b) (i) Explain briefly about input buffering in reading the source program for finding the tokens. (8)
(ii) Construct the minimized DFA for the regular expression (0+1)*(0+1)10. (8)
12. (a) Construct a canonical parsing table for the grammar given below. Also explain the algorithm used. (16)
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id
Or
(b) What are the different storage allocation strategies? Explain.
(16)
13. (a) (i) Write down the translation scheme to generate code for an assignment statement. Use the scheme to generate three-address code for the assignment statement g := a + b - c*d. (8)
(ii) Describe the various methods of implementing three-address
statements.
(8)
Or
(b) (i) How can backpatching be used to generate code for Boolean expressions and flow-of-control statements? (10)
(ii) Write a short note on procedure calls. (6)
14. (a) (i) Discuss the issues in the design of a code generator. (10)
Or
(b) (i) Explain in detail optimization of basic blocks with an example. (8)
(ii) Write about data-flow analysis of structured programs.
(8)
Solutions
PART A
1. [Figure: source program → lexical analyzer → tokens → parser → syntax tree → semantic analyzer, with the symbol table manager shared by these phases.]
Main task: take a token sequence from the scanner and verify that it is a syntactically correct program.
Secondary tasks:
Process declarations and set up symbol table information accordingly, in preparation for semantic analysis.
Construct a syntax tree in preparation for intermediate code generation.
2.
Letter or digit
Start
1
Letter
other
Flow Graphs
A flow graph is a directed graph containing the flow-of-control information for the set of basic blocks making up a program.
The nodes of the flow graph are basic blocks. It has a distinguished initial node.
8. A DAG for a basic block is a directed acyclic graph with the following
labels on nodes:
1. Leaves are labeled by unique identifiers, either variable names or
constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identiers for labels to
store the computed values.
DAGs are useful data structures for implementing transformations on
basic blocks.
9. Algorithms for performing the code-improving transformations rely on data-flow information. Here we consider common-subexpression elimination, copy propagation and transformations for moving loop-invariant computations out of loops and for eliminating induction variables.
10. Whenever storage can be deallocated, the problem of dangling references arises. A dangling reference occurs when there is a reference to storage that has been deallocated. It is a logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages. Worse, since that storage may later be allocated to another datum, mysterious bugs can appear in programs with dangling references.
PART B
11. (a) (i) Phases of Compiler
A Compiler operates in phases, each of which transforms the
source program from one representation into another. The following are the phases of the compiler:
Main phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
Sub-Phases:
1. Symbol table management
2. Error handling
Lexical Analysis:
It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.
It reads the characters one by one, starting from left to right, and forms the tokens. A token represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols, etc.
Example: position := initial + rate*60
1. The identier position
2. The assignment symbol =
3. The identier initial
4. The plus sign
5. The identier rate
6. The multiplication sign
7. The constant number 60
Syntax Analysis:
Syntax analysis is the second phase of the compiler. It is also known as parsing. It gets the token stream as input from the lexical analyzer of the compiler and generates the syntax tree as the output.
Syntax tree:
It is a tree in which interior nodes are operators and exterior
nodes are operands.
Example: For position := initial + rate*60, the syntax tree is

             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate     60
Semantic Analysis:
Semantic Analysis is the third phase of the compiler. It gets
input from the syntax analysis as parse tree and checks whether
the given syntax is correct or not.
It performs type conversion of all the data types into real data
types.
             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate   inttofloat
                        |
                        60
Code Generation:
Code Generation gets input from code optimization phase and
produces the target code or object code as result.
Intermediate instructions are translated into a sequence of
machine instructions that perform the same task.
The code generation involves
allocation of register and memory
generation of correct references
generation of correct data types
generation of missing code
Machine instructions:
MOV rate, R1
MUL #60, R1
MOV initial, R2
ADD R2, R1
MOV R1, position
Symbol Table Management:
The symbol table is used to store all the information about identifiers used in the program.
It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
It allows us to find the record for each identifier quickly and to store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
Error Handling:
Each phase can encounter errors. After detecting an error, a
phase must handle the error so that compilation can proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and
during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
In code generation, it shows error when code is missing etc.
The phases and their interactions:

source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimization → Code Generation → Object Program

Symbol Table Management and Error Detection and Handling interact with every phase.
3. Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its
neighbor nodes in the tree.
4. Automatic Code Generators:
It takes a collection of rules to translate intermediate language into machine language. The rules must include sufficient details to handle different possible access methods
for data.
5. Data-Flow Engines:
It does code optimization using data-flow analysis, that is,
the gathering of information about how values are transmitted
from one part of a program to each other part.
(b) (i) As characters are read from left to right, each character is stored
in the buffer to form a meaningful token as shown below:
[Figure: input buffer holding the characters C * * 2 followed by eof; the lexeme_beginning pointer marks the start of the current lexeme and the forward pointer scans ahead]
[Figure: buffer pair with a sentinel eof at the end of each half — C * * 2 eof in the first half, eof terminating the second — with the same forward and lexeme_beginning pointers]
[Figure: syntax tree for the regular expression (0+1)*(0+1)10 and the NFA constructed from it, with states q0–q3 (q0 looping on 0,1; q3 accepting) and its nondeterministic transition table]
[Table: subset construction — DFA states [q0], [q0,q1], [q0,q1,q2], and accepting state *[q0,q1,q3], with their transitions on inputs 0 and 1, followed by the resulting DFA diagram]
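The subset construction summarized by the table can be sketched in a few lines. The NFA used below is illustrative (the exact automaton in the figure is not fully recoverable from the scan); the function name and the dictionary encoding of transitions are assumptions.

```python
from collections import deque

def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA (state -> {symbol: set of states}) to a DFA
    whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa = {}                              # DFA state -> {symbol: DFA state}
    worklist = deque([start_set])
    while worklist:
        current = worklist.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in alphabet:
            # Union of moves from every NFA state in the subset.
            target = frozenset(s for st in current
                               for s in nfa.get(st, {}).get(sym, ()))
            dfa[current][sym] = target
            if target:
                worklist.append(target)
    # A DFA state is accepting if it contains an accepting NFA state.
    dfa_accepting = {S for S in dfa if S & accepting}
    return dfa, dfa_accepting

# Illustrative NFA over {0,1}: q0 loops on 0/1 and guesses the last symbol is 1.
nfa = {"q0": {"0": {"q0"}, "1": {"q0", "q1"}}}
dfa, final = subset_construction(nfa, "q0", {"q1"}, "01")
```

Each row of the table in the answer corresponds to one worklist iteration here.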
[Table: SLR parsing table for the expression grammar — shift entries (S2–S10) and reduce entries (r1–r5) over the terminals id, +, *, (, ), and GOTO entries for the nonterminals E, T, F]
Items whose size may not be known early enough are placed at the
end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters
determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common
approach is to have it point to the end of the fixed-length fields in the
activation record. Fixed-length data can then be accessed by fixed
offsets, known to the intermediate-code generator, relative to the
top-of-stack pointer.
[Figure: caller's and callee's activation records on the stack. In each record, the fields for parameters and returned values, the control link, and the links and saved status are the caller's responsibility; the temporaries and local data are the callee's responsibility. top_sp points to the end of the fixed-length fields of the callee's record.]
[Figure: access to dynamically sized arrays. The activation record for p holds its control link and pointers to arrays A, B, and C; the arrays themselves follow the record. Above them sits the activation record, with its control link and arrays, of a procedure q called by p; top_sp marks the end of q's fixed-length fields and top the true stack top.]
Heap Allocation:
Stack allocation strategy cannot be used if either of the following is
possible :
1. The values of local names must be retained when an activation
ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects.
Pieces may be deallocated in any order, so over time the heap
will consist of alternating areas that are free and in use.
The record for an activation of procedure r is retained when the
activation ends.
Therefore, the record for the new activation q(1,9)cannot follow
that for s physically.
If the retained activation record for r is deallocated, there will be
free space in the heap between the activation records for s and q.
[Figure: heap allocation. Position in the activation tree: s calls r and then q(1,9). Remarks: the activation record for r is retained after r ends, so the heap holds the records for s, the retained r, and q(1,9), each with a control link, and they need not be physically adjacent.]
13. (a) (i) Suppose that the context in which an assignment appears is given by the following grammar:
P → M D
M → ε
D → D ; D | id : T | proc id ; N D ; S
N → ε
Nonterminal P becomes the new start symbol when these productions are added to those in the translation scheme shown below.
Translation scheme to produce three-address code for assignments
S → id := E   { p := lookup(id.name);
                if p ≠ nil then
                    emit(p ':=' E.place)
                else error }
E → E1 + E2   { E.place := newtemp;
                emit(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2   { E.place := newtemp;
                emit(E.place ':=' E1.place '*' E2.place) }
E → - E1      { E.place := newtemp;
                emit(E.place ':=' 'uminus' E1.place) }
E → ( E1 )    { E.place := E1.place }
E → id        { p := lookup(id.name);
                if p ≠ nil then
                    E.place := p
                else error }
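The scheme above can be sketched as a recursive walk that allocates temporaries with newtemp and appends three-address statements with emit. This is an illustrative sketch: the tuple encoding of expressions and the function names gen_expr and gen_assign are assumptions, not part of the text.

```python
temp_count = 0
code = []          # emitted three-address statements

def newtemp():
    # Allocate a fresh temporary name t1, t2, ...
    global temp_count
    temp_count += 1
    return "t%d" % temp_count

def emit(stmt):
    code.append(stmt)

def gen_expr(node):
    """node is an identifier string or a tuple (op, left, right)."""
    if isinstance(node, str):
        return node                          # E -> id: E.place is the name
    op, left, right = node
    l_place = gen_expr(left)
    r_place = gen_expr(right)
    place = newtemp()                        # E.place := newtemp
    emit("%s := %s %s %s" % (place, l_place, op, r_place))
    return place

def gen_assign(target, expr):                # S -> id := E
    emit("%s := %s" % (target, gen_expr(expr)))

gen_assign("a", ("+", "b", ("*", "c", "d")))   # a := b + c * d
```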
Production rule    Semantic action    Output
E → id             E.place := a
E → id             E.place := b
E → E1 + E2        E.place := t1      t1 := a + b
E → id             E.place := c
E → id             E.place := d
E → E1 * E2        E.place := t2      t2 := c * d
E → E1 - E2        E.place := t3      t3 := t1 - t2, i.e. (a+b) - (c*d)
S → id := E                           g := t3
Quadruples for a := b * - c + b * - c:

        op        Arg1    Arg2    Result
(0)     uminus    c               t1
(1)     *         b       t1      t2
(2)     uminus    c               t3
(3)     *         b       t3      t4
(4)     +         t2      t4      t5
(5)     :=        t5              a
Triples:

        op        Arg1    Arg2
(0)     uminus    c
(1)     *         b       (0)
(2)     uminus    c
(3)     *         b       (2)
(4)     +         (1)     (3)
(5)     assign    a       (4)
Indirect triples:

        op        Arg1    Arg2
(14)    uminus    c
(15)    *         b       (14)
(16)    uminus    c
(17)    *         b       (16)
(18)    +         (15)    (17)
(19)    assign    a       (18)

statement
(0)     (14)
(1)     (15)
(2)     (16)
(3)     (17)
(4)     (18)
(5)     (19)
1. E → E1 or M E2
2.   | E1 and M E2
3.   | not E1
4.   | ( E1 )
5.   | id1 relop id2
6.   | true
7.   | false
8. M → ε
Synthesized attributes truelist and falselist of nonterminal E are
used to generate jumping code for boolean expressions. Incomplete jumps with unfilled labels are placed on lists pointed to by
E.truelist and E.falselist.
Consider the production E → E1 and M E2.
If E1 is false, then E is also false, so the statements on
E1.falselist become part of E.falselist.
If E1 is true, then we must next test E2, so the target for the
statements E1.truelist must be the beginning of the code generated for E2. This target is obtained using marker nonterminal M.
Attribute M.quad records the number of the first statement of
E2.code. With the production M → ε we associate the semantic
action
{ M.quad := nextquad }
The variable nextquad holds the index of the next quadruple to
follow. This value will be backpatched onto the E1.truelist when
we have seen the remainder of the production E → E1 and M E2.
The translation scheme is as follows:
1. E → E1 or M E2    { backpatch( E1.falselist, M.quad);
                       E.truelist := merge( E1.truelist, E2.truelist);
                       E.falselist := E2.falselist }
2. E → E1 and M E2   { backpatch( E1.truelist, M.quad);
                       E.truelist := E2.truelist;
                       E.falselist := merge( E1.falselist, E2.falselist) }
3. E → not E1        { E.truelist := E1.falselist;
                       E.falselist := E1.truelist }
4. E → ( E1 )        { E.truelist := E1.truelist;
                       E.falselist := E1.falselist }
5. E → id1 relop id2 { E.truelist := makelist( nextquad);
                       E.falselist := makelist( nextquad + 1);
                       emit( 'if' id1.place relop.op id2.place 'goto _');
                       emit( 'goto _') }
6. E → true          { E.truelist := makelist( nextquad);
                       emit( 'goto _') }
7. E → false         { E.falselist := makelist( nextquad);
                       emit( 'goto _') }
8. M → ε             { M.quad := nextquad }
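A minimal sketch of backpatching for rule 2 (E → E1 and M E2), assuming quadruples are stored as strings with "_" marking an unfilled jump target. The helper names follow the text (makelist, merge, backpatch, nextquad); the string encoding is an assumption.

```python
quads = []                      # generated quadruples, as strings

def nextquad():
    return len(quads)

def emit(q):
    quads.append(q)

def makelist(i):
    return [i]

def merge(l1, l2):
    return l1 + l2

def backpatch(lst, target):
    # Fill in the missing jump target of every quadruple on the list.
    for i in lst:
        quads[i] = quads[i].replace("_", str(target))

# Generate code for: a < b and c < d  (rule 5 twice, then rule 2).
E1_truelist  = makelist(nextquad()); emit("if a < b goto _")
E1_falselist = makelist(nextquad()); emit("goto _")
M_quad = nextquad()                         # marker M before E2
E2_truelist  = makelist(nextquad()); emit("if c < d goto _")
E2_falselist = makelist(nextquad()); emit("goto _")

# Rule 2: if E1 is true, fall through to the test of E2.
backpatch(E1_truelist, M_quad)
E_truelist  = E2_truelist
E_falselist = merge(E1_falselist, E2_falselist)
```

After the backpatch, the first jump targets quadruple 2 (the start of E2's code), while E.falselist still holds the two unfilled false exits.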
Flow-of-Control Statements:
A translation scheme is developed for statements generated by
the following grammar :
1. S → if E then S
2.   | if E then S else S
3.   | while E do S
4.   | begin L end
5.   | A
6. L → L ; S
7.   | S
Here S denotes a statement, L a statement list, A an assignment statement, and E a Boolean expression. We make the tacit
assumption that the code that follows a given statement in execution also follows it physically in the quadruple array. Else, an
explicit jump must be provided.
Scheme to implement the Translation:
The nonterminal E has two attributes E.truelist and E.falselist. L
and S also need a list of unfilled quadruples that must eventually
be completed by backpatching. These lists are pointed to by the
attributes L.nextlist and S.nextlist. S.nextlist is a pointer to a list
of all conditional and unconditional jumps to the quadruple following the statement S in execution order, and L.nextlist
is defined similarly.
The semantic rules for the revised grammar are as follows:
1. S → if E then M1 S1 N else M2 S2
{ backpatch (E.truelist, M1.quad);
backpatch (E.falselist, M2.quad);
S.nextlist := merge (S1.nextlist, merge (N.nextlist,
S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple
M1.quad, which is the beginning of the code for S1. Similarly,
we backpatch jumps when E is false to go to the beginning of the
code for S2. The list S.nextlist includes all jumps out of S1 and
S2, as well as the jump generated by N.
2. N → ε
{ N.nextlist := makelist( nextquad );
emit( 'goto _') }
3. M → ε
{ M.quad := nextquad }
4. S → if E then M S1 { backpatch( E.truelist, M.quad);
S.nextlist : = merge( E.falselist,
S1.nextlist)}
5. S → while M1 E do M2 S1 { backpatch( S1.nextlist, M1.quad);
backpatch( E.truelist,
M2.quad);
S.nextlist := E.falselist;
emit( 'goto' M1.quad) }
6. S → begin L end
{S.nextlist : = L.nextlist }
7. S → A
{ S.nextlist : = nil }
The assignment S.nextlist : = nil initializes S.nextlist to an
empty list.
8. L → L1 ; M S
{ backpatch( L1.nextlist, M.quad);
L.nextlist : = S.nextlist }
The statement following L1 in order of execution is the beginning of S. Thus the L1.nextlist list is backpatched to the beginning of the code for S, which is given by M.quad.
9. L → S
{L.nextlist : = S.nextlist }
(ii) The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure argument passing, calls and
returns are part of the run-time support package.
Let us consider a grammar for a simple procedure call statement
1. S → call id ( Elist )
2. Elist → Elist , E
3. Elist → E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence
of actions taken on entry to and exit from each procedure. The
following are the actions that take place in a calling sequence:
1. When a procedure call occurs, space must be allocated for the
activation record of the called procedure.
2. The arguments of the called procedure must be evaluated and
made available to the called procedure in a known place.
3. Environment pointers must be established to enable the called
procedure to access data in enclosing blocks.
4. The state of the calling procedure must be saved so it can
resume execution after the call. Also saved in a known place
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language
It can be placed in a xed memory location and can be
executed immediately.
b. Relocatable machine language
It allows subprograms to be compiled separately.
c. Assembly language
Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by the front end and code generator.
It makes use of symbol table, that is, a name in a three-address
statement refers to a symbol-table entry for the name.
Labels in three-address statements have to be converted to addresses of instructions. For example,
j : goto i generates a jump instruction as follows:
if i < j, a backward jump instruction with target address equal
to the location of code for quadruple i is generated.
if i > j, the jump is forward. We must store on a list for quadruple i the location of the first machine instruction generated
for quadruple j. When i is processed, the machine locations for
all instructions that forward jump to i are filled in.
4. Instruction selection:
The instructions of the target machine should be complete and uniform. Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered. The quality of the generated code is determined by its speed and size.
For example
x := y + z
a := x + t
The code for the above statements can be generated as follows:
MOV y,R0
ADD z,R0
MOV R0,x
MOV x,R0
ADD t,R0
MOV R0,a
12/13/2012 5:14:29 PM
2.69
5. Register allocation
Instructions involving register operands are shorter and faster
than those involving operands in memory.
The use of registers is subdivided into two sub problems:
Register allocation — the set of variables that will reside in registers at a point in the program is selected.
Register assignment — the specific register that a variable will
reside in is picked.
For example, consider a division instruction of the form
D x, y
where the dividend x occupies the even register of an even/odd
register pair and y is the divisor. After division,
the even register holds the remainder and
the odd register holds the quotient.
6. Evaluation order
The order in which the computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others.
(ii) a. Common subexpression elimination:
Before:            After:
a := b + c         a := b + c
b := a - d         b := a - d
c := b + c         c := b + c
d := a - d         d := b
Since the second and fourth statements compute the same
expression, the basic block can be transformed as above.
b. Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the
point where the statement x : = y + z appears in a basic block.
Then this statement may be safely removed without changing the value of the basic block.
c. Renaming temporary variables:
A statement t : = b + c ( t is a temporary ) can be changed
to u : = b + c (u is a new temporary) and all uses of this instance of t can be changed to u without changing the value
of the basic block.
Such a block is called a normal-form block.
d. Interchange of statements:
Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y
We can interchange the two statements without affecting the
value of the block if and only if neither x nor y is t1 and
neither b nor c is t2.
(b) (i) A code generator generates target code for a sequence of three-address statements and effectively uses registers to store the operands of the statements.
For example: consider the three-address statement a := b+c
It can have the following sequence of codes:
ADD Rj, Ri
Cost = 1 // if Ri contains b and Rj contains c
(or)
ADD c, Ri
Cost = 2 // if c is in a memory location
(or)
MOV c, Rj
ADD Rj, Ri
Cost = 3 // if c must first be moved into a register
if (operand1 is already in register R0)
{
    if (operator == '+')
        generate(ADD operand2, R0);
    else if (operator == '-')
        generate(SUB operand2, R0);
    else if (operator == '*')
        generate(MUL operand2, R0);
    else if (operator == '/')
        generate(DIV operand2, R0);
}
else
{
    generate(MOV operand1, R0);
    if (operator == '+')
        generate(ADD operand2, R0);
    else if (operator == '-')
        generate(SUB operand2, R0);
    else if (operator == '*')
        generate(MUL operand2, R0);
    else if (operator == '/')
        generate(DIV operand2, R0);
}
The algorithm takes as input a sequence of three-address statements constituting a basic block.
For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where
the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y' if the value of y is
currently both in memory and a register. If the value of y is
not already in L, generate the instruction MOV y', L to place
a copy of y in L.
3. Generate the instruction OP z', L where z' is a current location of z. Prefer a register to a memory location if z is in
both. Update the address descriptor of x to indicate that x is
in location L. If x is in L, update its descriptor and remove x
from all other descriptors.
4. If the current values of y or z have no next uses, are not live
on exit from the block, and are in registers, alter the register
descriptor to indicate that, after execution of x : = y op z ,
those registers will no longer contain y or z.
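Steps 1–3 above can be sketched as follows, with a deliberately trivial getreg (the real function consults register and address descriptors and considers next uses); the names and descriptor encodings are illustrative assumptions.

```python
# Sketch of code generation for x := y op z.
registers = {}        # register -> variable whose value it holds
address   = {}        # variable -> set of locations currently holding it
output    = []        # generated target instructions

def getreg(result):
    # Trivial stand-in: hand out a fresh register per result.
    reg = "R%d" % len(registers)
    registers[reg] = result
    return reg

def gen(x, y, op, z):
    L = getreg(x)                            # step 1: choose location L
    if L not in address.get(y, set()):       # step 2: load y into L if needed
        output.append("MOV %s, %s" % (y, L))
    output.append("%s %s, %s" % (op, z, L))  # step 3: apply op with z
    address[x] = {L}                         # step 3: x now lives only in L
    return L

gen("t", "b", "ADD", "c")                    # t := b + c
```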
(ii) A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying optimizing transformations to the target program.
A simple but effective technique for improving the target
code is peephole optimization, a method for improving the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever
possible.
The following program transformations are characteristic of peephole optimizations:
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable-code elimination
Redundant-instruction elimination
If we see the instruction sequence
MOV R0,a
MOV a,R0
we can eliminate the second instruction, since a is already in
R0.
Unreachable Code:
We can eliminate unreachable instructions. For example:
sum = 0;
if (sum)
    printf("%d", sum);
The body of this if statement will never be executed, so such
unreachable code can be eliminated.
Flow-of-Control Optimizations:
Unnecessary jumps on jumps can be eliminated in either
the intermediate code or the target code by the following types
of peephole optimizations. We can replace the jump sequence
goto L1
...
L1: goto L2
by the sequence
goto L2
...
L1: goto L2
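Two of the transformations above — redundant load elimination and jump-to-jump shortening — can be sketched as a single pass over a peephole of instructions. The string encoding of instructions is an illustrative assumption.

```python
def peephole(instrs):
    """One pass applying two peephole rules over a list of instructions."""
    # Map labels that alias a jump: "L1: goto L2" records L1 -> L2.
    alias = {}
    for ins in instrs:
        if ":" in ins:
            label, body = [p.strip() for p in ins.split(":", 1)]
            if body.startswith("goto "):
                alias[label] = body[len("goto "):]
    out = []
    for ins in instrs:
        # goto L1 where L1: goto L2  becomes  goto L2
        if ins.startswith("goto ") and ins[5:] in alias:
            ins = "goto " + alias[ins[5:]]
        # MOV R0,a immediately followed by MOV a,R0: the load is redundant.
        if out and ins.startswith("MOV "):
            src, dst = [p.strip() for p in ins[4:].split(",")]
            if out[-1] == "MOV %s, %s" % (dst, src):
                continue
        out.append(ins)
    return out

optimized = peephole(["MOV R0, a", "MOV a, R0", "goto L1", "L1: goto L2"])
```

A real peephole optimizer repeats such passes until no rule fires, since one replacement can expose another.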
Algebraic Simplification:
There is no end to the amount of algebraic simplification that
can be attempted through peephole optimization. Only a few
algebraic identities occur frequently enough that it is worth
implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine
instructions are considerably cheaper than others and can often
be used as special cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x*x
than as a call to an exponentiation
routine: x^2 → x*x.
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specic operations efciently. For example, some
machines have auto-increment and auto-decrement addressing
modes. These add or subtract one from an operand before or
after using its value.
The use of these modes greatly improves the quality of code
when pushing or popping a stack, as in parameter passing. These
modes can also be used in code for statements like
i := i + 1.
i := i + 1  ⇒  i++
i := i - 1  ⇒  i--
15. (a) A transformation of a program is called local if it can be performed
by looking only at the statements in a basic block; otherwise, it is
called global. Many transformations can be performed at both the
local and global levels. Local transformations are usually performed
first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes.
The transformations:
1. Common sub expression elimination,
2. Copy propagation,
3. Dead-code elimination, and
4. Constant folding
are common examples of such function-preserving transformations.
The other transformations come up primarily when global optimizations are performed.
Common Subexpression Elimination:
An occurrence of an expression E is called a common sub-expression
if E was previously computed, and the values of variables in E have
not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t4: = 4*i
t5: = n
t6: = b [t4] +t5
The above code can be optimized using the common sub-expression
elimination as
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t5: = n
t6: = b [t1] +t5
The common subexpression t4 := 4*i is eliminated, as its value is already computed in t1 and the value of i has not changed between definition and use.
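The transformation can be sketched over a basic block of quadruples: an available-expression map reuses an earlier result unless an operand has since been redefined. Applied to the four-statement block from the structure-preserving-transformations discussion (a := b+c; b := a-d; c := b+c; d := a-d), it reproduces d := b. The function and tuple encoding are illustrative assumptions.

```python
def cse(block):
    """block: list of (target, op, arg1, arg2) quadruples."""
    available = {}   # (op, arg1, arg2) -> variable currently holding the value
    out = []
    for target, op, a1, a2 in block:
        key = (op, a1, a2)
        if op != ":=" and key in available:
            # Same expression, operands unchanged: reuse the earlier result.
            out.append((target, ":=", available[key], None))
        else:
            out.append((target, op, a1, a2))
        # target was redefined: forget expressions held in it or using it.
        available = {k: v for k, v in available.items()
                     if v != target and target not in (k[1], k[2])}
        if out[-1][1] != ":=" and target not in (a1, a2):
            available[key] = target
    return out

block = [("a", "+", "b", "c"),
         ("b", "-", "a", "d"),
         ("c", "+", "b", "c"),
         ("d", "-", "a", "d")]
result = cse(block)      # only the fourth statement changes
```

Note that c := b + c is correctly left alone: b was redefined after the first b + c, so the expression is no longer available.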
Copy Propagation:
Assignments of the form f := g are called copy statements, or copies
for short. The idea behind the copy-propagation transformation is to
use g for f wherever possible after the copy statement f := g; that is,
to use one variable in place of another. This may
not appear to be an improvement, but as we shall see it gives us an
opportunity to eliminate the copy.
For example:
x = Pi;
A = x * r * r;
After copy propagation this becomes A = Pi * r * r, and the
assignment to x may then become dead code.
Code Motion:
An important modification that decreases the amount of code in a
loop is code motion. This transformation takes an expression that
yields the same result independent of the number of times a loop is
executed ( a loop-invariant computation) and places the expression
before the loop. Note that the notion before the loop assumes the
existence of an entry for the loop. For example, evaluation of limit-2
is a loop-invariant computation in the following while-statement:
while (i <= limit-2) /* statement does not change limit*/
Code motion will result in the equivalent of
t= limit-2;
while (i<=t) /* statement does not change limit or t */
Induction Variables:
A variable x is called an induction variable of loop L if its value
changes on every iteration: it is incremented or decremented by
some constant each time around the loop.
For example:
B1:
i := i + 1
t1 := 4 * i
t2 := a[t1]
if t2 < 10 goto B1
In the above code, the values of i and t1 move in lock step: whenever
i is incremented by 1, t1 is incremented by 4. Hence
i and t1 are induction variables. When there are two or more induction
variables in a loop, it may be possible to get rid of all but one.
Reduction In Strength:
Reduction in strength replaces expensive operations by equivalent
cheaper ones on the target machine. Certain machine instructions
are considerably cheaper than others and can often be used as special
cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x*x than as
a call to an exponentiation routine.
(b) (i) There are two types of basic block optimizations. They are:
Structure-Preserving Transformations
Algebraic Transformations
Structure-Preserving Transformations:
The primary Structure-Preserving Transformations on basic
blocks are:
[Figure: flow graphs for the structured constructs — the sequence S1; S2, the conditional if E then S1 else S2, and the loop do S1 while E]
PART B (5 × 16 = 80 marks)
11. (a) (i) Describe the various phases of the compiler and trace the program
segment a := b + c * 4 through all phases.
(10)
(ii) Explain in detail about compiler construction tools.
(6)
Or
(b) (i) Discuss the role of lexical analyzer in detail.
(8)
(ii) Draw the transition diagram for relational operators and unsigned
numbers in Pascal.
(8)
12. (a) (i) Explain the error recovery strategies in syntax analysis.
(6)
(b) (i) How to generate code for a basic block from its DAG representation? Explain.
(6)
(ii) Briefly explain about the simple code generator.
(10)
Or
(b) (i) Write an algorithm to construct the natural loop of a back edge.
(6)
(ii) Explain in detail about code-improving transformations.
(10)
Solutions
PART A
1. An Interpreter is a translator which produces the result directly when the
source language and data is given to it as input. It does not produce the
object code. The source program gets interpreted every time the source
program is analyzed.
[Figure: an INTERPRETER takes the source program and its data as input and directly executes it, producing the result]
[Table: token classification — int is a keyword; a name is an identifier]
Right-sentential form    Handle    Reducing production
id + id + id             id        E → id
E + id + id              id        E → id
E + E + id               id        E → id
E + E + E                E + E     E → E + E
E + E                    E + E     E → E + E
E
4. The static allocation can be done only if the size of data object is known
at compile time.
The data structures cannot be created dynamically. In the sense that, the
static allocation cannot manage the allocation of memory at run time.
Recursive procedures are not supported by this type of allocation.
5. A compiler for different machines can be created by attaching different
backend to the existing front ends of each machine.
A compiler for different source languages can be created by providing
different front ends for the corresponding source languages to an existing
back end.
A machine independent code optimizer can be applied to intermediate
code in order to optimize the code generation.
6. The natural hierarchical structure is represented by syntax trees.
[Figure: syntax tree with := at the root, + and * as interior operator nodes, and operands such as b at the leaves]
Consider the statements
i : x := …
j : y := x op z
that is, statement j uses the value of x computed at statement i.
9. A variable is live at a point in a program if its value can be used subsequently; otherwise it is dead at that point. A related idea is dead or
useless code, statements that compute values that never get used. While
the programmer is unlikely to introduce any dead code intentionally, it
may appear as the result of previous transformations. An optimization
can be done by eliminating dead code.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the body of the if statement is dead code because the condition
will never be satisfied.
10. The running time of a program may be improved if the number of instructions in an inner loop is decreased, even if the amount of code outside
that loop is increased.
Three techniques are important for loop optimization:
code motion, which moves code outside a loop;
Induction-variable elimination, which removes redundant induction
variables from inner loops.
Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.
PART B
11. (a) (i) A Compiler operates in phases, each of which transforms the
source program from one representation into another. The following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling
[Figure: Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program, with symbol table management and error handling connected to every phase]
Lexical analysis:
It is the first phase of the compiler. It gets input from the source
program and produces tokens as output.
It reads the characters one by one, starting from left to right
and forms the tokens.
Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols,
etc.
Example: for the source code a := b + c * 4,
the lexical analysis phase breaks this statement up into a
series of tokens as follows:
1. The identier a
2. The assignment symbol :=
3. The identier b
Semantic analysis:
It is the third phase of the compiler.
It gets input from the syntax analysis phase as a parse tree and checks
whether the construct is semantically correct.
It performs type checking and inserts the necessary type conversions
(for example, int to float).
Example : a:=b+c*4
[Figure: syntax tree for a := b + c * 4, with the constant 4 converted from int to float]
MOV b, R1
ADD R1,R0
MOV R0, a
Symbol table management:
Symbol table is used to store all the information about
identifiers used in the program.
It is a data structure containing a record for each identifier,
with fields for the attributes of the identifier.
It allows the compiler to find the record for each identifier quickly and to
store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is
stored in the symbol table.
Error handling:
Each phase can encounter errors. After detecting an error,
a phase must handle the error so that compilation can
proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and
during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
Translation of a := b + c * 4 through the phases:

lexical analyzer:      id1 := id2 + id3 * 4
syntax analyzer:       tree with := at the root, id1 on the left,
                       and id2 + (id3 * 4) on the right
semantic analyzer:     the same tree with inttofloat applied to 4
intermediate code:     temp1 := inttofloat(4)
                       temp2 := id3 * temp1
                       temp3 := id2 + temp2
                       id1 := temp3
code optimizer:        temp1 := id3 * 4.0
                       id1 := id2 + temp1
code generator:        MOVF id3, R2
                       MULF #4.0, R2
                       MOVF id2, R1
                       ADDF R2, R1
                       MOVF R1, id1
(ii) These are specialized tools that have been developed for helping
implement various phases of a compiler. The following are the
compiler construction tools:
Scanner Generator
Parser Generators
Syntax-Directed Translation
Automatic Code Generators
Data-Flow Engines
1) Scanner Generator:
These generate lexical analyzers, normally from a specification based on regular expressions.
The basic organization of the resulting lexical analyzer is a finite
automaton.
2) Parser Generators:
These produce syntax analyzers, normally from input that
is based on a context-free grammar.
It consumes a large fraction of the running time of a
compiler.
Example-YACC (Yet Another Compiler-Compiler).
3) Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its
neighbor nodes in the tree.
4) Automatic Code Generators:
It takes a collection of rules to translate intermediate language into machine language. The rules must include sufcient details to handle different possible access methods
for data.
5) Data-Flow Engines:
It does code optimization using data-flow analysis, that is,
the gathering of information about how values are transmitted from one part of a program to each other part.
(b) (i) [Figure: the lexical analyzer reads the source program and passes tokens to the parser on demand; the parser builds the syntax tree; both interact with the symbol table manager]
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which
performs lexical analysis is called a lexical analyzer or scanner.
A lexer often exists as a single function which is called by a
parser or another function.
The role of the lexical analyzer
The lexical analyzer is the first phase of a compiler.
Its main task is to read the input characters and produce as
output a sequence of tokens that the parser uses for syntax
analysis.
Upon receiving a get next token command from the parser,
the lexical analyzer reads input characters until it can identify
the next token.
Issues of lexical analyzer
There are three reasons for separating lexical analysis from parsing:
To make the design simpler.
To improve the efficiency of the compiler.
To enhance compiler portability.
Tokens
A token is a string of characters, categorized according to the
rules as a symbol (e.g., IDENTIFIER, NUMBER, COMMA).
The process of forming tokens from an input stream of characters is called tokenization.
A token can look like anything that is useful for processing an
input text stream or text file. Consider this expression in the C
programming language: sum = 3 + 2;
Lexeme    Token type
sum       Identifier
=         Assignment operator
3         Number
+         Addition operator
2         Number
;         Delimiter
Lexeme:
Collection or group of characters forming tokens is called
Lexeme.
12/13/2012 5:14:33 PM
2.93
Pattern:
A pattern is a description of the form that the lexemes of a token
may take. In the case of a keyword as a token, the pattern is just
the sequence of characters that form the keyword. For identifiers
and some other tokens, the pattern is a more complex structure
that is matched by many strings.
Attributes for Tokens
Some tokens have attributes that can be passed back to the parser.
The lexical analyzer collects information about tokens into their
associated attributes. The attributes influence the translation of
tokens.
i) Constant: value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
Error recovery strategies in lexical analysis:
The following are the error-recovery actions in lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: Deletion of successive characters from
the token until error is resolved.
(b) (ii) The relational operators are <, >, <=, >=, =, and !=.
[Figure: transition diagram for the relational operators, states S0–S11. From the start state S0, < leads to states that return LE on =, NE on >, and LT on any other character (with a retract); = returns EQ; > leads to states that return GE on = and GT on any other character (with a retract).]
12. (a) (i) The different strategies that a parser uses to recover from a syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction
Panic mode recovery:
On discovering an error, the parser discards input symbols one
at a time until a synchronizing token is found. The synchronizing tokens are usually delimiters, such as semicolon or end. It
has the advantage of simplicity and does not go into an infinite
loop. When multiple errors in the same statement are rare, this
method is quite useful.
Phrase level recovery:
On discovering an error, the parser performs local correction on
the remaining input that allows it to continue. Example: Insert a
missing semicolon or delete an extraneous semicolon etc.
Error productions:
The parser is constructed using augmented grammar with error
productions. If an error production is used by the parser, appropriate error diagnostics can be generated to indicate the erroneous constructs recognized by the input.
Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to nd a parse tree for a string y, such that the
number of insertions, deletions and changes of tokens is as small
as possible. However, these methods are in general too costly in
terms of time and space.
(ii) The given grammar is:
G : E → E + T ------ (1)
E → T --------------- (2)
T → T * F ----------- (3)
T → F ---------------- (4)
F → (E) -------------- (5)
F → id --------------- (6)
Step 1 : Convert the given grammar into an augmented grammar.
Augmented grammar:
E′ → E
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
Step 2 : Find the LR(0) items.
I0 : E′ → . E
E → . E + T
E → . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I0 , E )
I1 : E′ → E .
E → E . + T
GOTO ( I0 , T )
I2 : E → T .
T → T . * F
GOTO ( I0 , F )
I3 : T → F .
GOTO ( I0 , ( )
I4 : F → ( . E )
E → . E + T
E → . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I0 , id )
I5 : F → id .
GOTO ( I1 , + )
I6 : E → E + . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I2 , * )
I7 : T → T * . F
F → . (E)
F → . id
GOTO ( I4 , E )
I8 : F → ( E . )
E → E . + T
GOTO ( I6 , T )
I9 : E → E + T .
T → T . * F
GOTO ( I7 , F )
I10 : T → T * F .
GOTO ( I8 , ) )
I11 : F → ( E ) .
FOLLOW (E′) = { $ }
FOLLOW (E) = { + , ) , $ }
FOLLOW (T) = { + , * , ) , $ }
FOLLOW (F) = { + , * , ) , $ }
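The item sets above can be generated mechanically. A minimal sketch of the LR(0) closure and GOTO operations for this grammar, with E′ as the augmented start symbol and an item represented as (head, body, dot position):

```python
# Grammar as productions; "E'" is the augmented start symbol.
GRAMMAR = {
    "E'": [["E"]],
    "E":  [["E", "+", "T"], ["T"]],
    "T":  [["T", "*", "F"], ["F"]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def closure(items):
    """LR(0) closure: whenever the dot sits before a nonterminal,
    add that nonterminal's productions with the dot at the left end."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    """Move the dot over `symbol` and take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})   # the seven items listed above
```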
SLR parsing table:

State |            ACTION             |  GOTO
      |  id    +    *    (    )    $  | E  T  F
  0   |  s5             s4            | 1  2  3
  1   |       s6                  acc |
  2   |       r2   s7        r2   r2  |
  3   |       r4   r4        r4   r4  |
  4   |  s5             s4            | 8  2  3
  5   |       r6   r6        r6   r6  |
  6   |  s5             s4            |    9  3
  7   |  s5             s4            |       10
  8   |       s6             s11      |
  9   |       r1   s7        r1   r1  |
 10   |       r3   r3        r3   r3  |
 11   |       r5   r5        r5   r5  |
[Figure: activation records for main and factorial. The record for main holds its locals; each activation record for factorial (for example, factorial(3)) holds the return value, the actual parameter, and the dynamic link to its caller's record.]
Calling sequences:
Procedure calls are implemented by what is called a calling sequence, which consists of code that allocates an activation record on the stack and enters information into its fields.
A return sequence is similar code that restores the state of the machine so the calling procedure can continue its execution after the call.
The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).
When designing calling sequences and the layout of activation records, the following principles are helpful:
Values communicated between caller and callee are generally placed at the beginning of the callee's activation record, so they are as close as possible to the caller's activation record.
Fixed-length items are generally placed in the middle. Such items typically include the control link, the access link, and the machine-status fields.
Items whose size may not be known early enough are placed at the end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the end of the fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed offsets, known to the intermediate-code generator, relative to the top-of-stack pointer.
[Figure: division of tasks between caller and callee. The caller's activation record holds parameters and returned values, a control link, links and saved status, and temporaries and local data; the callee's record repeats this layout. Filling the fields through the saved status is the caller's responsibility, and the rest is the callee's; top_sp points to the end of the fixed-length fields of the callee's record.]
The calling sequence and its division between caller and callee are as follows.
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp to its new position.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
The callee places the return value next to the parameters.
Using the information in the machine-status field, the callee restores top_sp and the other registers, and then branches to the return address that the caller placed in the status field.
Although top_sp has been decremented, the caller knows where the return value is, relative to the current value of top_sp; the caller therefore may use that value.
Variable-length data on the stack:
The run-time memory-management system must deal frequently with the allocation of space for objects whose sizes are not known at compile time, but which are local to a procedure and thus may be allocated on the stack.
The reason to prefer placing objects on the stack is that we avoid the expense of garbage-collecting their space.
The same scheme works for objects of any type if they are local to the procedure called and have a size that depends on the parameters of the call.
Suppose procedure p has three local arrays whose sizes cannot be determined at compile time. The storage for these arrays is not part of the activation record for p.
Access to the data is through two pointers, top and top_sp. Here top marks the actual top of the stack; it points to the position at which the next activation record will begin.
The second pointer, top_sp, is used to find the local, fixed-length fields of the top activation record.
The code to reposition top and top_sp can be generated at compile time, in terms of sizes that will become known at run time.
[Figure: access to dynamically sized arrays. The activation record for p contains a control link and pointers to arrays A, B and C, followed by the arrays of p themselves; below them is the activation record for a procedure q called by p, then the arrays of q. top_sp points into the fixed-length part of the top record, while top marks the actual top of the stack.]
Heap Allocation:
The stack-allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects.
Pieces may be deallocated in any order, so over time the heap will consist of alternating areas that are free and in use.
Suppose the record for an activation of procedure r is retained when the activation ends.
Then the record for a new activation q(1, 9) cannot follow the record for s physically.
If the retained activation record for r is deallocated, there will be free space in the heap between the activation records for s and q.
[Figure: heap allocation. The activation tree has s at the root, with children r and q(1, 9). In the heap, the record for s, the retained record for r, and the record for q(1, 9) each carry a control link; the record for r is retained after its activation ends.]
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
Since three fields are used, this intermediate-code format is known as triples.
Indirect Triples:
Another implementation of three-address code is to list pointers to triples, rather than listing the triples themselves. This implementation is called indirect triples.
Example: a := b * -c + b * -c
The three-address code is
t1 := uminus c
t2 := t1 * b
t3 := uminus c
t4 := t3 * b
t5 := t2 + t4
a := t5
Quadruples:

      op      arg1  arg2  result
(0)   uminus  c           t1
(1)   *       t1    b     t2
(2)   uminus  c           t3
(3)   *       t3    b     t4
(4)   +       t2    t4    t5
(5)   :=      t5          a
Triples:

      op      arg1  arg2
(0)   uminus  c
(1)   *       (0)   b
(2)   uminus  c
(3)   *       (2)   b
(4)   +       (1)   (3)
(5)   assign  a     (4)
Indirect triples:

statement          op      arg1  arg2
(0)  (14)    (14)  uminus  c
(1)  (15)    (15)  *       (14)  b
(2)  (16)    (16)  uminus  c
(3)  (17)    (17)  *       (16)  b
(4)  (18)    (18)  +       (15)  (17)
(5)  (19)    (19)  assign  a     (18)
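The formats can be produced mechanically. The sketch below builds the quadruples for a := b * -c + b * -c and then derives the triples from them; the tuple layout and helper names are assumptions of this sketch:

```python
quads = []
ntemps = 0

def newtemp():
    """Return a fresh temporary name t1, t2, ..."""
    global ntemps
    ntemps += 1
    return f"t{ntemps}"

def emit(op, arg1, arg2, result):
    quads.append((op, arg1, arg2, result))

# a := b * -c + b * -c
t1 = newtemp(); emit("uminus", "c", None, t1)
t2 = newtemp(); emit("*", t1, "b", t2)
t3 = newtemp(); emit("uminus", "c", None, t3)
t4 = newtemp(); emit("*", t3, "b", t4)
t5 = newtemp(); emit("+", t2, t4, t5)
emit(":=", t5, None, "a")

# Triples drop the result field: a temporary is referred to by the
# index of the statement that computes it.
pos = {q[3]: i for i, q in enumerate(quads)}

def ref(x):
    return pos.get(x, x)

triples = [("assign", res, ref(a1)) if op == ":=" else (op, ref(a1), ref(a2))
           for op, a1, a2, res in quads]
```

Note that the sketch does not detect that statements (0)-(1) and (2)-(3) are identical; an optimizing translator could reuse t1 and t2 instead of emitting t3 and t4.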
p := lookup(id.name);
if p ≠ nil then
emit( p := E.place )
else error }
E → E1 + E2
{ E.place := newtemp;
emit( E.place := E1.place + E2.place ) }
E → E1 * E2
{ E.place := newtemp;
emit( E.place := E1.place * E2.place ) }
E → - E1
{ E.place := newtemp;
emit( E.place := uminus E1.place ) }
E → ( E1 )
{ E.place := E1.place }
E → id
{ p := lookup(id.name);
if p ≠ nil then
E.place := p
else error }
(b) (i) Boolean expressions have two primary purposes. They are used to compute logical values, but more often they are used as conditional expressions in statements that alter the flow of control, such as if-then-else or while-do statements.
Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.
Here we consider boolean expressions generated by the following grammar:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
Methods of Translating Boolean Expressions:
There are two principal methods of representing the value of a boolean expression:
To encode true and false numerically and to evaluate a boolean expression analogously to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
To implement boolean expressions by flow of control, that is, representing the value of a boolean expression by the position reached in a program. This method is particularly convenient for implementing boolean expressions in flow-of-control statements, such as if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions are evaluated completely from left to right, in a manner similar to arithmetic expressions.
For example, the translation for a or b and not c is the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
A relational expression such as a < b is equivalent to the conditional statement if a < b then 1 else 0, which can be translated into the three-address sequence (again, we arbitrarily start statement numbers at 100):
100 : if a < b goto 103
101 : t := 0
102 : goto 104
103 : t := 1
104 :
Translation scheme using a numerical representation for booleans:
E → E1 or E2
{ E.place := newtemp;
emit( E.place := E1.place or E2.place ) }
E → E1 and E2
{ E.place := newtemp;
emit( E.place := E1.place and E2.place ) }
E → not E1
{ E.place := newtemp;
emit( E.place := not E1.place ) }
E → ( E1 )
{ E.place := E1.place }
E → id1 relop id2
{ E.place := newtemp;
emit( if id1.place relop.op id2.place goto nextstat + 3 );
emit( E.place := 0 );
emit( goto nextstat + 2 );
emit( E.place := 1 ) }
E → true
{ E.place := newtemp;
emit( E.place := 1 ) }
E → false
{ E.place := newtemp;
emit( E.place := 0 ) }
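A minimal sketch of the numerical method, operating on boolean expressions written as nested tuples (the tuple representation and the unbounded pool of temporaries are assumptions of this sketch):

```python
code = []
temps = (f"t{i}" for i in range(1, 100))   # fresh temporaries t1, t2, ...

def gen_bool(expr):
    """Translate a boolean expression, given as nested tuples such as
    ("or", "a", ("not", "c")), into three-address code using 1 for
    true and 0 for false. Returns the place holding the result."""
    if expr == "true":
        t = next(temps); code.append(f"{t} := 1"); return t
    if expr == "false":
        t = next(temps); code.append(f"{t} := 0"); return t
    if isinstance(expr, str):              # a boolean variable
        return expr
    op = expr[0]
    if op == "not":
        a = gen_bool(expr[1])
        t = next(temps); code.append(f"{t} := not {a}"); return t
    a, b = gen_bool(expr[1]), gen_bool(expr[2])   # "and" / "or"
    t = next(temps); code.append(f"{t} := {a} {op} {b}"); return t

# the example from the text: a or b and not c
result = gen_bool(("or", "a", ("and", "b", ("not", "c"))))
```

For the example expression this emits exactly the three-address sequence shown above: t1 := not c, t2 := b and t1, t3 := a or t2.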
(2) E → E1 and M E2
{ backpatch(E1.truelist, M.quad);
E.truelist := E2.truelist;
E.falselist := merge(E1.falselist, E2.falselist); }
(3) E → not E1
{ E.truelist := E1.falselist;
E.falselist := E1.truelist; }
(4) E → ( E1 )
{ E.truelist := E1.truelist;
E.falselist := E1.falselist; }
(6) E → true
{ E.truelist := makelist(nextquad);
emit('goto _') }
(7) E → false
{ E.falselist := makelist(nextquad);
emit('goto _') }
(8) M → ε
{ M.quad := nextquad }
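The three backpatching primitives used in these rules can be sketched as list operations over an array of emitted quadruples; the string encoding of an unfilled jump as "goto _" is an assumption of this sketch:

```python
code = []                 # emitted quadruples; jumps start out as 'goto _'

def nextquad():
    """Index of the next quadruple to be emitted."""
    return len(code)

def emit(instr):
    code.append(instr)

def makelist(i):
    """A new list containing only the quadruple index i."""
    return [i]

def merge(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2

def backpatch(plist, target):
    """Fill `target` in as the jump target of every quadruple on plist."""
    for i in plist:
        code[i] = code[i].replace("_", str(target))

# E -> true  { E.truelist := makelist(nextquad); emit('goto _') }
truelist = makelist(nextquad()); emit("goto _")
# E -> false { E.falselist := makelist(nextquad); emit('goto _') }
falselist = makelist(nextquad()); emit("goto _")
backpatch(merge(truelist, falselist), 100)
```

After backpatching, both previously unfilled jumps read "goto 100".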
14. (a) (i) Write in detail about the issues in the design of a code generator.
The following issues arise during the code-generation phase:
1. Input to the code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order
1. Input to the code generator:
The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
The intermediate representation can be:
a. A linear representation such as postfix notation
b. A three-address representation such as quadruples
c. A virtual-machine representation such as stack-machine code
d. A graphical representation such as syntax trees and DAGs.
Prior to code generation, the source program must have been scanned, parsed and translated into the intermediate representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language
It can be placed in a fixed memory location and executed immediately.
b. Relocatable machine language
It allows subprograms to be compiled separately.
c. Assembly language
Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by the front end and the code generator.
This makes use of the symbol table; that is, a name in a three-address statement refers to a symbol-table entry for the name.
Labels in three-address statements have to be converted to addresses of instructions.
For example, j : goto i generates a jump instruction as follows:
If i < j, the jump is backward, and a jump instruction with target address equal to the location of the code for quadruple i is generated.
If i > j, the jump is forward. We must store on a list for quadruple i the location of the first machine instruction generated for quadruple j. When i is processed, the machine locations of all instructions that jump forward to i are filled in.
4. Instruction selection:
The instruction set of the target machine should be complete and uniform.
Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered.
The quality of the generated code is determined by its speed and size.
5. Register allocation:
Instructions involving register operands are shorter and faster than those involving operands in memory.
Else
{
Generate (MOV operand1, R0);
If (operator = +)
Generate (ADD operand2, R0);
Else if (operator = -)
Generate (SUB operand2, R0);
Else if (operator = *)
Generate (MUL operand2, R0);
Else if (operator = /)
Generate (DIV operand2, R0);
}
The algorithm takes as input a sequence of three-address statements constituting a basic block.
For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y′, a current location of y. Prefer a register for y′ if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y′, L to place a copy of y in L.
3. Generate the instruction OP z′, L where z′ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptors to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
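A minimal sketch of the x := y op z step. Here getreg just hands out a free register, spilling and next-use information are not modeled, and the register names R0 to R2 are assumptions of this sketch:

```python
registers = {}   # register descriptor: register -> variable it holds
addresses = {}   # address descriptor: variable -> set of current locations

def getreg():
    """Hypothetical getreg: return any free register; spilling is not modeled."""
    for r in ("R0", "R1", "R2"):
        if r not in registers:
            return r
    raise RuntimeError("no free register (spilling not modeled)")

def gen(x, y, op, z):
    """Generate code for the three-address statement x := y op z."""
    out = []
    L = getreg()
    # prefer a register copy of y; otherwise use y's memory location
    in_regs = addresses.get(y, {y}) & set(registers)
    y_loc = next(iter(in_regs)) if in_regs else y
    if y_loc != L:
        out.append(f"MOV {y_loc}, {L}")
    out.append(f"{op} {z}, {L}")
    registers[L] = x        # L's descriptor now says it holds x
    addresses[x] = {L}      # x currently lives only in L
    return out
```

Generating code for t1 := a + b followed by t2 := t1 - c illustrates steps 2 and 3: the second statement finds t1 already in R0 and moves it from there rather than from memory.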
15. (a) (i) A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes.
The transformations:
Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x * x than as a call to an exponentiation routine: x² → x * x.
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1:
i := i + 1 → i++
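A single peephole pass combining these three transformations might look as follows. The textual statement syntax, with ^ standing for exponentiation, is an assumption of this sketch:

```python
import re

def peephole(instrs):
    """One pass of algebraic simplification, strength reduction,
    and machine-idiom replacement over textual statements."""
    out = []
    for ins in instrs:
        m = re.fullmatch(r"(\w+) := (\w+) ([+*^]) (\w+)", ins)
        if m:
            x, a, op, b = m.groups()
            if op == "+" and b == "0" and a == x:
                continue                          # x := x + 0 is redundant
            if op == "*" and b == "1" and a == x:
                continue                          # x := x * 1 is redundant
            if op == "^" and b == "2":
                out.append(f"{x} := {a} * {a}")   # strength reduction: a^2 -> a*a
                continue
            if op == "+" and b == "1" and a == x:
                out.append(f"{x}++")              # machine idiom: auto-increment
                continue
        out.append(ins)
    return out
```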
i := i - 1 → i--
(b) (i) One application of dominator information is in determining the
loops of a flow graph suitable for improvement.
The properties of loops are:
A loop must have a single entry point, called the header. This entry point dominates all nodes in the loop, or it would not be the sole entry to the loop.
There must be at least one way to iterate the loop, i.e. at least one path back to the header.
One way to find all the loops in a flow graph is to search for edges in the flow graph whose heads dominate their tails. If a → b is an edge, b is the head and a is the tail. These edges are called back edges.
Example:
[Figure: a flow graph with nodes 1 to 10, used to illustrate back edges and the loops they identify.]
Steps 2, 3 and 4: if we now assign the value of the common subexpression 4*k to a new name m, then
m := 4*k
t1 := m
t2 := a[t1]
t5 := m
t6 := a[t5]
Copy propagation:
An assignment of the form a := b is called a copy statement. The idea behind the copy-propagation transformation is to use b for a wherever possible after the copy statement a := b.
Algorithm: Copy propagation.
Input: A flow graph G, with ud-chains giving the definitions reaching block B.
Output: The graph after applying the copy-propagation transformation.
Method: For each copy s : x := y, do the following:
1. Determine those uses of x that are reached by this definition of x, namely s : x := y.
2. Determine whether, for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B], then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
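Within a single basic block the algorithm degenerates to one linear scan. The sketch below assumes statements are strings of the form "lhs := rhs" and that an eliminated copy's target is dead at the end of the block; it is a simplification of the global ud-chain method, not a replacement for it:

```python
def copy_propagate(block):
    """Copy propagation within one basic block: after the copy x := y,
    later uses of x become y until x or y is redefined; the copy itself
    is then removed (assuming x is dead at the end of the block)."""
    copies, out = {}, []
    for stmt in block:
        lhs, rhs = stmt.split(" := ")
        tokens = [copies.get(t, t) for t in rhs.split()]
        # a redefinition of a name kills copies that mention it
        copies = {x: y for x, y in copies.items() if lhs not in (x, y)}
        if lhs.isidentifier() and len(tokens) == 1 and tokens[0].isidentifier():
            copies[lhs] = tokens[0]     # record the copy and drop it
            continue
        out.append(f"{lhs} := {' '.join(tokens)}")
    return out
```

Applied to the block in the example that follows (x := t3; a[t1] := t2; a[t4] := x; y := x + 3; a[t5] := y), it replaces x by t3 and eliminates the copy statement.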
Steps 1 and 2:
x := t3          (this is a copy statement)
a[t1] := t2
a[t4] := x       (use)
y := x + 3
a[t5] := y       (use)
Since the values of t3 and x are not altered along the path from its definition, we replace x by t3 and then eliminate the copy statement:
x := t3                          a[t1] := t2
a[t1] := t2     eliminating      a[t4] := t3
a[t4] := t3     the copy   →     y := t3 + 3
y := t3 + 3     statement        a[t5] := y
a[t5] := y
PART B (5 × 16 = 80 Marks)
11. (a) (i) Discuss the input-buffering scheme in a lexical analyzer.
(ii) Construct an NFA using Thompson's construction algorithm for the regular expression (a|b)* abb (a|b) and convert it into a DFA.
Or
(b) (i) Illustrate the compiler's internal representation of the changes in the source program, as translation progresses, by considering the translation of the statement A := B + C * 50.
(ii) Construct a DFA directly from the regular expression (a|b)* abb, without constructing an NFA.
12. (a) (i) Give the definitions of the FIRST(X) and FOLLOW(A) procedures used in constructing a predictive parser.
(ii) What is an operator grammar? Draw the precedence graph for the following table.
[Operator-precedence table over the terminals a, (, ), ',' and $, giving the <· and ·> relations between each pair.]
Or
(b) (i) Write a note on error recovery in predictive parsing.
(ii) Write the LR parsing algorithm. Check whether the following grammar is SLR(1) or not. Justify the answer with reasons.
S → L = R | R
L → *R | id
R → L
13. (a) (i) What are the various data structures used for symbol-table construction? Explain any one in detail.
(ii) Let A be a 10 × 20 array with low1 = low2 = 1. Let w = 4. Draw an annotated parse tree for the assignment statement X := A[y, z]. Give the sequence of three-address statements generated.
Or
(b) How would you generate the intermediate code for the flow-of-control statements? Explain with examples.
Solutions
PART A
1. In a language-processing system, all preprocessed source programs to be used as sources for generating an object program are scanned, checked for errors, and the optimized source programs are output in units of translation.
2. The error recovery actions are
a. Deleting an extraneous character
b. Inserting a missing character
c. Replacing an incorrect character by a correct character
d. Transposing two adjacent characters
3. S → (L) | a
L → S L′
L′ → , S L′ | ε
4. CLR stands for Canonical LR.
A CLR grammar is one whose CLR parsing table has no multiply defined entries. A grammar for which a CLR parser can be constructed is said to be a CLR grammar.
5. t1 = b+c
t2 = at1
6. The three functions are
a. makelist(i)
b. merge(p1, p2)
c. Backpatch(p, i)
7. Deducing at compile time that the value of an expression is a constant, and using that constant instead, is known as constant folding.
8. A directed acyclic graph (DAG) gives a picture of how the value computed by each statement in a basic block is used in subsequent statements of the block. Applications of DAGs include:
(i) Detecting common subexpressions.
(ii) Determining which identifiers have their values used in the block.
(iii) Determining which statements compute values that are used outside the block.
(iv) Reconstructing a simplified list of quadruples.
PART B
11. (a) (i) This technique addresses the efficiency issues concerned with the buffering of input. For many source languages, there are times when the lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match can be announced. Since a large amount of time can be consumed moving characters, specialized buffering techniques have been developed to reduce the amount of overhead required to process an input character.
The principle of a buffering scheme is outlined as follows. Consider a buffer divided into two N-character halves, holding, say, the input E = M * C * * 2 followed by eof:
: : : E : = : M : * : C : * : * : 2 : : eof : : :
with one pointer marking the lexeme beginning and a forward pointer doing the lookahead.
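The two-buffer scheme with sentinels can be sketched as follows. The tiny half size N and the use of the NUL character as sentinel are assumptions of this sketch; real lexers use halves of a disk-block size:

```python
EOF = "\0"      # sentinel stored at the end of each buffer half
N = 4           # size of one buffer half (tiny, for illustration)

class TwoBufferReader:
    """Sketch of the two-buffer scheme: the scanner tests one sentinel
    per character instead of checking both a buffer bound and end of input."""
    def __init__(self, text):
        self.text = text
        self.pos = 0                 # position of the next reload
        self.buf = self._reload()
        self.forward = 0

    def _reload(self):
        half = self.text[self.pos:self.pos + N]
        self.pos += len(half)
        return half + EOF            # sentinel marks the end of the half

    def next_char(self):
        c = self.buf[self.forward]
        self.forward += 1
        if c == EOF:                 # sentinel: reload, or report end of input
            if self.pos >= len(self.text):
                return EOF           # true end of input
            self.buf = self._reload()
            self.forward = 1
            return self.buf[0]
        return c
```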
[Figure: Thompson-construction NFA fragments for a|b, abb and (a|b)*, combined into the NFA for the full regular expression, with numbered states.]
ε-closure(0) = {0, 1, 2, 4, 7} = A
a-trans on A = {3, 8}; b-trans on A = {5}
ε-closure({3, 8}) = {0, 1, 2, 3, 4, 6, 7, 8} = B
a-trans on B = {3, 8}; b-trans on B = {5, 9}
ε-closure({5}) = {1, 2, 3, 4, 5, 7} = C
a-trans on C = {3, 8}; b-trans on C = {5}
ε-closure({5, 9}) = {1, 2, 3, 4, 5, 6, 7, 9} = D
a-trans on D = {3, 8}; b-trans on D = {5, 10}
ε-closure({5, 10}) = {1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 17} = E
a-trans on E = {3, 8, 13}; b-trans on E = {5, 10}
ε-closure({3, 8, 13}) = {1, 2, 3, 4, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17} = F
a-trans on F = {3, 8, 13}; b-trans on F = {5, 9, 15}
ε-closure({5, 9, 15}) = {1, 2, 3, 4, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17} = G
a-trans on G = {3, 8, 13}; b-trans on G = {5, 15}
ε-closure({5, 15}) = {1, 2, 3, 4, 6, 7, 11, 12, 13, 14, 15, 16, 17} = H
a-trans on H = {3, 8, 13}; b-trans on H = {5, 15}
The equivalent DFA transition table maps each of the states A–H on inputs a and b to the states found above.
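The trace above is an instance of the general subset construction, which can be sketched with the NFA given as move and ε-transition tables:

```python
def epsilon_closure(states, eps):
    """ε-closure: all states reachable from `states` via ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, moves, eps, alphabet):
    """Build DFA states (sets of NFA states) from the move table
    (state, symbol) -> states and the ε-transition table."""
    start_state = epsilon_closure({start}, eps)
    dfa, worklist = {}, [start_state]
    while worklist:
        S = worklist.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            moved = {t for s in S for t in moves.get((s, a), ())}
            if moved:
                T = epsilon_closure(moved, eps)
                dfa[S][a] = T
                worklist.append(T)
    return start_state, dfa
```

On a tiny NFA with ε-edges 0 → 1 and 0 → 2 and moves 1 →a 3, 2 →b 3, the start state of the DFA is {0, 1, 2}, and both inputs lead to {3}.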
A := B + C * 50
Lexical analyzer: the token stream A, :=, B, +, C, *, 50 (with A, B, C entered in the symbol table as id1, id2, id3).
Syntax analyzer, semantic analyzer, intermediate-code generation and code optimization follow, as shown in the figure.
Code generation:
MOVF id3, R1
MULF #50.0, R1
MOVF id2, R2
ADDF R2, R1
MOVF R1, id1
11. (b) (ii) The syntax tree for (a|b)* abb # has concatenation nodes along its spine: the leaves a (position 1) and b (position 2) sit under the * node, followed by the leaves a (position 3), b (position 4), b (position 5) and the endmarker # (position 6).
The firstpos and lastpos sets are computed bottom-up over the tree; for each leaf, firstpos = lastpos = the set containing its own position.
followpos:
1 : {1, 2, 3}
2 : {1, 2, 3}
3 : {4}
4 : {5}
5 : {6}
6 : ∅
[Figure: the resulting DFA, with start state {1, 2, 3} and transitions on a and b.]
12. (a) (i) The construction of a predictive parser is aided by two functions, FIRST and FOLLOW, associated with a grammar.
If X is any string of grammar symbols, then FIRST(X) is the set of terminals that begin the strings derived from X. If X ⇒* ε, then ε is also in FIRST(X).
FOLLOW(A) is defined, for a nonterminal A, as the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
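FIRST can be computed iteratively to a fixed point; a sketch, representing ε by the string "ε" and a grammar as a map from nonterminals to lists of right-hand sides:

```python
EPS = "ε"

def first_sets(grammar, nonterms):
    """Iteratively compute FIRST(A) for every nonterminal A."""
    first = {A: set() for A in nonterms}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for X in rhs:
                    # FIRST of a terminal is just itself
                    f = first[X] if X in nonterms else {X}
                    before = len(first[A])
                    first[A] |= f - {EPS}
                    changed |= len(first[A]) != before
                    if EPS not in f:
                        break            # X cannot vanish: stop here
                else:
                    # every symbol in rhs can derive ε, so A can too
                    if EPS not in first[A]:
                        first[A].add(EPS)
                        changed = True
    return first
```

For the standard expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id, this yields FIRST(E) = { (, id } and FIRST(E′) = { +, ε }.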
12. (a) (ii)
[Operator-precedence table over the terminals, giving the <· and ·> relations; the precedence graph is drawn from it.]
E → E A E | (E) | -E | id
A → + | - | * | / | ↑
is not an operator grammar, because the right side E A E has three consecutive nonterminals. If we substitute for A, we obtain the operator grammar
E → E + E | E - E | E * E | E / E | E ↑ E | (E) | -E | id
[Precedence-function graph with f-nodes fa, f(, f), f, and f$ linked to g-nodes ga, g(, g), g, and g$.]
12. (b) (i) An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when a nonterminal A is on top of the stack, a is the next input symbol, and the parsing-table entry M[A, a] is empty.
Panic-mode recovery is based on the idea of skipping input symbols until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set. The sets should be chosen so that the parser recovers quickly from errors that are likely to occur.
Phrase-level recovery can be implemented in a predictive parser by filling the blank entries in the predictive parsing table with pointers to error-handling routines. These routines can insert, modify, or delete symbols in the input.
12. (b) (ii) LR-parsing algorithm.
INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the parsing program.
I0: S′ → .S
S → .L = R
S → .R
L → .*R
L → .id
R → .L
I1: S′ → S.
I2: S → L. = R
R → L.
I3: S → R.
I4: L → *.R
R → .L
L → .*R
L → .id
I5: L → id.
I6: S → L = .R
R → .L
L → .*R
L → .id
I7: L → *R.
I8: R → L.
I9: S → L = R.
The SLR parsing table is

State |     ACTION: =    *    id    $    | GOTO: S  L  R
  0   |            s4   s5               | 1  2  3
  1   |                            acc   |
  2   | s6, r5                     r5    |
  3   |                            r2    |
  4   |            s4   s5               |    8  7
  5   | r4                         r4    |
  6   |            s4   s5               |    8  9
  7   | r3                         r3    |
  8   | r5                         r5    |
  9   |                            r1    |

The entry for state 2 on '=' contains both a shift (s6) and a reduce (r5), a shift/reduce conflict; hence the grammar is not SLR(1).
13. (a) (i) A data structure called a symbol table is generally used to store information about various source-language constructs. The information is collected by the analysis phases of the compiler and used by the synthesis phases to generate the target code.
The data structure for one particular implementation of a symbol table is an array symtable of records, each with a lexptr field, a token field and attributes, alongside a character array lexemes holding the strings div EOS mod EOS count EOS i EOS.
A fixed amount of space per entry may not be large enough to hold a very long identifier and may be wastefully large for a short identifier.
In this scheme, the separate array lexemes holds the character string forming each identifier. Each string is terminated by an end-of-string character (EOS), which may not appear in identifiers. Each entry in the symbol-table array symtable is a record consisting of two fields:
(1) lexptr, pointing to the beginning of the lexeme
(2) token.
In this representation, the 0th entry is left empty, because lookup returns 0 to indicate that there is no entry for a string. The 1st and 2nd entries are for the keywords div and mod. The 3rd and 4th entries are for the identifiers count and i.
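This layout can be sketched directly; using a Python string for the lexemes character array and a list of (lexptr, token) pairs for symtable is an assumption of the sketch:

```python
EOS = "\0"                 # end-of-string marker in the lexemes array
lexemes = EOS              # character array holding all lexemes
symtable = [None]          # entry 0 is unused: lookup returns 0 for "absent"

def insert(lexeme, token):
    """Append the lexeme to the character array, add a (lexptr, token)
    record, and return the index of the new entry."""
    global lexemes
    lexptr = len(lexemes)
    lexemes += lexeme + EOS
    symtable.append((lexptr, token))
    return len(symtable) - 1

def lookup(name):
    """Return the index of name's entry, or 0 if it is not present."""
    for i, (lexptr, _) in enumerate(symtable[1:], start=1):
        end = lexemes.index(EOS, lexptr)
        if lexemes[lexptr:end] == name:
            return i
    return 0

insert("div", "div")       # entry 1: keyword div
insert("mod", "mod")       # entry 2: keyword mod
```

Inserting the identifiers count and i afterwards gives them entries 3 and 4, matching the figure's description.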
13. (a) (ii) The annotated parse tree for the assignment statement X := A[y, z] carries the attributes:
at the root: L.place = x, L.offset = null, with E.place = t4 for the right-hand side;
for A[y, z]: L.place = t2, L.offset = t3;
Elist.place = t1, Elist.ndim = 2, Elist.array = A, built from Elist.place = t1, Elist.ndim = 1, Elist.array = A;
for the subscripts: E.place = y (from L.place = y, L.offset = null) and E.place = z (from L.place = z, L.offset = null).
t3 = 4 * t1
t4 = t2[t3]
x = t4
13. (b)
[Figure: DAG for the code segment, with nodes for <=, + and the [] (indexing) operators over prod, i, the constant 20, and the temporaries t1 to t7.]
A compiler optimization must preserve the semantics of the original program. A transformation of a program is local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global. Many transformations can be performed at both the local and global levels.
(1) Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes. Common subexpression elimination, copy propagation, dead-code elimination, and constant folding are common examples of function-preserving transformations.
a. Common Subexpression Elimination
An occurrence of an expression E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example, consider the following block of statements:
t1 = 4 * i
t2 = a[t1]
t3 = 4 * j
t4 = 4 * i
t5 = n
t6 = b[t4] + t5
The above code after optimization using common subexpression elimination is
t1 = 4 * i
t2 = a[t1]
t3 = 4 * j
t5 = n
t6 = b[t1] + t5
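Local common subexpression elimination can be sketched as one scan over the block. The single-operator tuple representation (target, op, arg1, arg2) is an assumption of this sketch, so t6 = b[t4] + t5 is shown only as the indexing step t6 = b[t4]:

```python
def eliminate_common_subexpressions(block):
    """Local CSE over statements (target, op, arg1, arg2): a repeated
    right-hand side whose arguments are unchanged reuses the earlier
    target instead of being recomputed."""
    available, out, replace = {}, [], {}
    for target, op, a1, a2 in block:
        a1, a2 = replace.get(a1, a1), replace.get(a2, a2)
        key = (op, a1, a2)
        if key in available:
            replace[target] = available[key]   # reuse the earlier value
            continue
        # assigning target invalidates expressions that mention it
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        available[key] = target
        out.append((target, op, a1, a2))
    return out, replace
```

On the example block, the recomputation t4 = 4 * i is dropped and the later use of t4 is rewritten to t1.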
b. Copy Propagation
Using one variable instead of another is called copy propagation. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f = g. For example, consider the following block of statements:
x = t3
a[t6] = t5
a[t4] = x
Copy propagation yields
x = t3
a[t6] = t5
a[t4] = t3
while(i<=t)
{
sum = sum + a[i]
}
b. Induction-variable elimination
Consider the block of code:
j = j - 1
t4 = 4 * j
t5 = a[t4]
if t5 > v goto …
In the above, the values of j and t4 remain in lock step: every time the value of j decreases by 1, t4 decreases by 4, because 4 * j is assigned to t4. Such identifiers are called induction variables.
When there are two or more induction variables in a loop, it may be possible to get rid of all but one by the process of induction-variable elimination.
c. Reduction in strength
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. For example, x² is invariably cheaper to implement as x * x than as a call to an exponentiation routine.
PART B (5 × 16 = 80 Marks)
11. (a) (i) Explain the phases of a compiler, with a neat schematic.
(ii) Write short notes on compiler-construction tools.
Or
(b) (i) Explain the grouping of phases.
(ii) Explain the specification of tokens.
12. (a) Find the SLR parsing table for the given grammar and parse the sentence (a + b) * c:
E → E + E | E * E | (E) | id
Or
(b) Find the predictive parser for the given grammar and parse the sentence (a + b) * c:
E → E + T | T, T → T * F | F, F → (E) | id
13. (a) Generate intermediate code for the following code segments, along with the required syntax-directed translation scheme:
(i) if (a > b) x = a + b
else
x = a - b
where a and x are of real type and b is of int type
(ii) int a, b;
float c;
a = 10;
switch (a)
{ case 10: c = 1;
case 20: c = 2;
}
Or
(b) (i) Generate intermediate code for the following code segment, along with the required syntax-directed translation scheme:
i = 1; s = 0;
while (i <= 10)
s = s + a[i][i][i]
i = i + 1
(ii) Write short notes on back-patching.
14. (a) (i) Explain the various issues in the design of a code generator.
(ii) Explain the code-generation phase with a simple code-generation algorithm.
Or
(b) (i) Generate the DAG representation of the following code and list the applications of the DAG representation:
i = 1; s = 0;
while (i <= 10)
s = s + a[i][i]
i = i + 1
(ii) Write short notes on next-use information, with a suitable example.
15. (a) (i) Explain the principal sources of optimization.
(ii) Write short notes on storage organization and parameter passing.
Or
(b) (i) Optimize the following code using various optimization techniques:
i = 1; s = 0;
for (i = 1; i <= 3; i++)
for (j = 1; j <= 3; j++)
c[i][j] = c[i][j] + a[i][j] + b[i][j]
(ii) Write short notes on access to non-local names.
Solutions
PART A
1. [Table comparing a compiler and an interpreter.]
[Parse tree with + at the root, E nodes and id leaves.]
4. E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → (E)
F → id
5. Examination of the entire program to suggest optimizations is called global data-flow analysis. In data-flow analysis, the analysis is made on the flow of data: it determines the information regarding the definition and use of data in the program.
6. Refer Nov/Dec 2009 - Q. No. 5.
7. Left p1 = add
t1 = a + b
c = t1
Return c
PART B
11. (a) (i) Refer Nov/Dec 2009 - 11(a) (i).
11. (a) (ii) Refer Nov/Dec 2009 - 11(a) (ii).
11. (b) (i) The phases of a compiler can be grouped together to form a front end and a back end.
Front end: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer
Back end: Code Optimizer, Code Generator
(Definitions of postfix of S and suffix of S — table body lost.)
Operation                          Definition
Union of L and M                   L ∪ M = { s | s is in L or s is in M }
Concatenation of L and M (LM)      LM = { st | s is in L and t is in M }
Kleene closure of L                L* = ∪ (i = 0 to ∞) L^i
12. (a)
Solution
E -> E + E
E -> E * E
E -> (E)
E -> id
Step 1:
The canonical collection of sets of LR(0) items for the augmented grammar is
I0: E' -> .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I1: goto(I0, E)
    E' -> E.
    E -> E. + E
    E -> E. * E
I2: goto(I0, ()
    E -> (.E)
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I3: goto(I0, id)
    E -> id.
I4: goto(I1, +)
    E -> E + .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I5: goto(I1, *)
    E -> E * .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I6: goto(I2, E)
    E -> (E.)
    E -> E. + E
    E -> E. * E
I7: goto(I4, E)
    E -> E + E.
    E -> E. + E
    E -> E. * E
I8: goto(I5, E)
    E -> E * E.
    E -> E. + E
    E -> E. * E
I9: goto(I6, ))
    E -> (E).
Step 2:
The SLR parsing table (productions: 1: E -> E + E, 2: E -> E * E, 3: E -> (E), 4: E -> id; the shift/reduce conflicts of this ambiguous grammar are resolved with * taking precedence over +, both left-associative):

State |  id    +    *    (    )    $    | E
  0   |  s3              s2             | 1
  1   |       s4   s5              acc  |
  2   |  s3              s2             | 6
  3   |       r4   r4         r4   r4   |
  4   |  s3              s2             | 7
  5   |  s3              s2             | 8
  6   |       s4   s5         s9        |
  7   |       r1   s5         r1   r1   |
  8   |       r2   r2         r2   r2   |
  9   |       r3   r3         r3   r3   |
Step 3:
Parsing of the input string (a + b)*c (a, b and c are instances of id):

Stack               Input         Action
0                   (a + b)*c $   Shift
0 ( 2               a + b)*c $    Shift
0 ( 2 a 3           + b)*c $      Reduce by E -> id
0 ( 2 E 6           + b)*c $      Shift
0 ( 2 E 6 + 4       b)*c $        Shift
0 ( 2 E 6 + 4 b 3   )*c $         Reduce by E -> id
0 ( 2 E 6 + 4 E 7   )*c $         Reduce by E -> E + E
0 ( 2 E 6           )*c $         Shift
0 ( 2 E 6 ) 9       *c $          Reduce by E -> (E)
0 E 1               *c $          Shift
0 E 1 * 5           c $           Shift
0 E 1 * 5 c 3       $             Reduce by E -> id
0 E 1 * 5 E 8       $             Reduce by E -> E * E
0 E 1               $             Accept
12. (b)
13. (a) (i)
Solution:
100: if a > b goto 104
101: t1 = a - b
102: x = t1
103: goto 106
104: t2 = a + b
105: x = t2
106: ...
(4) nextquad: the index of the next quadruple to be generated (used together with the back-patching functions makelist, merge and backpatch).
DAG representation (figure): an assignment node '=' for s, a comparison node '<=' over i and 10, and an indexing node '[][]' for a[i][i].
Applications of DAG
(1) Common subexpressions can be detected automatically.
(2) Identifiers which have their values used in the block can be determined.
(3) The statements which compute values that could be used outside the block can be determined.
(4) Bayesian networks.
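As a sketch of application (1), DAG construction can be simulated in Python: identical (op, left, right) triples are shared, which is how common subexpressions are detected automatically. The instruction format and names here are illustrative, not from the text.

```python
# Build a DAG for a basic block of statements of the form: target = x op y.
# Reusing an existing (op, left, right) node detects a common subexpression.

def build_dag(block):
    nodes = {}     # (op, left-node, right-node) -> node id
    var_node = {}  # variable name -> node id currently holding its value
    reused = []    # targets whose value was already computed (CSE hits)

    def leaf(name):
        # Initial value of a name is a leaf node, created on first use.
        key = ("leaf", name, None)
        return nodes.setdefault(key, len(nodes))

    for target, op, x, y in block:
        key = (op, var_node.get(x, leaf(x)), var_node.get(y, leaf(y)))
        if key in nodes:
            reused.append(target)          # common subexpression found
        node = nodes.setdefault(key, len(nodes))
        var_node[target] = node            # target now labels this node
    return var_node, reused

block = [("t1", "+", "a", "b"),
         ("t2", "+", "a", "b"),   # same DAG node as t1
         ("t3", "*", "t1", "c")]
var_node, reused = build_dag(block)
assert var_node["t1"] == var_node["t2"]
assert reused == ["t2"]
```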
14. (b) (ii) If the name in a register is no longer needed, then the register can be assigned to some other name. The idea of keeping a name in storage only if it will be used subsequently can be applied in a number of contexts.
The use of a name in a three-address statement is defined as follows. Suppose three-address statement i assigns a value to x. If statement j has x as an operand, and control can flow from i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at i.
The algorithm to determine next uses makes a backward pass over each basic block. Having found the end of the basic block, we scan backwards to the beginning, recording for each name x whether x has a next use in the block and, if not, whether it is live on exit from the block. If data-flow analysis has been done, we know which names are live on exit from each block. If no live-variable analysis has been done, it is assumed that all non-temporary variables are live on exit. If the algorithms generating intermediate code or optimizing the code permit certain temporaries to be used across blocks, these too must be considered live.
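The backward scan described above can be sketched in Python. The statement format and the convention that temporaries start with "t" are illustrative assumptions for this sketch.

```python
# Backward next-use scan over one basic block.
# block: list of (result, op1, op2) triples, e.g. ("t1", "a", "b") for t1 = a + b.

def next_use_info(block):
    # Initialise: non-temporary names are assumed live on exit, temporaries dead.
    status = {}
    for res, a, b in block:
        for n in (res, a, b):
            if n is not None:
                status[n] = {"live": not n.startswith("t"), "next_use": None}

    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):        # backward pass
        res, a, b = block[i]
        # Record the current status of i's names (their use AFTER statement i).
        info[i] = {n: dict(status[n]) for n in (res, a, b) if n is not None}
        if res is not None:                        # x assigned at i: dead before i
            status[res] = {"live": False, "next_use": None}
        for n in (a, b):                           # operands: next use is i
            if n is not None:
                status[n] = {"live": True, "next_use": i}
    return info

block = [("t1", "a", "b"),     # t1 = a + b
         ("x",  "t1", "c"),    # x  = t1 * c
         ("y",  "x",  "a")]    # y  = x - a
info = next_use_info(block)
assert info[0]["t1"]["next_use"] == 1   # t1 is next used by statement 1
assert info[0]["a"]["next_use"] == 2    # a is next used by statement 2
```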
a) Storage organization
Run-time memory is typically subdivided into the generated code area, static data, stack and heap.
b) Activation Record
Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record or frame. The activation record of a procedure is pushed on the runtime stack when the procedure is called and popped off when control returns to the caller. The fields in the activation record are shown as
Returned Value
Actual Parameters
Control Link (optional)
Access Link (optional)
Saved Machine Status
Local Variables
Temporaries
The purpose of the fields of an activation record is as follows
(1) Temporary values, such as those arising in the evaluation of expressions, are stored in the field for temporaries.
(2) The field for local data holds data that is local to an execution of a procedure.
(3) Saved machine status holds information about the state of the machine just before the procedure is called.
(4) The optional access link refers to non-local data held in other activation records.
(5) The optional control link points to the activation record of the caller.
(6) Actual parameters is the field used by the calling procedure to supply parameters to the called procedure.
(7) Returned value is used to store the result of the function call.
The size of each of these fields can be determined at the time a procedure is called.
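A toy sketch of this stack discipline: each call pushes a frame carrying the fields listed above, and the frame is popped when control returns. Field names follow the list in the text; the values and the call/ret helpers are illustrative.

```python
# Minimal model of an activation-record (frame) stack.

class Frame:
    def __init__(self, actuals, access_link=None, control_link=None):
        self.returned_value = None
        self.actual_parameters = actuals
        self.control_link = control_link      # caller's frame
        self.access_link = access_link        # frame holding non-local data
        self.saved_machine_status = {"pc": None}
        self.locals = {}
        self.temporaries = {}

stack = []

def call(actuals):
    # Push a new activation record; its control link is the caller's frame.
    frame = Frame(actuals, control_link=stack[-1] if stack else None)
    stack.append(frame)
    return frame

def ret(value):
    # Pop the activation record of the returning procedure.
    frame = stack.pop()
    frame.returned_value = value
    return value

main = call([])
f = call([10])            # main calls f(10)
assert f.control_link is main
assert ret(42) == 42      # f returns; its frame is popped
assert stack[-1] is main
```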
c) Compile time layout of local data
The amount of storage needed for a name is determined from its type. An elementary data type such as character, integer or real can usually be stored in an integral number of bytes. Storage for an aggregate such as an array or record must be large enough to hold all components of the aggregate.
The methods of parameter passing are
a. Call by value
b. Call by reference
c. Copy restore
d. Call by name
a. Call by value
It is the simplest method of passing parameters. The actual parameters are evaluated and their r-values are passed to the called procedure. Call by value can be implemented as follows:
i) A formal parameter is treated like a local name, so the storage for the formals is in the activation record of the called procedure.
ii) The caller evaluates the actual parameters and places their r-values in the storage for the formals.
In call by value, operations on the formal parameters do not affect values in the activation record of the caller.
b. Call by Reference
When parameters are passed by reference, the caller passes to the called procedure a pointer to the storage address of each actual parameter.
i) If the actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
ii) If the actual parameter is an expression with no l-value, then the expression is evaluated in a new location and the address of that location is passed.
c. Copy Restore
A hybrid between call by value and call by reference is copy restore. It is also known as copy-in copy-out or value-result.
i) The calling procedure calculates the value of the actual parameter, and it is then copied to the activation record of the called procedure.
ii) During execution of the called procedure, the actual parameter value is not affected.
iii) If the actual parameter has an l-value, then at return the value of the formal parameter is copied back to the actual parameter.
d. Call by name
Call by name is traditionally defined by the copy rule, which is:
i) The procedure is treated like a macro: the procedure body is substituted for the call.
ii) The actual parameters are surrounded by parentheses to preserve their integrity.
iii) The local names of the called procedure are kept distinct from the names of the calling procedure.
15. (b) (i) Three-address code:
i = 1
s = 0
t1 = 4*i
t2 = 4*j
t3 = addr(a) - 4
t4 = t3[t1][t2]
t5 = addr(b) - 4
t6 = t5[t1][t2]
t7 = t4 + t6
t8 = addr(c) - 4
t9 = t8[t1][t2]
t10 = t9 + t7
i = i + 1
j = j + 1
if (i <= 3) goto B2
if (j <= 3) goto B2
Unoptimized code:
B1: i = 1
    s = 0
B2: t1 = 4*i
    t2 = 4*j
    t3 = addr(a) - 4
    t4 = t3[t1][t2]
    t5 = addr(b) - 4
    t6 = t5[t1][t2]
    t7 = t4 + t6
    t8 = addr(c) - 4
    t9 = t8[t1][t2]
    t10 = t9 + t7
    i = i + 1
    j = j + 1
    if (i <= 3) goto B2
    if (j <= 3) goto B2
Optimized code (the loop-invariant computations t3, t5 and t8 are moved out of the loop):
B1: t3 = addr(a) - 4
    t5 = addr(b) - 4
    t8 = addr(c) - 4
B2: t1 = 4*i
    t2 = 4*j
    t4 = t3[t1][t2]
    t6 = t5[t1][t2]
    t7 = t4 + t6
    t9 = t8[t1][t2]
    t10 = t9 + t7
    i = i + 1
    j = j + 1
    if (i <= 3) goto B2
    if (j <= 3) goto B2
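The hoisting applied above can be sketched as a single pass over the loop body: a statement is moved to the preheader when none of its operands is assigned inside the loop. The (target, op1, op2) instruction format and names are illustrative, and this single-pass version deliberately ignores subtleties such as loops that may execute zero times.

```python
# One-pass loop-invariant code motion over a loop body.

def hoist(loop):
    assigned = {t for t, _, _ in loop}     # names defined inside the loop
    pre, body = [], []
    for stmt in loop:
        t, a, b = stmt
        if all(x not in assigned for x in (a, b)):
            pre.append(stmt)               # invariant: move to the preheader
            assigned.discard(t)            # t is now computed outside the loop
        else:
            body.append(stmt)
    return pre, body

loop = [("t1", "4", "i"),          # depends on i, stays in the loop
        ("t3", "addr_a", "4"),     # operands never assigned in the loop: hoist
        ("t4", "t3", "t1"),        # still depends on t1, stays
        ("i",  "i", "1")]          # induction variable update, stays
pre, body = hoist(loop)
assert pre == [("t3", "addr_a", "4")]
```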
15. (b) (ii) For languages that do not allow nested procedure declarations, allocation of storage for variables and access to those variables is simple:
1. Global variables are allocated static storage. The locations of these variables remain fixed and are known at compile time. So, to access any variable that is not local to the currently executing procedure, we simply use the statically determined address.
2. Any other name must be local to the activation at the top of the stack. We may access these variables through the top-sp pointer of the stack.
An important benefit of static allocation for globals is that declared procedures may be passed as parameters or returned as results, with no substantial change in the data-access strategy. With the C static-scoping rule, and without nested procedures, any name nonlocal to one procedure is nonlocal to all procedures, regardless of how they are activated. Similarly, if a procedure is returned as a result, then any nonlocal name refers to the storage statically allocated for it.
The scope of a declaration in a block-structured language is given by the most closely nested rule:
1. The scope of a declaration in a block B includes B.
2. If the name X is not declared in a block B, then an occurrence of X in B is in the scope of a declaration of X in an enclosing block B1 such that
a. B1 has a declaration of X, and
b. B1 is more closely nested around B than any other block with a declaration of X.
PART B (5 × 16 = 80 Marks)
11. (a) (i) What are the various phases of a compiler? Explain each phase in detail. Write down the output of each phase for the expression a = b + c * 60.
(ii) Briefly explain compiler construction tools.
Or
(b) Prove that the following two regular expressions are equivalent by showing that their minimum-state DFAs are the same: (i) (a|b)*
(ii) (a*|b*)*
12. (a) (i) Write down the necessary algorithms for finding FIRST and FOLLOW.
(ii) Give the algorithm for constructing the SLR parsing table.
Or
(b) Show that the following grammar is LALR but not SLR:
S -> L = R | R, L -> *R | id, R -> L
13. (a) What is three-address code? What are its types? How is it implemented?
Or
(b) How would you generate the intermediate code for flow-of-control statements? Explain with examples.
14. (a) Discuss the runtime storage management of a code generator.
Or
(b)
(i) Generate code for the following statements for the target machine:
(1) x = x + 1
(2) x = a + b + c
(3) x = a1/(bc)d*(e + f)
(ii) Explain the transformations on basic blocks.
Solutions
PART A
1. A compiler is a program that reads a program written in one language (a high-level language) and translates it into an equivalent program in another language (machine language).
Source program → Compiler → Target program
(the compiler also reports error messages)
8. (figure: syntax tree with a uminus node)
PART B
11. (a) (i) A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language. A compiler operates in phases, each of which transforms the source program from one representation to another. A typical decomposition of a compiler is shown as
Source program
↓ Lexical analyzer
↓ Syntax analyzer
↓ Semantic analyzer
↓ Intermediate code generator
↓ Code optimizer
↓ Code generator
Target program
(The symbol table management and error handler routines interact with all phases.)
11. (a) (ii) Writing a compiler is a difficult and time-consuming task. There are some specialized tools that can be used in the implementation of the various phases of a compiler. These tools are often referred to as compiler-compilers, compiler generators or translator writing systems. Some of the useful compiler construction tools are
a. Parser Generator
b. Scanner Generator
c. Syntax Directed Translation Engine
d. Automatic Code Generator
e. Data Flow Engines
a. Parser Generator
These produce syntax analyzers. Here the input is given in the form of context-free grammars. Many parser generators utilize powerful parsing algorithms that are too complex to be carried out by hand. UNIX has a parser generator tool called YACC.
b. Scanner Generator
These automatically generate lexical analyzers, normally from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is in effect a finite automaton.
c. Syntax Directed Translation Engines
Using this tool, the intermediate code is generated by scanning the parse tree completely. The translation is done for each node of the tree, and each translation is defined in terms of the translations at its neighbor nodes in the tree.
d. Automatic Code Generator
This tool takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine. A template-matching technique is used: the intermediate code statements are replaced by templates that represent sequences of machine instructions.
e. Data Flow Engines
Data-flow analysis is required to perform good code optimization. Data-flow analysis involves gathering information about how values are transmitted from one part of a program to each other part.
11. (b)
push the initial state s0 onto the stack; set ip to the first input symbol
repeat forever {
  s = state on top of stack; a = *ip
  case action[s,a] of {
    SHIFT s': { push(a); push(s'); advance ip }
    REDUCE A -> beta: {
      pop 2*|beta| symbols; s' = new state on top
      push A
      push goto(s', A)
    }
    ACCEPT: return 0 /* success */
    ERROR: { error("syntax error", s, a); halt }
  }
}
Constructing an SLR Parsing Table
Given a grammar G, construct the augmented grammar G' by adding the production S' -> S. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
State i is constructed from Ii, with the parsing actions determined as follows:
Step 1: If [A -> α.aβ] is in Ii, where a is a terminal, and goto(Ii,a) = Ij, then set action[i,a] = shift j.
Step 2: If [A -> α.] is in Ii, then set action[i,a] to reduce A -> α for all a in FOLLOW(A), where A != S'.
Step 3: If [S' -> S.] is in Ii, then set action[i,$] to accept.
Step 4: The goto transitions are constructed as follows: for all non-terminals A, if goto(Ii, A) = Ij, then goto[i,A] = j.
All entries not defined by these steps are made error. If there are any multiply defined entries, the grammar is not SLR.
The initial state of the parser is the one constructed from the set of items containing [S' -> .S].
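The driver loop above is table-driven, so it can be exercised with any hand-written action/goto table. Below is a sketch in Python using a tiny made-up grammar, S -> (S) | x, whose SLR table was worked out by hand for this example; the table encoding is an assumption of the sketch.

```python
# Table-driven LR parsing for S -> (S) | x.
# ACTION maps (state, terminal) to ("s", j) shift, ("r", lhs, n) reduce
# by a production with n right-hand-side symbols, or ("acc",) accept.

ACTION = {
    (0, "("): ("s", 1), (0, "x"): ("s", 2),
    (1, "("): ("s", 1), (1, "x"): ("s", 2),
    (2, ")"): ("r", "S", 1), (2, "$"): ("r", "S", 1),   # S -> x
    (3, ")"): ("s", 4),
    (4, ")"): ("r", "S", 3), (4, "$"): ("r", "S", 3),   # S -> (S)
    (5, "$"): ("acc",),
}
GOTO = {(0, "S"): 5, (1, "S"): 3}

def parse(tokens):
    stack, i = [0], 0
    while True:
        s, a = stack[-1], tokens[i]
        act = ACTION.get((s, a))
        if act is None:
            return False                         # blank entry: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":                        # shift: push symbol and state
            stack += [a, act[1]]
            i += 1
        else:                                    # reduce A -> beta
            _, lhs, n = act
            del stack[len(stack) - 2 * n:]       # pop 2*|beta| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]

assert parse(["(", "x", ")", "$"])
assert not parse(["(", ")", "$"])
```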
12. (b) S -> L = R | R
The augmented grammar is
S' -> S
S -> L = R
S -> R
L -> *R
L -> id
R -> L
I13: goto(I11, R): L -> *R., $
In the SLR(1) table, the state containing the items S -> L.= R and R -> L. has a shift/reduce conflict on '=': the parser can shift '=', but '=' is also in FOLLOW(R) (because of S -> L = R and R -> L), so it can equally reduce by R -> L. Hence the grammar is not SLR(1).
In the LALR(1) table constructed from the LR(1) items, the reduction R -> L in that state carries lookahead $ only, so the entry for '=' holds just the shift and no conflict arises. The table has no multiply defined entries, so the grammar is LALR(1) but not SLR(1).
13. (a)
Quadruples for a := b * - c + b * - c:
       Op      Arg1   Arg2   Result
(0)    uminus  c             t1
(1)    *       b      t1     t2
(2)    uminus  c             t3
(3)    *       b      t3     t4
(4)    +       t2     t4     t5
(5)    :=      t5            a
Triples
To avoid entering temporary names into the symbol table, we may refer to a temporary value by the position of the statement that computes it. In this case, three-address statements can be represented by records with only three fields, namely op, arg1 and arg2. The fields arg1 and arg2 are either pointers to the symbol table or pointers into the triple structure: they refer to the symbol table for user-defined names or constants, and to the triple structure for temporary values.
       Op      Arg1   Arg2
(0)    uminus  c
(1)    *       b      (0)
(2)    uminus  c
(3)    *       b      (2)
(4)    +       (1)    (3)
(5)    assign  a      (4)
Indirect Triples
It has been considered as listing pointers to triples, rather than listing the triples themselves. The above three-address code is represented as
S.No   Statement
(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)
S.No   Op      Arg1   Arg2
(14)   uminus  c
(15)   *       b      (14)
(16)   uminus  c
(17)   *       b      (16)
(18)   +       (15)   (17)
(19)   assign  a      (18)
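The three representations can be sketched as plain Python data for the same statement, a := b * - c + b * - c; the field layout follows the tables above, and the tuple encoding is an illustrative choice.

```python
# Quadruples: four fields; temporaries are explicit names.
quads = [("uminus", "c",  None, "t1"),
         ("*",      "b",  "t1", "t2"),
         ("uminus", "c",  None, "t3"),
         ("*",      "b",  "t3", "t4"),
         ("+",      "t2", "t4", "t5"),
         (":=",     "t5", None, "a")]

# Triples: the result field disappears; a temporary is referred to by
# the position (i,) of the statement that computes it.
triples = [("uminus", "c",   None),
           ("*",      "b",   (0,)),
           ("uminus", "c",   None),
           ("*",      "b",   (2,)),
           ("+",      (1,),  (3,)),
           ("assign", "a",   (4,))]

# Indirect triples: an extra statement list gives the execution order, so
# an optimizer can reorder statements without renumbering the triples.
stmt_list = [14, 15, 16, 17, 18, 19]   # pointers into a triple store

assert len(quads[0]) == 4 and len(triples[0]) == 3
assert triples[4] == ("+", (1,), (3,))   # arg fields point at triples
```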
13. (b)
14. (a)
Production                 Semantic Rules
S -> if E then S1          {E.true := newlabel;
                            E.false := S.next;
                            S1.next := S.next;
                            S.code := E.code || gen(E.true ':') || S1.code}
S -> if E then S1 else S2  {E.true := newlabel;
                            E.false := newlabel;
                            S1.next := S.next;
                            S2.next := S.next;
                            S.code := E.code || gen(E.true ':') || S1.code ||
                                      gen('goto' S.next) || gen(E.false ':') || S2.code}
S -> while E do S1         {S.begin := newlabel;
                            E.true := newlabel;
                            E.false := S.next;
                            S1.next := S.begin;
                            S.code := gen(S.begin ':') || E.code || gen(E.true ':') ||
                                      S1.code || gen('goto' S.begin)}
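The while rule above can be sketched as a small code generator. The newlabel counter and the way E.code and S1.code are passed in as callables are illustrative choices for this sketch, not part of the original scheme.

```python
# Code generation for S -> while E do S1, following the semantic rule:
# S.begin and E.true are fresh labels; E.false = S.next; S1.next = S.begin.

label_count = 0
def newlabel():
    global label_count
    label_count += 1
    return f"L{label_count}"

def gen_while(E_code, S1_code, S_next):
    begin, true = newlabel(), newlabel()
    code  = [f"{begin}:"]              # gen(S.begin ':')
    code += E_code(true, S_next)       # E jumps to E.true or E.false (= S.next)
    code += [f"{true}:"]               # gen(E.true ':')
    code += S1_code(begin)             # S1 with S1.next = S.begin
    code += [f"goto {begin}"]          # gen('goto' S.begin)
    return code

# while (a < b) do a := a + 1, with S.next = Lend
E_code  = lambda t, f: [f"if a < b goto {t}", f"goto {f}"]
S1_code = lambda nxt: ["a = a + 1"]
code = gen_while(E_code, S1_code, "Lend")
assert code[0] == "L1:" and code[-1] == "goto L1"
assert "goto Lend" in code
```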
(Flow-graph fragment from the figure: d4: i := i + 1 in B2; d5: j := j - 1 in B3; d6: a := u2 in B6; blocks B2, B3, B4 and B6.)
B2: t1 := 4 * i
    t2 := addr(A) - 4
    t3 := t2[t1]
    t4 := addr(B) - 4
    t5 := t4[t1]
    t6 := t3 * t5
    sum := sum + t6
    i := i + 1
    if i <= n goto B2
1.  sum := 0
2.  i := 1
3.  if i > n goto 15
4.  t1 = addr(a) - 4
5.  t2 = i * 4
6.  t3 = t1[t2]
7.  t4 = addr(a) - 4
8.  t5 = i * 4
9.  t6 = t4[t5]
10. t7 = t3 * t6
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. ...
9.  t6 = t4[t5]
10. t7 = t3 * t6
10a. t7 = t3 * t3
11. sum = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
B1: sum := 0
    i := 1
    t2 := addr(A) - 4
    t4 := addr(B) - 4
B2: t1 := 4 * i
    t3 := t2[t1]
    t5 := t4[t1]
    t6 := t3 * t5
    sum := sum + t6
    i := i + 1
    if i <= n goto B2
1.  sum := 0
2.  i := 1
2a. t1 = addr(a) - 4
2b. t2 = i * 4
3.  if i > n goto 15
5.  t2 = i * 4
6.  t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
11b. t2 = t2 + 4
13. i = i + 1
14. goto 3
15. ...
PART B (5 × 16 = 80 Marks)
11. (a)
(i) Explain the need for dividing the compilation process into various phases and explain their functions.
(ii) Explain how an abstract stack machine can be used as a translator.
Or
Solutions
PART A
1. The issues in the design of lexical analysis are
a. Simple design
b. Compiler efficiency is improved
c. Compiler portability is enhanced.
2. (figure: transition diagram with states A, B and an ε-edge)
For example, a language may allow the declaration of data items anywhere in the program; it may not be necessary for the declaration to precede the first use of the data item.
(ii) Sufficient core memory may not be available to accommodate a single-pass compiler.
(iii) A multipass structure may be required to satisfy the primary aims of the compiler: to generate a highly efficient target code or to occupy the minimum possible storage space.
6. Refer Nov/Dec 2009 - Q. No. 5.
7. The issues are
a. Input to the code generator
b. Target program
c. Memory management
d. Instruction selection
e. Register allocation
f. Choice of evaluation order
8. The rst statement in a basic block is a leader.
Any statement which is the target of a conditional or unconditional
goto is a leader.
Any statement which immediately follows a conditional goto is a
leader.
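The three leader rules above translate directly into a partitioning routine; the string instruction format and numeric jump targets below are made-up conventions for this sketch.

```python
# Partition a sequence of three-address statements into basic blocks.

def basic_blocks(code):
    leaders = {0}                                   # rule 1: first statement
    for i, instr in enumerate(code):
        if "goto" in instr:
            target = int(instr.split("goto")[1])
            leaders.add(target)                     # rule 2: jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3: statement after a goto
    cuts = sorted(leaders) + [len(code)]
    return [code[cuts[k]:cuts[k + 1]] for k in range(len(cuts) - 1)]

code = ["i = 1",              # 0
        "t1 = 4 * i",         # 1  target of the goto below: leader
        "s = s + t1",         # 2
        "i = i + 1",          # 3
        "if i <= 10 goto 1",  # 4
        "s = s * 2"]          # 5  follows a conditional goto: leader
blocks = basic_blocks(code)
assert [len(b) for b in blocks] == [1, 4, 1]
```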
9. The size of the data object and constraints on its position in memory must be known at compile time.
Recursive procedures are restricted.
Data structures cannot be created dynamically.
10. A hybrid between call by value and call by reference is copy-restore linkage. It is also called copy-in copy-out or value-result.
PART B
11. (a) (i) Refer Nov/Dec 2009 - 11(a)(i).
11. (a) (ii) The abstract machine code for an expression simulates a stack evaluation of the postfix representation of the expression. Expression evaluation proceeds by processing the postfix representation from left to right:
Evaluation
(1) Push each operand onto the stack when encountered.
(2) Evaluate a k-ary operator by using the value located k-1 positions below the top of the stack as the leftmost operand, and so on, until the value on top of the stack is used as the rightmost operand.
(3) After the evaluation, all k operands are popped from the stack and the result is pushed onto the stack.
Example
Stmt -> id = expr { stmt.t := expr.t || 'istore a' }
Applied to a = 3 - b * c this gives
bipush 3
iload b
iload c
imul
isub
istore a
Java Virtual Machine
Similar to the abstract stack machine, the Java virtual machine is an abstract processor architecture that defines the behavior of Java bytecode programs. The stack in the JVM is referred to as the operand stack or value stack. Operands are fetched from the stack and the result is pushed back onto the stack.
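The stack evaluation described above can be sketched in Python for the postfix form of 3 - b * c (i.e. 3 b c * -), with illustrative values supplied for b and c.

```python
# Evaluate a postfix expression with a value stack, as the abstract
# stack machine does: operands are pushed; a binary operator pops its
# rightmost operand first (it is on top), then the leftmost.

def eval_postfix(postfix, env):
    stack = []
    for sym in postfix:
        if sym in ("+", "-", "*", "/"):
            right = stack.pop()                  # topmost = rightmost operand
            left = stack.pop()
            stack.append({"+": left + right, "-": left - right,
                          "*": left * right, "/": left // right}[sym])
        else:                                    # operand: push its value
            stack.append(env[sym] if isinstance(sym, str) else sym)
    return stack.pop()

env = {"b": 4, "c": 5}
result = eval_postfix([3, "b", "c", "*", "-"], env)   # 3 - 4 * 5
assert result == -17
```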
11. (b) Associating attributes with the grammar symbols is called translation. When we associate semantic rules with productions, we use two notations:
(1) Syntax-directed definitions
(2) Translation schemes
(1) Syntax-Directed Definitions
These give high-level specifications for translations. They hide many implementation details, such as the order of evaluation of semantic actions: we associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.
(2) Translation Schemes
These indicate the order of evaluation of the semantic actions associated with a production rule. In other words, translation schemes give some information about implementation details.
Attributes
(i) place: refers to the location used to store the value of a symbol.
(ii) code: refers to the expression or combination of expressions in the form of three-address code.
(iii) value: refers to the value of a symbol.
(iv) newtemp: returns a sequence of distinct names t1, t2, ... in response to successive calls.
(v) gen: is used for generating three-address statements.
The syntax-directed definition for associating a type with an expression is
Production       Semantic Rule
E -> literal     E.type := char
E -> num         E.type := int
E -> id          E.type := lookup(id.entry)
E -> E1 mod E2   E.type := if E1.type = int and E2.type = int then int else type_error
E -> E1[E2]      E.type := if E2.type = int and E1.type = array(s,t) then t else type_error
E -> E1^         E.type := if E1.type = pointer(t) then t else type_error
12. (a)
Grammar: S -> AS | b, A -> SA | a
Step 2: Canonical collection of LR(0) items
I0: S' -> .S, S -> .AS, S -> .b, A -> .SA, A -> .a
I1: goto(I0, S): S' -> S., A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
I2: goto(I0, A): S -> A.S, S -> .AS, S -> .b, A -> .SA, A -> .a
I3: goto(I0, a): A -> a.
I4: goto(I0, b): S -> b.
I5: goto(I1, A): A -> SA., S -> A.S, S -> .AS, S -> .b, A -> .SA, A -> .a
I6: goto(I2, S): S -> AS., A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
I7: goto(I1, S): A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
Parsing Table (productions: 1: S -> AS, 2: S -> b, 3: A -> SA, 4: A -> a)
State |  a        b        $       | S  A
  0   |  s3       s4               | 1  2
  1   |  s3       s4       accept  | 7  5
  2   |  s3       s4               | 6  2
  3   |  r4       r4       r4      |
  4   |  r2       r2       r2      |
  5   |  s3, r3   s4, r3   r3      | 6  2
  6   |  s3, r1   s4, r1   r1      | 7  5
  7   |  s3       s4               | 7  5
The multiply defined entries (shift/reduce conflicts) in states 5 and 6 show that the grammar is not SLR(1).
Predictive parsing table for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id:
      id          +            *            (          )         $
E     E -> TE'                              E -> TE'
E'                E' -> +TE'                           E' -> ε   E' -> ε
T     T -> FT'                              T -> FT'
T'                T' -> ε      T' -> *FT'              T' -> ε   T' -> ε
F     F -> id                               F -> (E)
Parsing of the input string id + id*id:
Stack        Input          Action
$E           id + id*id$    Push E -> TE'
$E'T         id + id*id$    Push T -> FT'
$E'T'F       id + id*id$    Push F -> id
$E'T'id      id + id*id$    Pop id
$E'T'        + id*id$       Push T' -> ε
$E'          + id*id$       Push E' -> +TE'
$E'T+        + id*id$       Pop +
$E'T         id*id$         Push T -> FT'
$E'T'F       id*id$         Push F -> id
$E'T'id      id*id$         Pop id
$E'T'        *id$           Push T' -> *FT'
$E'T'F*      *id$           Pop *
$E'T'F       id$            Push F -> id
$E'T'id      id$            Pop id
$E'T'        $              Push T' -> ε
$E'          $              Push E' -> ε
$            $              Success
13. (a)
13. (b)
As the sequence of declarations in a procedure or block is examined, we can lay out storage for the names local to the procedure. For each local name, we create a symbol table entry with information such as the type and the relative address of the storage for the name. The relative address consists of an offset from the base of the static data area or from the field for local data in the activation record.
The syntax of languages such as C, Pascal and FORTRAN allows all the declarations in a single procedure to be processed as a group. In this case a global variable, say offset, can keep track of the next available relative address.
For example, in the translation scheme, the non-terminal P generates a sequence of declarations of the form id : T. Before the first declaration is considered, offset is set to 0.
The procedure enter(name, type, offset) creates a symbol table entry for name, giving it the type type and relative address offset in its data area. We use the synthesized attributes type and width for the non-terminal T to indicate the type and the width (number of memory units) taken by objects of that type. A synthesized translation is one where the translation depends on the translations of the children.
The types and relative addresses of declared names are given as
Sl.No  Production              Semantic rules
1.     P -> D                  {offset := 0}
2.     D -> D ; D
3.     D -> id : T             {enter(id.name, T.type, offset);
                                offset := offset + T.width}
4.     T -> integer            {T.type := integer; T.width := 4}
5.     T -> real               {T.type := real; T.width := 8}
6.     T -> array[num] of T1   {T.type := array(num.val, T1.type);
                                T.width := num.val × T1.width}
7.     T -> ^T1                {T.type := pointer(T1.type); T.width := 4}
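The offset bookkeeping in the table above can be sketched for a flat sequence of declarations; the widths follow the table (integer 4, real 8), and the return format is an illustrative choice.

```python
# Compile-time layout of local data: assign each declared name a
# relative address (offset) and advance offset by the type's width.

WIDTH = {"integer": 4, "real": 8}

def layout(decls):
    """decls: list of (name, type). Returns {name: (type, offset)}."""
    table, offset = {}, 0            # offset := 0 before the first declaration
    for name, ty in decls:
        table[name] = (ty, offset)   # enter(name, type, offset)
        offset += WIDTH[ty]          # offset := offset + T.width
    return table

table = layout([("x", "integer"), ("y", "real"), ("z", "integer")])
assert table["x"] == ("integer", 0)
assert table["y"] == ("real", 4)
assert table["z"] == ("integer", 12)
```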
Production       Semantic rule
E -> literal     E.type := char
E -> num         E.type := int
E -> id          E.type := lookup(id.entry)
E -> E1 mod E2   E.type := if E1.type = int and E2.type = int then int else type_error
E -> E1[E2]      E.type := if E2.type = int and E1.type = array(s,t) then t else type_error
E -> E1^         E.type := if E1.type = pointer(t) then t else type_error
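The rules in the table can be sketched as a recursive function over expression tuples. The node shapes, the env dictionary standing in for lookup(id.entry), and the tuple encoding of array/pointer types are all illustrative assumptions.

```python
# Type checking following the table above.
# Types: "char", "int", "real", ("array", size, elem_type), ("pointer", t).

def type_of(e, env):
    kind = e[0]
    if kind == "literal": return "char"
    if kind == "num":     return "int"
    if kind == "id":      return env[e[1]]                 # lookup(id.entry)
    if kind == "mod":                                      # E1 mod E2
        ok = type_of(e[1], env) == "int" and type_of(e[2], env) == "int"
        return "int" if ok else "type_error"
    if kind == "index":                                    # E1[E2]
        t1, t2 = type_of(e[1], env), type_of(e[2], env)
        if t2 == "int" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                                   # element type t
        return "type_error"
    if kind == "deref":                                    # E1^
        t1 = type_of(e[1], env)
        return t1[1] if isinstance(t1, tuple) and t1[0] == "pointer" else "type_error"

env = {"a": ("array", 10, "int"), "p": ("pointer", "real"), "i": "int"}
assert type_of(("index", ("id", "a"), ("id", "i")), env) == "int"
assert type_of(("deref", ("id", "p")), env) == "real"
assert type_of(("mod", ("id", "i"), ("literal",)), env) == "type_error"
```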
14. (a)
1. Peephole Optimization
Peephole optimization is a technique for improving the quality of the target code; the technique can also be applied directly after intermediate code generation to improve the intermediate representation. It examines a limited range of instructions and replaces them by a shorter or faster sequence. It is a local code-improvement technique.
Characteristic peephole optimizations are
(1) Redundant instruction elimination
(2) Flow-of-control optimizations
(3) Algebraic simplifications
(4) Use of machine idioms
(5) Reduction in strength
(6) Elimination of unreachable code
For example, in the sequence
K = 0;
if K = 0 goto L1;
K = K + 1;
L1:
the conditional jump is always taken, so the statement K = K + 1 is unreachable and can be eliminated.
Similarly, a jump to a jump can be shortened: if the target of a goto is the statement L1: goto L2;, the original jump can be replaced by a direct goto L2.
Algebraic Simplifications
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. For example, statements such as
X = X + 0
or
X = X * 1
are often produced by intermediate code generation algorithms, and they can be eliminated easily through peephole optimization.
Use of Machine Idioms
The target machine may have hardware instructions to implement certain operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly. For example, some machines have auto-decrement addressing modes; the use of these modes greatly improves the quality of code.
Reduction in Strength
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. For example, X^2 is invariably cheaper to implement as X * X than as a call to an exponentiation routine.
2. Issues in Code Generation
The issues in the design of a code generator are
a. Input to the code generator
b. Target program
c. Memory management
d. Instruction selection
e. Register allocation
a. Input to the code generator
The input to the code generator consists of an intermediate representation of the source program, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation. The intermediate code may be in any form, such as three-address code, quadruples, triples or postfix notation, or it may be represented using graphical representations such as syntax trees or directed acyclic graphs.
b. Target program
The output of the code generator is a target program. The output may take on a variety of forms:
(i) Absolute machine language
(ii) Relocatable machine language
(iii) Assembly language
c. Memory management
Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator. Symbol table entries were created as the declarations in a procedure were examined. The type in a declaration determines the width. From the symbol table information, a relative address can be determined for a name in the data area for the procedure. If machine code is being generated, labels in three-address statements have to be converted to addresses of instructions.
d. Instruction selection
The uniformity and completeness of the instruction set is an important factor for the code generator. The selection of instructions depends upon the instruction set of the target machine. Instruction speed and machine idioms are two important factors in the selection of instructions. If we do not care about the efficiency of the target program, instruction selection is straightforward.
e. Register allocation
Instructions involving register operands are usually shorter and faster than those involving operands in memory. Hence efficient utilization of registers is important in generating good code. The use of registers is subdivided into two subproblems: register allocation, in which we select the set of names that will reside in registers, and register assignment, in which we pick the specific register each such name will reside in.
15. (b)
PART B (5 × 16 = 80 Marks)
11. (a)
(i) Explain in detail the role of the lexical analyzer with the possible error recovery actions.
(ii) What is a compiler? Explain the various phases of a compiler in detail, with a neat sketch.
Or
(b)
(b)
13. (a)
(b)
14. (a)
(b)
15. (a)
(b)
PART B (5 × 16 = 80 Marks)
11. (a) (i) Write about the phases of the compiler; assume an input and show the output of the various phases.