Compiler QBank From CD
Chennai · Delhi
FM.indd 1
4/27/2014 6:01:29 PM
Semester-VI
Principles of Compiler Design
12/13/2012 5:14:18 PM
The aim of this publication is to supply information taken from sources believed to be valid and
reliable. This is not an attempt to render any type of professional advice or analysis, nor is it to
be treated as such. While much care has been taken to ensure the veracity and currency of the
information presented within, neither the publisher nor its authors bear any responsibility for
any damage arising from inadvertent omissions, negligence or inaccuracies (typographical or
factual) that may have found their way into this book.
PART B (5 × 16 = 80 marks)
11. (a) (i) What are the various phases of the compiler? Explain each phase in detail. (10)
(ii) Briefly explain the compiler construction tools. (6)
Or
(b) (i) What are the issues in lexical analysis? (4)
(ii) Elaborate in detail the recognition of tokens. (12)
12. (a) (i) Construct the predictive parser for the following grammar: (10)
S → (L) | a
L → L, S | S
(ii) Describe the conflicts that may occur during shift-reduce parsing. (6)
Or
(b) (i) Explain in detail the specification of a simple type checker. (10)
(ii) How is runtime memory subdivided into code and data areas? Explain. (6)
13. (a) (i) Describe the various types of three address statements.
(8)
(ii) How can names be looked up in the symbol table? Discuss. (8)
Or
(b) (i) Discuss the different methods for translating Boolean expressions in detail.
(12)
(ii) Explain the following grammar for a simple procedure call statement: S → call id ( Elist )
(4)
14. (a) (i) Explain in detail the various issues in the design of a code generator. (10)
(ii) Write an algorithm to partition a sequence of three address statements into basic blocks.
(6)
Or
(b) (i) Explain the code generation algorithm in detail. (8)
(ii) Construct the DAG for the following basic block: (8)
d := b*c
e := a+b
b := b*c
a := e-d
15. (a) (i) Explain the principal sources of optimization in detail. (8)
(ii) Discuss the various peephole optimizations in detail. (8)
Or
(b) (i) How to trace the data-flow analysis of a structured program? Discuss. (6)
(ii) Explain common-subexpression elimination, copy propagation and transformations for moving loop-invariant computations in detail. (10)
Solutions
PART A
1. Cousins of the compiler means the context in which the compiler typically operates. Such contexts are basically programs such as the preprocessor, assembler, loader and link editor.
2. a.
b.
c.
d.
3. A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.
Example: Given grammar G: E → E+E | E*E | (E) | -E | id
The sentence id+id*id has the following two distinct leftmost derivations:
Derivation 1: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Derivation 2: E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
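The two derivations can be encoded as parse trees and evaluated to confirm that the ambiguity matters semantically. A small Python sketch (the numbers 2, 3, 4 stand in for the three ids and are illustrative only):

```python
def evaluate(tree):
    """Evaluate a parse tree given as ('op', left, right) or a number."""
    if isinstance(tree, tuple):
        op, left, right = tree
        l, r = evaluate(left), evaluate(right)
        return l + r if op == '+' else l * r
    return tree

# Derivation 1 groups as id + (id * id)
tree1 = ('+', 2, ('*', 3, 4))
# Derivation 2 groups as (id + id) * id
tree2 = ('*', ('+', 2, 3), 4)

print(evaluate(tree1))  # 14
print(evaluate(tree2))  # 20
```

The two trees yield different values, so an ambiguous grammar leaves the meaning of the sentence undefined.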
4. If a heap variable is destroyed, any remaining pointer variable or object reference that still refers to it is said to contain a dangling reference. Unlike lower-level languages such as C, dereferencing a dangling reference will not crash or corrupt your IDL session; it will, however, fail with an error message.
For example:
; Create a new heap variable.
A = PTR_NEW(23)
; Print A and the value of the heap variable A points to.
PRINT, A, *A
IDL prints:
<PtrHeapVar13> 23
5. In the quadruple representation, temporaries are given explicit names, and entries for those temporaries are made in the symbol table. The advantage of the quadruple representation is that the value of a temporary can be accessed quickly through the symbol table; however, the use of temporaries introduces a level of indirection through the symbol table. In the triple representation, by contrast, pointers (positions of earlier triples) are used instead of temporary names, so a result can be referred to directly.
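A minimal sketch of the two representations for the statement a := b * -c + b * -c, assuming the conventional (op, arg1, arg2, result) field layout; the names are illustrative, not from the text:

```python
# Quadruples name every intermediate result with an explicit temporary.
quadruples = [
    ('uminus', 'c',  None, 't1'),
    ('*',      'b',  't1', 't2'),
    ('uminus', 'c',  None, 't3'),
    ('*',      'b',  't3', 't4'),
    ('+',      't2', 't4', 't5'),
    (':=',     't5', None, 'a'),
]

# Triples drop the result field: later instructions refer to earlier
# ones by their position, removing the symbol-table indirection.
triples = [
    ('uminus', 'c', None),   # (0)
    ('*',      'b', 0),      # (1)
    ('uminus', 'c', None),   # (2)
    ('*',      'b', 2),      # (3)
    ('+',      1,   3),      # (4)
    (':=',     'a', 4),      # (5)
]
```

Note how triple (4) refers to positions 1 and 3 directly, where the quadruples had to go through t2 and t4.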
6. To overcome the problem of processing incomplete information in a single pass, the backpatching technique is used.
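Backpatching can be sketched as a list of emitted quadruples whose jump targets are temporarily a placeholder; the list-of-indices bookkeeping below is an illustrative simplification of the usual makelist/merge/backpatch primitives:

```python
quads = []

def emit(instr):
    """Emit a quadruple and return its index (a one-entry 'makelist')."""
    quads.append(instr)
    return len(quads) - 1

def backpatch(index_list, label):
    """Fill in the target label of every listed incomplete jump."""
    for i in index_list:
        quads[i] = quads[i].replace('_', label)

i1 = emit('if a < b goto _')   # true target not yet known
i2 = emit('goto _')            # false target not yet known
backpatch([i1], 'L2')          # targets become known later in the pass
backpatch([i2], 'L3')
print(quads)   # ['if a < b goto L2', 'goto L3']
```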
7. A flow graph is a directed graph in which flow-of-control information is added to the basic blocks.
(i) The nodes of the flow graph are the basic blocks.
(ii) The block whose leader is the first statement is called the initial block.
(iii) There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the given sequence; we say that B1 is a predecessor of B2.
8. Consider two loops, where L1 is the outer loop and L2 is the inner loop, and the allocation of variable a to some register is to be decided. The approximate scenario is as given below:

Loop L1
  ...
  Loop L2
    ...      } extent of L2
  ...        } extent of L1
PART B
11. (a) (i) Phases of Compiler
A Compiler operates in phases, each of which transforms the
source program from one representation into another. The
following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling
Lexical Analysis:
It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.
It reads the characters one by one, starting from left to right, and forms the tokens. A token represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols, etc.
Example: position := initial + rate*60
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The multiplication sign
7. The constant number 60
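The token list above can be reproduced by a small scanner sketch; the token names and the regular expressions are assumptions for illustration, not the book's:

```python
import re

TOKEN_SPEC = [
    ('ASSIGN', r':='),                  # must precede single-char rules
    ('ID',     r'[A-Za-z][A-Za-z0-9]*'),
    ('NUM',    r'\d+'),
    ('PLUS',   r'\+'),
    ('TIMES',  r'\*'),
    ('SKIP',   r'\s+'),                 # whitespace is discarded
]

def tokenize(source):
    pattern = '|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC)
    return [(m.lastgroup, m.group())
            for m in re.finditer(pattern, source)
            if m.lastgroup != 'SKIP']

print(tokenize('position := initial + rate * 60'))
# [('ID', 'position'), ('ASSIGN', ':='), ('ID', 'initial'),
#  ('PLUS', '+'), ('ID', 'rate'), ('TIMES', '*'), ('NUM', '60')]
```

The seven tokens produced match the seven items listed above.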
Syntax Analysis:
Syntax analysis is the second phase of the compiler. It is also known as parsing. It gets the token stream as input from the lexical analyzer of the compiler and generates the syntax tree as the output.
Syntax tree:
It is a tree in which interior nodes are operators and exterior
nodes are operands.
Example: For position := initial + rate*60, the syntax tree is

             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate     60
Semantic Analysis:
Semantic Analysis is the third phase of the compiler. It gets input
from the syntax analysis as parse tree and checks whether the
given syntax is correct or not.
It performs type conversion of all the data types into real data
types.
             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate   inttofloat
                        |
                        60
Error Handling:
Each phase can encounter errors. After detecting an error, a phase
must handle the error so that compilation can proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
In code generation, it shows error when code is missing etc.
The phases and their interactions:

source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimization → Code Generation → Object Program

Symbol Table Management and Error Detection and Handling interact with every phase.
1. Scanner Generators:
These generate lexical analyzers, normally from a specification based on regular expressions.
The basic organization of the generated lexical analyzer is a finite automaton.
2. Parser Generators:
These produce syntax analyzers, normally from input that is
based on a context-free grammar.
It consumes a large fraction of the running time of a compiler.
Example-YACC (Yet Another Compiler-Compiler).
3. Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its neighbor nodes in the tree.
4. Automatic Code Generators:
These take a collection of rules to translate intermediate language into machine language. The rules must include sufficient detail to handle the different possible access methods for data.
5. Data-Flow Engines:
These do code optimization using data-flow analysis, that is, the gathering of information about how values are transmitted from one part of a program to each other part.
(b) (i) There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
1. Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens; specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler.
3. Compiler portability is enhanced. Input-alphabet peculiarities and other device-specific anomalies can be restricted to the
lexical analyzer. The representation of special or non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical analyzer.
(ii) Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if, then, else, relop, id and num generate sets of strings given by the following regular definitions:
if    → if
then  → then
else  → else
relop → < | <= | = | <> | > | >=
id    → letter ( letter | digit )*
num   → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
For this language fragment the lexical analyzer will recognize the keywords if, then, else, as well as the lexemes denoted by relop, id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as identifiers.
Transition diagrams
A transition diagram is a diagrammatic representation of the action that takes place when the lexical analyzer is called by the parser to get the next token. It is used to keep track of information about the characters that are seen as the forward pointer scans the input.

Transition diagram for identifiers and keywords:

start --letter--> (10) --other--> (11)  return(gettoken(), install_id())
                   ^   |
                   +---+ letter or digit

(On reaching state 11, the forward pointer is retracted one position.)
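The diagram can be simulated directly. A hedged Python sketch, where states 10 and 11 are modeled by the loop and the retract step, and the book's gettoken()/install_id() are replaced by a simple reserved-keyword check:

```python
KEYWORDS = {'if', 'then', 'else'}

def next_token(buf, pos):
    """Run the identifier/keyword transition diagram starting at pos."""
    if pos >= len(buf) or not buf[pos].isalpha():
        return None, pos                   # diagram does not apply here
    start = pos
    while pos < len(buf) and buf[pos].isalnum():   # state 10: loop
        pos += 1
    lexeme = buf[start:pos]                # state 11: 'other' seen, retract
    kind = lexeme if lexeme in KEYWORDS else 'id'
    return (kind, lexeme), pos

tok, pos = next_token('then x1', 0)
print(tok)                                 # ('then', 'then')
print(next_token('then x1', pos + 1)[0])   # ('id', 'x1')
```

Because keywords are reserved, the same diagram serves both keywords and identifiers; only the final classification differs.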
12. (a) (i) For the grammar, after eliminating left recursion:
S → (L) | a
L → S L′
L′ → , S L′ | ε
the predictive parse of the input (a,a) proceeds as follows:

Input      Action
(a,a)$     S → (L)
a,a)$      L → S L′
a,a)$      S → a
,a)$       L′ → , S L′
a)$        S → a
)$         L′ → ε
$          Accept
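The same parse can be carried out by a recursive-descent sketch of the predictive parser, one procedure per nonterminal (illustrative only; a table-driven parser would behave identically):

```python
class Parser:
    def __init__(self, text):
        self.toks = list(text) + ['$']
        self.i = 0

    def look(self):
        return self.toks[self.i]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError('expected %r, got %r' % (t, self.look()))
        self.i += 1

    def S(self):                        # S -> (L) | a
        if self.look() == '(':
            self.match('('); self.L(); self.match(')')
        else:
            self.match('a')

    def L(self):                        # L -> S L'
        self.S(); self.Lp()

    def Lp(self):                       # L' -> , S L' | epsilon
        if self.look() == ',':
            self.match(','); self.S(); self.Lp()

    def parse(self):
        self.S(); self.match('$')
        return True

print(Parser('(a,a)').parse())  # True
```

An erroneous input such as (a,) raises a SyntaxError at the point where the lookahead fails to match.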
(ii) 1. Shift-reduce conflict:
Consider the grammar E → E+E | E*E | id and input id+id*id. With the stack $E+E and remaining input *id$, the parser can either shift or reduce by E → E+E, so there is a shift-reduce conflict.

Case 1 (shift first):
Stack      Input    Action
$E+E       *id$     Shift
$E+E*      id$      Shift
$E+E*id    $        Reduce by E → id
$E+E*E     $        Reduce by E → E*E
$E+E       $        Reduce by E → E+E
$E         $        Accept

Case 2 (reduce first):
Stack      Input    Action
$E+E       *id$     Reduce by E → E+E
$E         *id$     Shift
$E*        id$      Shift
$E*id      $        Reduce by E → id
$E*E       $        Reduce by E → E*E
$E         $        Accept
2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c. With $R+c on the stack and the input exhausted, the parser can reduce either by R → c or by M → R+c, so there is a reduce-reduce conflict.

Case 1 (reduce by R → c):
Stack      Input    Action
$          c+c$     Shift
$c         +c$      Reduce by R → c
$R         +c$      Shift
$R+        c$       Shift
$R+c       $        Reduce by R → c
$R+R       $        Reduce by M → R+R
$M         $        Accept

Case 2 (reduce by M → R+c):
Stack      Input    Action
$          c+c$     Shift
$c         +c$      Reduce by R → c
$R         +c$      Shift
$R+        c$       Shift
$R+c       $        Reduce by M → R+c
$M         $        Accept
12. (b) (i) The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. An identifier must be declared before it is used. The type checker can handle arrays, pointers, statements and functions.
A Simple Language
Consider the following grammar:
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑T
E → literal | num | id | E mod E | E [ E ] | E↑

Translation scheme:
P → D ; E
D → D ; D
D → id : T                { addtype(id.entry, T.type) }
T → char                  { T.type := char }
T → integer               { T.type := integer }
T → ↑T1                   { T.type := pointer(T1.type) }
T → array [ num ] of T1   { T.type := array(1..num.val, T1.type) }

In the above language,
there are two basic types: char and integer;
type_error is used to signal errors;
the prefix operator ↑ builds a pointer type; for example, ↑integer leads to the type expression pointer(integer).
Type checking of expressions
In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E.
1. E → literal   { E.type := char }
   E → num       { E.type := integer }
Here, constants represented by the tokens literal and num have type char and integer respectively.
2. E → id        { E.type := lookup(id.entry) }
lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.
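The synthesized-attribute rules above can be sketched in Python; the symbol-table contents, the tuple encoding of type expressions, and the `deref` operator name are assumptions for illustration:

```python
# lookup(id.entry) is modeled by a plain dictionary.
symtab = {'i': 'integer', 'p': ('pointer', 'integer')}

def typeof(e):
    """Synthesize the type of an expression bottom-up."""
    if isinstance(e, str) and e.isdigit():
        return 'integer'                     # E -> num
    if isinstance(e, str):
        return symtab.get(e, 'type_error')   # E -> id
    op, *args = e
    if op == 'mod':                          # E -> E1 mod E2
        t1, t2 = typeof(args[0]), typeof(args[1])
        return 'integer' if t1 == t2 == 'integer' else 'type_error'
    if op == 'deref':                        # E -> E1 (pointer dereference)
        t = typeof(args[0])
        return t[1] if isinstance(t, tuple) and t[0] == 'pointer' else 'type_error'
    return 'type_error'

print(typeof(('mod', 'i', '3')))   # integer
print(typeof(('deref', 'p')))      # integer
print(typeof(('mod', 'p', 'i')))   # type_error
```

Each rule either synthesizes a type from the subexpression types or signals type_error, exactly as in the scheme above.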
13. (a) (i) The common types of three-address statements are:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc. ) to x and y,
and executes the statement with label L next if x stands in
relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual
sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. For example,
param x1
param x2
...
param xn
call p, n
is generated as part of a call of the procedure p(x1, x2, ..., xn).
7. Indexed assignments of the form x : = y[i] and x[i] : = y.
8. Address and pointer assignments of the form x : = &y , x :
= *y, and *x : = y.
(a) (ii) There are two types of name representation:
1. Fixed-length names
2. Variable-length names

1. Fixed-length names
A fixed amount of space for each name is allocated in the symbol table. In this type of storage, if a name is too small, there is wastage of space.
For example (each Name field is 10 characters wide):

Name          Attribute
calculate
sum
a
b

2. Variable-length names
The names are stored in a separate character array, each terminated by an end-of-name marker $, and the symbol table records the starting index and length of each name.
For example:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c a l c u l a t e $ s  u  m  $  a  $  b  $

Starting index   Length   Attribute
0                10
10               4
14               2
16               2
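The variable-length scheme can be sketched as follows; the helper name `build` is hypothetical, but the resulting table matches the one above:

```python
def build(names):
    """Pack names into one string with '$' separators and record
    (starting index, length) for each, as in the variable-length scheme."""
    text, table = '', []
    for name in names:
        table.append((len(text), len(name) + 1))   # +1 for the '$'
        text += name + '$'
    return text, table

text, table = build(['calculate', 'sum', 'a', 'b'])
print(text)    # calculate$sum$a$b$
print(table)   # [(0, 10), (10, 4), (14, 2), (16, 2)]
```

No space is wasted on short names like a and b, at the cost of the extra indirection through the (start, length) pair.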
13. (b) (i) Boolean expressions have two primary purposes. They are used to compute logical values, but more often they are used as conditional expressions in statements that alter the flow of control, such as if-then-else or while-do statements.
Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.
Here we consider boolean expressions generated by the following grammar:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
Methods of Translating Boolean Expressions:
There are two principal methods of representing the value of a
boolean expression. They are :
i. To encode true and false numerically and to evaluate a boolean expression analogously to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
ii. To implement boolean expressions by flow of control, that is, representing the value of a boolean expression by a position reached in a program. This method is particularly convenient in implementing boolean expressions in flow-of-control statements, such as the if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions will be
evaluated completely from left to right, in a manner similar to
arithmetic expressions.
For example:
The translation for a or b and not c is the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
The code layouts for the flow-of-control statements are:

(a) if-then:
         E.code        (jumps to E.true / E.false)
E.true:  S1.code
E.false: ...

(b) if-then-else:
         E.code        (jumps to E.true / E.false)
E.true:  S1.code
         goto S.next
E.false: S2.code
S.next:  ...
(c) while-do:
S.begin: E.code        (jumps to E.true / E.false)
E.true:  S1.code
         goto S.begin
E.false: ...
SEMANTIC RULES
S → if E then S1:
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code

S → if E then S1 else S2:
    E.true := newlabel;
    E.false := newlabel;
    S1.next := S.next;
    S2.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code || gen('goto' S.next) || gen(E.false ':') || S2.code

S → while E do S1:
    S.begin := newlabel;
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.begin;
    S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin)
SEMANTIC RULES
E → E1 or E2:
    E1.true := E.true;
    E1.false := newlabel;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.false ':') || E2.code

E → E1 and E2:
    E1.true := newlabel;
    E1.false := E.false;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.true ':') || E2.code

E → not E1:
    E1.true := E.false;
    E1.false := E.true;
    E.code := E1.code

E → ( E1 ):
    E1.true := E.true;
    E1.false := E.false;
    E.code := E1.code

E → true:
    E.code := gen('goto' E.true)

E → false:
    E.code := gen('goto' E.false)
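A hedged sketch of how the rules for E → E1 and E2 and E → id1 relop id2 generate code: newlabel and gen are modeled by a counter and a code list, and subexpressions are passed as functions taking their inherited (true, false) labels. The closure-based plumbing is an illustrative choice, not the book's notation:

```python
code, counter = [], [0]

def newlabel():
    counter[0] += 1
    return 'L%d' % counter[0]

def gen(s):
    code.append(s)

def relop(a, op, b, true, false):       # E -> id1 relop id2
    gen('if %s %s %s goto %s' % (a, op, b, true))
    gen('goto %s' % false)

def and_(e1, e2, true, false):          # E -> E1 and E2
    e1_true = newlabel()                # E1.true := newlabel
    e1(e1_true, false)                  # E1.false := E.false
    gen('%s:' % e1_true)                # gen(E1.true ':')
    e2(true, false)                     # E2 inherits E.true and E.false

and_(lambda t, f: relop('a', '<', 'b', t, f),
     lambda t, f: relop('c', '<', 'd', t, f),
     'Ltrue', 'Lfalse')
print('\n'.join(code))
```

For a < b and c < d this emits a jump to Lfalse as soon as the first test fails, which is exactly the short-circuit behaviour the flow-of-control representation provides.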
(ii) The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that handle procedure argument passing, calls and returns
are part of the run-time support package.
Let us consider a grammar for a simple procedure call statement
1. S call id ( Elist )
2. Elist Elist , E
3. Elist E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure. The following are the actions that take place in a calling sequence:
1. When a procedure call occurs, space must be allocated for the
activation record of the called procedure.
3. Generate the instruction op z′, L, where z′ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live
on exit from the block, and are in registers, alter the register
descriptor to indicate that, after execution of x : = y op z ,
those registers will no longer contain y or z.
(ii) The DAG can be constructed in the following steps:
Step 1: for d := b*c, create a * node with children b and c, and attach the label d.
Step 2: for e := a+b, create a + node with children a and b, and attach the label e.
Step 3: for b := b*c, the node *(b, c) already exists (b has not been redefined before this point), so no new node is created; the label b is attached to the existing node, which now carries the labels d, b.
Step 4: for a := e-d, create a - node whose children are the + node (labeled e) and the * node (labeled d, b), and attach the label a.
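The sharing in step 3 can be demonstrated with a small sketch that keys DAG nodes by (operator, left child, right child); the node ids and labels table are illustrative:

```python
nodes = {}    # (op, left, right) -> node id
labels = {}   # node id -> identifiers currently attached to it

def leaf(name):
    return ('leaf', name, None)

def node(op, left, right, target):
    key = (op, left, right)
    nid = nodes.setdefault(key, len(nodes))   # reuse existing node if any
    labels.setdefault(nid, []).append(target)
    return nid

n_d = node('*', leaf('b'), leaf('c'), 'd')   # step 1
n_e = node('+', leaf('a'), leaf('b'), 'e')   # step 2
n_b = node('*', leaf('b'), leaf('c'), 'b')   # step 3: shared with step 1
n_a = node('-', n_e, n_d, 'a')               # step 4

print(n_d == n_b)      # True: b*c is computed only once
print(labels[n_d])     # ['d', 'b']
```

The dictionary lookup is what makes b := b*c attach a label to the existing * node instead of building a new one.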
15. (a) (i) A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a
program without changing the function it computes.
The transformations:
1. Common sub expression elimination,
2. Copy propagation,
3. Dead-code elimination, and
4. Constant folding
are common examples of such function-preserving transformations. The other transformations come up primarily when global
optimizations are performed.
Common Subexpression Elimination:
An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common-subexpression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4*i is eliminated, as its computation is already available in t1, and the value of i has not changed between its definition and this use.
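A minimal sketch of local common-subexpression elimination over quadruples like those above. It assumes, as in the example, that no operand is redefined inside the block; the tuple encoding is illustrative:

```python
def local_cse(block):
    """Remove quadruples whose (op, arg1, arg2) was already computed,
    rewriting later uses of the eliminated temporaries."""
    available, replaced, out = {}, {}, []
    for dst, op, a1, a2 in block:
        a1, a2 = replaced.get(a1, a1), replaced.get(a2, a2)
        key = (op, a1, a2)
        if key in available:
            replaced[dst] = available[key]   # reuse the earlier result
        else:
            available[key] = dst
            out.append((dst, op, a1, a2))
    return out

block = [('t1', '*', '4', 'i'),
         ('t2', '[]', 'a', 't1'),
         ('t3', '*', '4', 'j'),
         ('t4', '*', '4', 'i'),       # duplicate of t1
         ('t5', 'id', 'n', None),
         ('t6', 'b[]+', 't4', 't5')]  # will be rewritten to use t1

for q in local_cse(block):
    print(q)
```

The output drops the t4 quadruple and rewrites the last one to use t1, matching the optimized code shown above.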
Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g. Copy propagation means the use of one variable instead of another. This may not appear to be an improvement, but as we shall see, it often gives us an opportunity to eliminate the copy statement as dead code.
For example:
x = Pi;
A = x*r*r;
The optimization using copy propagation can be done as follows:
A = Pi*r*r;
Here the variable x is eliminated
Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is
dead or useless code, statements that compute values that never
get used. While the programmer is unlikely to introduce any dead
code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.
Example:
i = 0;
if (i == 1)
{
    a = b + 5;
}
Here, the if statement is dead code because the condition i == 1 will never be satisfied.
Constant folding:
We can eliminate both the test and printing from the object code.
More generally, deducing at compile time that the value of an
expression is a constant and using the constant instead is known
as constant folding.
One advantage of copy propagation is that it often turns the copy
statement into dead code.
For example,
a = 3.14157/2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
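Constant folding can be sketched as a bottom-up pass over an expression tree; the tuple encoding is an assumption for illustration:

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold(tree):
    """Evaluate any subtree whose operands are all constants."""
    if not isinstance(tree, tuple):
        return tree
    op, l, r = tree
    l, r = fold(l), fold(r)
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return OPS[op](l, r)          # deduce the value at compile time
    return (op, l, r)

print(fold(('/', 3.14157, 2)))               # 1.570785
print(fold(('*', 'r', ('/', 3.14157, 2))))   # ('*', 'r', 1.570785)
```

Subtrees containing variables like r are left alone; only fully constant subexpressions are replaced, as in the division example above.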
Loop Optimizations:
We now give a brief introduction to a very important place for
optimizations, namely loops, especially the inner loops where
programs tend to spend the bulk of their time. The running time
of a program may be improved if we decrease the number of
instructions in an inner loop, even if we increase the amount of
code outside that loop.
For example, the jump sequence
goto L1
...
L1: goto L2
can be replaced by
goto L2
...
L1: goto L2
Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine: x² → x*x.
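A small peephole sketch covering the two transformations just described, over a hypothetical (dst, op, a, b) instruction format (the format and operator symbols are assumptions for illustration):

```python
def peephole(code):
    """Drop x := x+0 and x := x*1, and rewrite x := y^2 as x := y*y."""
    out = []
    for dst, op, a, b in code:
        if dst == a and (op, b) in (('+', 0), ('*', 1)):
            continue                          # algebraic identity: a no-op
        if op == '^' and b == 2:
            out.append((dst, '*', a, a))      # reduction in strength
        else:
            out.append((dst, op, a, b))
    return out

print(peephole([('x', '+', 'x', 0), ('y', '^', 'x', 2)]))
# [('y', '*', 'x', 'x')]
```

A real peephole optimizer slides a small window over the target code and applies many such pattern rewrites repeatedly until no more apply.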
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i+1:
i := i + 1  →  i++
i := i - 1  →  i--

(b) (i) Flow graphs for control-flow constructs such as do-while statements have a useful property: there is a single beginning point at which control enters and a single end point that control leaves from when execution of the statement is over. We exploit this property when we talk of the definitions reaching the beginning and the end of statements with the following syntax:
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id
Expressions in this language are similar to those in the intermediate code, but the flow graphs for statements have restricted forms:
S1 ; S2: the block for S1 is followed directly by the block for S2.
if E then S1 else S2: E branches to S1 or S2, and both rejoin at a single exit point.
do S1 while E: S1 is followed by the test E, with a back edge (if E goto S1) to the top of S1.
Steps 2 and 3: the recomputed subexpressions are replaced by the earlier results:

m := 4*k                m := 4*k
t2 := a[t1]             t2 := a[t1]
t5 := 4*k    (12)  →    t5 := m
t6 := a[t5]  (15)       t6 := a[t5]

Step 4: now, if we assign a value to each common subexpression, then
(12) := 4*k
(15) := a[(12)]
t5 := (12)
t6 := (15)
Copy propagation:
An assignment of the form a := b is called a copy statement. The idea behind the copy-propagation transformation is to use b for a wherever possible after the copy statement a := b.

Algorithm: Copy propagation.
Input: a flow graph G, with ud-chains giving the definitions reaching block B.
Output: a graph after applying the copy-propagation transformation.
Method: for each copy s: x := y, do the following:
1. Determine those uses of x that are reached by this definition of x, namely s: x := y.
2. Determine whether, for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover, no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B], then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
Steps 1 and 2: x := t3 is a copy statement, and both later occurrences of x are uses reached by it:

x := t3        ← copy statement
a[t1] := t2
a[t4] := x     ← use
y := x + 3     ← use
a[t5] := y

Since the values of t3 and x are not altered along the path from the definition, we replace x by t3:

x := t3
a[t1] := t2
a[t4] := t3
y := t3 + 3
a[t5] := y

and then eliminate the copy statement:

a[t1] := t2
a[t4] := t3
y := t3 + 3
a[t5] := y
B1:
  i = i + 1
  t2 = 4*i
  t3 = a[t2]
  if t3 < v goto B2
B2:
  j = j - 1
  t4 = t4 - 4
  t5 = a[t4]
  if t5 > v goto B3
B3:
  if i >= j goto B6
B4:
  x = t3
  a[t2] = t5
  a[t4] = x
  goto B2
B5:
  x = t3
  t14 = a[t1]
  a[t2] = t14
  a[t1] = x
B6:
B1:
  i = m - 1
  j = n
  t1 = 4*n
  v = a[t1]
  t2 = 4*i
  t4 = 4*j
B2:
  t2 = t2 + 4
  t3 = a[t2]
  if t3 < v goto B2
B3:
  t4 = t4 - 4
  t5 = a[t4]
  if t5 > v goto B3
B4:
  if t2 > t4 goto B6
B5:
  a[t7] = t5
  a[t10] = t3
  goto B2
B6:
  t14 = a[t1]
  a[t2] = t14
  a[t1] = t3
PART B (5 × 16 = 80 marks)
11. (a) (i) Describe the various phases of the compiler and trace them with the program segment (position := initial + rate * 60). (10)
(ii) State the compiler construction tools. Explain them. (6)
Or
(b) (i) Explain briefly about input buffering in reading the source program for finding the tokens. (8)
(ii) Construct the minimized DFA for the regular expression (0+1)*(0+1)10. (8)
12. (a) Construct a canonical parsing table for the grammar given below. Also explain the algorithm used. (16)
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id
Or
(b) What are the different storage allocation strategies? Explain.
(16)
13. (a) (i) Write down the translation scheme to generate code for an assignment statement. Use the scheme to generate three-address code for the assignment statement g := a + b - c*d. (8)
(ii) Describe the various methods of implementing three-address
statements.
(8)
Or
(b) (i) How can backpatching be used to generate code for Boolean expressions and flow-of-control statements? (10)
(ii) Write a short note on procedure calls. (6)
14. (a) (i) Discuss the issues in the design of a code generator. (10)
Or
(b) (i) Explain in detail optimization of basic blocks with an example. (8)
(ii) Write about data-flow analysis of structured programs.
(8)
Solutions
PART A
1. [Figure: source program → lexical analyzer → tokens → parser → syntax tree → semantic analyzer, with the symbol table manager shared by these phases.]
Main task: take a token sequence from the scanner and verify that it is a syntactically correct program.
Secondary tasks:
Process declarations and set up symbol table information accordingly, in preparation for semantic analysis.
Construct a syntax tree in preparation for intermediate code generation.
2.
Letter or digit
Start
1
Letter
other
Flow Graphs
A flow graph is a directed graph containing the flow-of-control information for the set of basic blocks making up a program.
The nodes of the flow graph are basic blocks. It has a distinguished initial node.
8. A DAG for a basic block is a directed acyclic graph with the following
labels on nodes:
1. Leaves are labeled by unique identifiers, either variable names or
constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identiers for labels to
store the computed values.
DAGs are useful data structures for implementing transformations on
basic blocks.
9. Algorithms for performing the code-improving transformations rely on data-flow information. Here we consider common-subexpression elimination, copy propagation and transformations for moving loop-invariant computations out of loops and for eliminating induction variables.
10. Whenever storage can be deallocated, the problem of dangling references arises. A dangling reference occurs when there is a reference to storage that has been deallocated. It is a logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages. Worse, since that storage may later be allocated to another datum, mysterious bugs can appear in programs with dangling references.
PART B
11. (a) (i) Phases of Compiler
A Compiler operates in phases, each of which transforms the
source program from one representation into another. The following are the phases of the compiler:
Main phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
Sub-Phases:
1. Symbol table management
2. Error handling
Lexical Analysis:
It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.
It reads the characters one by one, starting from left to right, and forms the tokens. A token represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols, etc.
Example: position := initial + rate*60
1. The identier position
2. The assignment symbol =
3. The identier initial
4. The plus sign
5. The identier rate
6. The multiplication sign
7. The constant number 60
Syntax Analysis:
Syntax analysis is the second phase of the compiler. It is also known as parsing. It gets the token stream as input from the lexical analyzer of the compiler and generates the syntax tree as the output.
Syntax tree:
It is a tree in which interior nodes are operators and exterior
nodes are operands.
Example: For position := initial + rate*60, the syntax tree is

             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate     60
Semantic Analysis:
Semantic Analysis is the third phase of the compiler. It gets
input from the syntax analysis as parse tree and checks whether
the given syntax is correct or not.
It performs type conversion of all the data types into real data
types.
             :=
            /  \
    position    +
              /   \
        initial     *
                  /   \
              rate   inttofloat
                        |
                        60
Code Generation:
Code Generation gets input from code optimization phase and
produces the target code or object code as result.
Intermediate instructions are translated into a sequence of
machine instructions that perform the same task.
The code generation involves
allocation of register and memory
generation of correct references
generation of correct data types
generation of missing code
Machine instructions:
MOV rate, R1
MUL #60, R1
MOV initial, R2
ADD R2, R1
MOV R1, position
Symbol Table Management:
The symbol table is used to store all the information about identifiers used in the program.
It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
It allows us to find the record for each identifier quickly and to store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
Error Handling:
Each phase can encounter errors. After detecting an error, a
phase must handle the error so that compilation can proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and
during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
In code generation, it shows error when code is missing etc.
The phases and their interactions:

source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimization → Code Generation → Object Program

Symbol Table Management and Error Detection and Handling interact with every phase.
3. Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its
neighbor nodes in the tree.
4. Automatic Code Generators:
It takes a collection of rules to translate intermediate language into machine language. The rules must include sufficient details to handle different possible access methods
for data.
5. Data-Flow Engines:
It does code optimization using data-flow analysis, that is,
the gathering of information about how values are transmitted
from one part of a program to each other part.
(b) (i) As characters are read from left to right, each character is stored
in the buffer to form a meaningful token as shown below:
[Figure: input buffer holding the characters C * * 2 followed by eof; the lexeme_beginning pointer marks the start of the current lexeme and the forward pointer scans ahead]
[Figure: buffer pair with a sentinel eof at the end of each half — C * * 2 eof in the first half, eof terminating the second — with the same forward and lexeme_beginning pointers]
[Figure: syntax tree for the regular expression (0+1)*(0+1)10 and the NFA constructed from it, with states q0–q3 (q0 looping on 0,1; q3 accepting) and its nondeterministic transition table]
[Table: subset construction — DFA states [q0], [q0,q1], [q0,q1,q2], and accepting state *[q0,q1,q3], with their transitions on inputs 0 and 1, followed by the resulting DFA diagram]
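The subset construction summarized by the table can be sketched in a few lines. The NFA used below is illustrative (the exact automaton in the figure is not fully recoverable from the scan); the function name and the dictionary encoding of transitions are assumptions.

```python
from collections import deque

def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA (state -> {symbol: set of states}) to a DFA
    whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa = {}                              # DFA state -> {symbol: DFA state}
    worklist = deque([start_set])
    while worklist:
        current = worklist.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in alphabet:
            # Union of moves from every NFA state in the subset.
            target = frozenset(s for st in current
                               for s in nfa.get(st, {}).get(sym, ()))
            dfa[current][sym] = target
            if target:
                worklist.append(target)
    # A DFA state is accepting if it contains an accepting NFA state.
    dfa_accepting = {S for S in dfa if S & accepting}
    return dfa, dfa_accepting

# Illustrative NFA over {0,1}: q0 loops on 0/1 and guesses the last symbol is 1.
nfa = {"q0": {"0": {"q0"}, "1": {"q0", "q1"}}}
dfa, final = subset_construction(nfa, "q0", {"q1"}, "01")
```

Each row of the table in the answer corresponds to one worklist iteration here.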
[Table: SLR parsing table for the expression grammar — shift entries (S2–S10) and reduce entries (r1–r5) over the terminals id, +, *, (, ), and GOTO entries for the nonterminals E, T, F]
Items whose size may not be known early enough are placed at the
end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters
determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common
approach is to have it point to the end of the fixed-length fields in the
activation record. Fixed-length data can then be accessed by fixed
offsets, known to the intermediate-code generator, relative to the
top-of-stack pointer.
[Figure: caller's and callee's activation records on the stack. In each record, the fields for parameters and returned values, the control link, and the links and saved status are the caller's responsibility; the temporaries and local data are the callee's responsibility. top_sp points to the end of the fixed-length fields of the callee's record.]
[Figure: access to dynamically sized arrays. The activation record for p holds its control link and pointers to arrays A, B, and C; the arrays themselves follow the record. Above them sits the activation record, with its control link and arrays, of a procedure q called by p; top_sp marks the end of q's fixed-length fields and top the true stack top.]
Heap Allocation:
Stack allocation strategy cannot be used if either of the following is
possible :
1. The values of local names must be retained when an activation
ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects.
Pieces may be deallocated in any order, so over time the heap
will consist of alternating areas that are free and in use.
The record for an activation of procedure r is retained when the
activation ends.
Therefore, the record for the new activation q(1,9)cannot follow
that for s physically.
If the retained activation record for r is deallocated, there will be
free space in the heap between the activation records for s and q.
[Figure: heap allocation. Position in the activation tree: s calls r and then q(1,9). Remarks: the activation record for r is retained after r ends, so the heap holds the records for s, the retained r, and q(1,9), each with a control link, and they need not be physically adjacent.]
13. (a) (i) Suppose that the context in which an assignment appears is given by the following grammar:
P → M D
M → ε
D → D ; D | id : T | proc id ; N D ; S
N → ε
Nonterminal P becomes the new start symbol when these productions are added to those in the translation scheme shown below.
Translation scheme to produce three-address code for assignments
S → id := E   { p := lookup(id.name);
                if p ≠ nil then
                    emit(p ':=' E.place)
                else error }
E → E1 + E2   { E.place := newtemp;
                emit(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2   { E.place := newtemp;
                emit(E.place ':=' E1.place '*' E2.place) }
E → - E1      { E.place := newtemp;
                emit(E.place ':=' 'uminus' E1.place) }
E → ( E1 )    { E.place := E1.place }
E → id        { p := lookup(id.name);
                if p ≠ nil then
                    E.place := p
                else error }
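The scheme above can be sketched as a recursive walk that allocates temporaries with newtemp and appends three-address statements with emit. This is an illustrative sketch: the tuple encoding of expressions and the function names gen_expr and gen_assign are assumptions, not part of the text.

```python
temp_count = 0
code = []          # emitted three-address statements

def newtemp():
    # Allocate a fresh temporary name t1, t2, ...
    global temp_count
    temp_count += 1
    return "t%d" % temp_count

def emit(stmt):
    code.append(stmt)

def gen_expr(node):
    """node is an identifier string or a tuple (op, left, right)."""
    if isinstance(node, str):
        return node                          # E -> id: E.place is the name
    op, left, right = node
    l_place = gen_expr(left)
    r_place = gen_expr(right)
    place = newtemp()                        # E.place := newtemp
    emit("%s := %s %s %s" % (place, l_place, op, r_place))
    return place

def gen_assign(target, expr):                # S -> id := E
    emit("%s := %s" % (target, gen_expr(expr)))

gen_assign("a", ("+", "b", ("*", "c", "d")))   # a := b + c * d
```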
Production rule    Semantic action    Output
E → id             E.place := a
E → id             E.place := b
E → E1 + E2        E.place := t1      t1 := a + b
E → id             E.place := c
E → id             E.place := d
E → E1 * E2        E.place := t2      t2 := c * d
E → E1 - E2        E.place := t3      t3 := t1 - t2, i.e. (a+b) - (c*d)
S → id := E                           g := t3
Quadruples for a := b * - c + b * - c:

        op        Arg1    Arg2    Result
(0)     uminus    c               t1
(1)     *         b       t1      t2
(2)     uminus    c               t3
(3)     *         b       t3      t4
(4)     +         t2      t4      t5
(5)     :=        t5              a
Triples:

        op        Arg1    Arg2
(0)     uminus    c
(1)     *         b       (0)
(2)     uminus    c
(3)     *         b       (2)
(4)     +         (1)     (3)
(5)     assign    a       (4)
Indirect triples:

        op        Arg1    Arg2
(14)    uminus    c
(15)    *         b       (14)
(16)    uminus    c
(17)    *         b       (16)
(18)    +         (15)    (17)
(19)    assign    a       (18)

statement
(0)     (14)
(1)     (15)
(2)     (16)
(3)     (17)
(4)     (18)
(5)     (19)
1. E → E1 or M E2
2.   | E1 and M E2
3.   | not E1
4.   | ( E1 )
5.   | id1 relop id2
6.   | true
7.   | false
8. M → ε
Synthesized attributes truelist and falselist of nonterminal E are
used to generate jumping code for boolean expressions. Incomplete jumps with unfilled labels are placed on lists pointed to by
E.truelist and E.falselist.
Consider the production E → E1 and M E2.
If E1 is false, then E is also false, so the statements on
E1.falselist become part of E.falselist.
If E1 is true, then we must next test E2, so the target for the
statements E1.truelist must be the beginning of the code generated for E2. This target is obtained using marker nonterminal M.
Attribute M.quad records the number of the first statement of
E2.code. With the production M → ε we associate the semantic
action
{ M.quad := nextquad }
The variable nextquad holds the index of the next quadruple to
follow. This value will be backpatched onto the E1.truelist when
we have seen the remainder of the production E → E1 and M E2.
The translation scheme is as follows:
1. E → E1 or M E2    { backpatch( E1.falselist, M.quad);
                       E.truelist := merge( E1.truelist, E2.truelist);
                       E.falselist := E2.falselist }
2. E → E1 and M E2   { backpatch( E1.truelist, M.quad);
                       E.truelist := E2.truelist;
                       E.falselist := merge( E1.falselist, E2.falselist) }
3. E → not E1        { E.truelist := E1.falselist;
                       E.falselist := E1.truelist }
4. E → ( E1 )        { E.truelist := E1.truelist;
                       E.falselist := E1.falselist }
5. E → id1 relop id2 { E.truelist := makelist( nextquad);
                       E.falselist := makelist( nextquad + 1);
                       emit( 'if' id1.place relop.op id2.place 'goto _');
                       emit( 'goto _') }
6. E → true          { E.truelist := makelist( nextquad);
                       emit( 'goto _') }
7. E → false         { E.falselist := makelist( nextquad);
                       emit( 'goto _') }
8. M → ε             { M.quad := nextquad }
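A minimal sketch of backpatching for rule 2 (E → E1 and M E2), assuming quadruples are stored as strings with "_" marking an unfilled jump target. The helper names follow the text (makelist, merge, backpatch, nextquad); the string encoding is an assumption.

```python
quads = []                      # generated quadruples, as strings

def nextquad():
    return len(quads)

def emit(q):
    quads.append(q)

def makelist(i):
    return [i]

def merge(l1, l2):
    return l1 + l2

def backpatch(lst, target):
    # Fill in the missing jump target of every quadruple on the list.
    for i in lst:
        quads[i] = quads[i].replace("_", str(target))

# Generate code for: a < b and c < d  (rule 5 twice, then rule 2).
E1_truelist  = makelist(nextquad()); emit("if a < b goto _")
E1_falselist = makelist(nextquad()); emit("goto _")
M_quad = nextquad()                         # marker M before E2
E2_truelist  = makelist(nextquad()); emit("if c < d goto _")
E2_falselist = makelist(nextquad()); emit("goto _")

# Rule 2: if E1 is true, fall through to the test of E2.
backpatch(E1_truelist, M_quad)
E_truelist  = E2_truelist
E_falselist = merge(E1_falselist, E2_falselist)
```

After the backpatch, the first jump targets quadruple 2 (the start of E2's code), while E.falselist still holds the two unfilled false exits.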
Flow-of-Control Statements:
A translation scheme is developed for statements generated by
the following grammar :
1. S → if E then S
2.   | if E then S else S
3.   | while E do S
4.   | begin L end
5.   | A
6. L → L ; S
7.   | S
Here S denotes a statement, L a statement list, A an assignment statement, and E a Boolean expression. We make the tacit
assumption that the code that follows a given statement in execution also follows it physically in the quadruple array. Else, an
explicit jump must be provided.
Scheme to implement the Translation:
The nonterminal E has two attributes E.truelist and E.falselist. L
and S also need a list of unfilled quadruples that must eventually
be completed by backpatching. These lists are pointed to by the
attributes L.nextlist and S.nextlist. S.nextlist is a pointer to a list
of all conditional and unconditional jumps to the quadruple following the statement S in execution order, and L.nextlist
is defined similarly.
The semantic rules for the revised grammar are as follows:
1. S → if E then M1 S1 N else M2 S2
{ backpatch (E.truelist, M1.quad);
backpatch (E.falselist, M2.quad);
S.nextlist := merge (S1.nextlist, merge (N.nextlist,
S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple
M1.quad, which is the beginning of the code for S1. Similarly,
we backpatch jumps when E is false to go to the beginning of the
code for S2. The list S.nextlist includes all jumps out of S1 and
S2, as well as the jump generated by N.
2. N → ε
{ N.nextlist := makelist( nextquad );
emit( 'goto _') }
3. M → ε
{ M.quad := nextquad }
4. S → if E then M S1 { backpatch( E.truelist, M.quad);
S.nextlist : = merge( E.falselist,
S1.nextlist)}
5. S → while M1 E do M2 S1 { backpatch( S1.nextlist, M1.quad);
backpatch( E.truelist,
M2.quad);
S.nextlist := E.falselist;
emit( 'goto' M1.quad) }
6. S → begin L end
{S.nextlist : = L.nextlist }
7. S → A
{ S.nextlist : = nil }
The assignment S.nextlist : = nil initializes S.nextlist to an
empty list.
8. L → L1 ; M S
{ backpatch( L1.nextlist, M.quad);
L.nextlist : = S.nextlist }
The statement following L1 in order of execution is the beginning of S. Thus the L1.nextlist list is backpatched to the beginning of the code for S, which is given by M.quad.
9. L → S
{L.nextlist : = S.nextlist }
(ii) The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure argument passing, calls and
returns are part of the run-time support package.
Let us consider a grammar for a simple procedure call statement
1. S → call id ( Elist )
2. Elist → Elist , E
3. Elist → E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence
of actions taken on entry to and exit from each procedure. The
following are the actions that take place in a calling sequence:
1. When a procedure call occurs, space must be allocated for the
activation record of the called procedure.
2. The arguments of the called procedure must be evaluated and
made available to the called procedure in a known place.
3. Environment pointers must be established to enable the called
procedure to access data in enclosing blocks.
4. The state of the calling procedure must be saved so it can
resume execution after the call. Also saved in a known place
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language
It can be placed in a xed memory location and can be
executed immediately.
b. Relocatable machine language
It allows subprograms to be compiled separately.
c. Assembly language
Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by the front end and code generator.
It makes use of symbol table, that is, a name in a three-address
statement refers to a symbol-table entry for the name.
Labels in three-address statements have to be converted to addresses of instructions. For example,
j : goto i generates a jump instruction as follows:
if i < j, a backward jump instruction with target address equal
to the location of code for quadruple i is generated.
if i > j, the jump is forward. We must store on a list for quadruple i the location of the first machine instruction generated
for quadruple j. When i is processed, the machine locations for
all instructions that forward jump to i are filled in.
4. Instruction selection:
The instructions of the target machine should be complete and uniform. Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered. The quality of the generated code is determined by its speed and size.
For example
x := y + z
a := x + t
The code for the above statements can be generated as follows:
MOV y,R0
ADD z,R0
MOV R0,x
MOV x,R0
ADD t,R0
MOV R0,a
12/13/2012 5:14:29 PM
2.69
5. Register allocation
Instructions involving register operands are shorter and faster
than those involving operands in memory.
The use of registers is subdivided into two sub problems:
Register allocation — the set of variables that will reside in registers at a point in the program is selected.
Register assignment — the specific register that a variable will
reside in is picked.
For example, consider a division instruction of the form
D x, y
where the dividend x occupies the even register of an even/odd
register pair and y is the divisor. After division,
the even register holds the remainder and
the odd register holds the quotient.
6. Evaluation order
The order in which the computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others.
(ii) a. Common subexpression elimination:
Before:            After:
a := b + c         a := b + c
b := a - d         b := a - d
c := b + c         c := b + c
d := a - d         d := b
Since the second and fourth statements compute the same
expression, the basic block can be transformed as above.
b. Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the
point where the statement x : = y + z appears in a basic block.
Then this statement may be safely removed without changing the value of the basic block.
c. Renaming temporary variables:
A statement t : = b + c ( t is a temporary ) can be changed
to u : = b + c (u is a new temporary) and all uses of this instance of t can be changed to u without changing the value
of the basic block.
Such a block is called a normal-form block.
d. Interchange of statements:
Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y
We can interchange the two statements without affecting the
value of the block if and only if neither x nor y is t1 and
neither b nor c is t2.
(b) (i) A code generator generates target code for a sequence of three-address statements and effectively uses registers to store the operands of the statements.
For example: consider the three-address statement a := b+c
It can have the following sequence of codes:
ADD Rj, Ri
Cost = 1 // if Ri contains b and Rj contains c
(or)
ADD c, Ri
Cost = 2 // if c is in a memory location
(or)
MOV c, Rj
ADD Rj, Ri
Cost = 3 // if c must first be moved into a register
if (operand1 is already in register R0)
{
    if (operator == '+')
        generate(ADD operand2, R0);
    else if (operator == '-')
        generate(SUB operand2, R0);
    else if (operator == '*')
        generate(MUL operand2, R0);
    else if (operator == '/')
        generate(DIV operand2, R0);
}
else
{
    generate(MOV operand1, R0);
    if (operator == '+')
        generate(ADD operand2, R0);
    else if (operator == '-')
        generate(SUB operand2, R0);
    else if (operator == '*')
        generate(MUL operand2, R0);
    else if (operator == '/')
        generate(DIV operand2, R0);
}
The algorithm takes as input a sequence of three-address statements constituting a basic block.
For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where
the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y' if the value of y is
currently both in memory and a register. If the value of y is
not already in L, generate the instruction MOV y', L to place
a copy of y in L.
3. Generate the instruction OP z', L where z' is a current location of z. Prefer a register to a memory location if z is in
both. Update the address descriptor of x to indicate that x is
in location L. If x is in L, update its descriptor and remove x
from all other descriptors.
4. If the current values of y or z have no next uses, are not live
on exit from the block, and are in registers, alter the register
descriptor to indicate that, after execution of x : = y op z ,
those registers will no longer contain y or z.
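Steps 1–3 above can be sketched as follows, with a deliberately trivial getreg (the real function consults register and address descriptors and considers next uses); the names and descriptor encodings are illustrative assumptions.

```python
# Sketch of code generation for x := y op z.
registers = {}        # register -> variable whose value it holds
address   = {}        # variable -> set of locations currently holding it
output    = []        # generated target instructions

def getreg(result):
    # Trivial stand-in: hand out a fresh register per result.
    reg = "R%d" % len(registers)
    registers[reg] = result
    return reg

def gen(x, y, op, z):
    L = getreg(x)                            # step 1: choose location L
    if L not in address.get(y, set()):       # step 2: load y into L if needed
        output.append("MOV %s, %s" % (y, L))
    output.append("%s %s, %s" % (op, z, L))  # step 3: apply op with z
    address[x] = {L}                         # step 3: x now lives only in L
    return L

gen("t", "b", "ADD", "c")                    # t := b + c
```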
(ii) A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying optimizing transformations to the target program.
A simple but effective technique for improving the target
code is peephole optimization, a method for improving the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever
possible.
The following program transformations are characteristic of peephole optimizations:
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable-code elimination
Redundant-instruction elimination
If we see the instruction sequence
MOV R0,a
MOV a,R0
we can eliminate the second instruction, since a is already in
R0.
Unreachable Code:
We can eliminate unreachable instructions. For example:
sum = 0;
if (sum)
    printf("%d", sum);
The body of this if statement will never be executed, so such
unreachable code can be eliminated.
Flow-of-Control Optimizations:
Unnecessary jumps on jumps can be eliminated in either
the intermediate code or the target code by the following types
of peephole optimizations. We can replace the jump sequence
goto L1
...
L1: goto L2
by the sequence
goto L2
...
L1: goto L2
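Two of the transformations above — redundant load elimination and jump-to-jump shortening — can be sketched as a single pass over a peephole of instructions. The string encoding of instructions is an illustrative assumption.

```python
def peephole(instrs):
    """One pass applying two peephole rules over a list of instructions."""
    # Map labels that alias a jump: "L1: goto L2" records L1 -> L2.
    alias = {}
    for ins in instrs:
        if ":" in ins:
            label, body = [p.strip() for p in ins.split(":", 1)]
            if body.startswith("goto "):
                alias[label] = body[len("goto "):]
    out = []
    for ins in instrs:
        # goto L1 where L1: goto L2  becomes  goto L2
        if ins.startswith("goto ") and ins[5:] in alias:
            ins = "goto " + alias[ins[5:]]
        # MOV R0,a immediately followed by MOV a,R0: the load is redundant.
        if out and ins.startswith("MOV "):
            src, dst = [p.strip() for p in ins[4:].split(",")]
            if out[-1] == "MOV %s, %s" % (dst, src):
                continue
        out.append(ins)
    return out

optimized = peephole(["MOV R0, a", "MOV a, R0", "goto L1", "L1: goto L2"])
```

A real peephole optimizer repeats such passes until no rule fires, since one replacement can expose another.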
Algebraic Simplification:
There is no end to the amount of algebraic simplification that
can be attempted through peephole optimization. Only a few
algebraic identities occur frequently enough that it is worth
implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine
instructions are considerably cheaper than others and can often
be used as special cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x*x
than as a call to an exponentiation
routine: x^2 → x*x.
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specic operations efciently. For example, some
machines have auto-increment and auto-decrement addressing
modes. These add or subtract one from an operand before or
after using its value.
The use of these modes greatly improves the quality of code
when pushing or popping a stack, as in parameter passing. These
modes can also be used in code for statements like
i := i + 1.
i := i + 1  ⇒  i++
i := i - 1  ⇒  i--
15. (a) A transformation of a program is called local if it can be performed
by looking only at the statements in a basic block; otherwise, it is
called global. Many transformations can be performed at both the
local and global levels. Local transformations are usually performed
first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes.
The transformations:
1. Common sub expression elimination,
2. Copy propagation,
3. Dead-code elimination, and
4. Constant folding
are common examples of such function-preserving transformations.
The other transformations come up primarily when global optimizations are performed.
Common Subexpression Elimination:
An occurrence of an expression E is called a common sub-expression
if E was previously computed, and the values of variables in E have
not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t4: = 4*i
t5: = n
t6: = b [t4] +t5
The above code can be optimized using the common sub-expression
elimination as
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t5: = n
t6: = b [t1] +t5
The common subexpression t4 := 4*i is eliminated, as its value is already computed in t1 and the value of i has not changed between definition and use.
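The transformation can be sketched over a basic block of quadruples: an available-expression map reuses an earlier result unless an operand has since been redefined. Applied to the four-statement block from the structure-preserving-transformations discussion (a := b+c; b := a-d; c := b+c; d := a-d), it reproduces d := b. The function and tuple encoding are illustrative assumptions.

```python
def cse(block):
    """block: list of (target, op, arg1, arg2) quadruples."""
    available = {}   # (op, arg1, arg2) -> variable currently holding the value
    out = []
    for target, op, a1, a2 in block:
        key = (op, a1, a2)
        if op != ":=" and key in available:
            # Same expression, operands unchanged: reuse the earlier result.
            out.append((target, ":=", available[key], None))
        else:
            out.append((target, op, a1, a2))
        # target was redefined: forget expressions held in it or using it.
        available = {k: v for k, v in available.items()
                     if v != target and target not in (k[1], k[2])}
        if out[-1][1] != ":=" and target not in (a1, a2):
            available[key] = target
    return out

block = [("a", "+", "b", "c"),
         ("b", "-", "a", "d"),
         ("c", "+", "b", "c"),
         ("d", "-", "a", "d")]
result = cse(block)      # only the fourth statement changes
```

Note that c := b + c is correctly left alone: b was redefined after the first b + c, so the expression is no longer available.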
Copy Propagation:
Assignments of the form f := g are called copy statements, or copies
for short. The idea behind the copy-propagation transformation is to
use g for f wherever possible after the copy statement f := g; that is,
to use one variable in place of another. This may
not appear to be an improvement, but as we shall see it gives us an
opportunity to eliminate the copy.
For example:
x = Pi;
A = x * r * r;
After copy propagation this becomes A = Pi * r * r, and the
assignment to x may then become dead code.
Code Motion:
An important modification that decreases the amount of code in a
loop is code motion. This transformation takes an expression that
yields the same result independent of the number of times a loop is
executed ( a loop-invariant computation) and places the expression
before the loop. Note that the notion before the loop assumes the
existence of an entry for the loop. For example, evaluation of limit-2
is a loop-invariant computation in the following while-statement:
while (i <= limit-2) /* statement does not change limit*/
Code motion will result in the equivalent of
t= limit-2;
while (i<=t) /* statement does not change limit or t */
Induction Variables:
A variable x is called an induction variable of loop L if its value
changes on every iteration: it is incremented or decremented by
some constant each time around the loop.
For example:
B1:
i := i + 1
t1 := 4 * i
t2 := a[t1]
if t2 < 10 goto B1
In the above code, the values of i and t1 move in lock step: whenever
i is incremented by 1, t1 is incremented by 4. Hence
i and t1 are induction variables. When there are two or more induction
variables in a loop, it may be possible to get rid of all but one.
Reduction In Strength:
Reduction in strength replaces expensive operations by equivalent
cheaper ones on the target machine. Certain machine instructions
are considerably cheaper than others and can often be used as special
cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x*x than as
a call to an exponentiation routine.
(b) (i) There are two types of basic block optimizations. They are:
Structure-Preserving Transformations
Algebraic Transformations
Structure-Preserving Transformations:
The primary Structure-Preserving Transformations on basic
blocks are:
[Figure: flow graphs for the structured constructs — the sequence S1; S2, the conditional if E then S1 else S2, and the loop do S1 while E]
PART B (5 × 16 = 80 marks)
11. (a) (i) Describe the various phases of the compiler and trace the program
segment a := b + c * 4 through all phases.
(10)
(ii) Explain in detail about compiler construction tools.
(6)
Or
(b) (i) Discuss the role of lexical analyzer in detail.
(8)
(ii) Draw the transition diagram for relational operators and unsigned
numbers in Pascal.
(8)
12. (a) (i) Explain the error recovery strategies in syntax analysis.
(6)
(b) (i) How to generate code for a basic block from its DAG representation? Explain.
(6)
(ii) Briefly explain about the simple code generator.
(10)
Or
(b) (i) Write an algorithm to construct the natural loop of a back edge.
(6)
(ii) Explain in detail about code-improving transformations.
(10)
Solutions
PART A
1. An Interpreter is a translator which produces the result directly when the
source language and data is given to it as input. It does not produce the
object code. The source program gets interpreted every time the source
program is analyzed.
[Figure: an INTERPRETER takes the source program and its data as input and directly executes it, producing the result]
[Table: token classification — int is a keyword; a name is an identifier]
Right-sentential form    Handle    Reducing production
id + id + id             id        E → id
E + id + id              id        E → id
E + E + id               id        E → id
E + E + E                E + E     E → E + E
E + E                    E + E     E → E + E
E
4. The static allocation can be done only if the size of data object is known
at compile time.
The data structures cannot be created dynamically. In the sense that, the
static allocation cannot manage the allocation of memory at run time.
Recursive procedures are not supported by this type of allocation.
5. A compiler for different machines can be created by attaching different
backend to the existing front ends of each machine.
A compiler for different source languages can be created by providing
different front ends for the corresponding source languages to an existing
back end.
A machine independent code optimizer can be applied to intermediate
code in order to optimize the code generation.
6. The natural hierarchical structure is represented by syntax trees.
[Figure: syntax tree with := at the root, + and * as interior operator nodes, and operands such as b at the leaves]
Consider the statements
i : x := …
j : y := x op z
that is, statement j uses the value of x computed at statement i.
9. A variable is live at a point in a program if its value can be used subsequently; otherwise it is dead at that point. A related idea is dead or
useless code, statements that compute values that never get used. While
the programmer is unlikely to introduce any dead code intentionally, it
may appear as the result of previous transformations. An optimization
can be done by eliminating dead code.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the body of the if statement is dead code because the condition
will never be satisfied.
10. The running time of a program may be improved if the number of instructions in an inner loop is decreased, even if the amount of code outside
that loop is increased.
Three techniques are important for loop optimization:
code motion, which moves code outside a loop;
Induction-variable elimination, which removes redundant induction
variables from inner loops.
Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.
PART B
11. (a) (i) A Compiler operates in phases, each of which transforms the
source program from one representation into another. The following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling
[Figure: Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program, with symbol table management and error handling connected to every phase]
Lexical analysis:
It is the first phase of the compiler. It gets input from the source
program and produces tokens as output.
It reads the characters one by one, starting from left to right
and forms the tokens.
Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols,
etc.
Example: for the source code a := b + c * 4,
the lexical analysis phase breaks this statement up into a
series of tokens as follows:
1. The identier a
2. The assignment symbol :=
3. The identier b
Semantic analysis:
It is the third phase of the compiler.
It gets input from the syntax analysis phase as a parse tree and checks
whether the construct is semantically correct.
It performs type checking and inserts the necessary type conversions
(for example, int to float).
Example : a:=b+c*4
[Figure: syntax tree for a := b + c * 4, with the constant 4 converted from int to float]
MOV b, R1
ADD R1,R0
MOV R0, a
Symbol table management:
Symbol table is used to store all the information about
identifiers used in the program.
It is a data structure containing a record for each identifier,
with fields for the attributes of the identifier.
It allows the compiler to find the record for each identifier quickly and to
store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is
stored in the symbol table.
Error handling:
Each phase can encounter errors. After detecting an error,
a phase must handle the error so that compilation can
proceed.
In lexical analysis, errors occur in separation of tokens.
In syntax analysis, errors occur during construction of syntax
tree.
In semantic analysis, errors occur when the compiler detects
constructs with right syntactic structure but no meaning and
during type conversion.
In code optimization, errors occur when the result is affected
by the optimization.
Translation of a := b + c * 4 through the phases:

lexical analyzer:      id1 := id2 + id3 * 4
syntax analyzer:       tree with := at the root, id1 on the left,
                       and id2 + (id3 * 4) on the right
semantic analyzer:     the same tree with inttofloat applied to 4
intermediate code:     temp1 := inttofloat(4)
                       temp2 := id3 * temp1
                       temp3 := id2 + temp2
                       id1 := temp3
code optimizer:        temp1 := id3 * 4.0
                       id1 := id2 + temp1
code generator:        MOVF id3, R2
                       MULF #4.0, R2
                       MOVF id2, R1
                       ADDF R2, R1
                       MOVF R1, id1
(ii) These are specialized tools that have been developed for helping
implement various phases of a compiler. The following are the
compiler construction tools:
Scanner Generator
Parser Generators
Syntax-Directed Translation
Automatic Code Generators
Data-Flow Engines
1) Scanner Generator:
These generate lexical analyzers, normally from a specification based on regular expressions.
The basic organization of the resulting lexical analyzer is a finite
automaton.
2) Parser Generators:
These produce syntax analyzers, normally from input that
is based on a context-free grammar.
It consumes a large fraction of the running time of a
compiler.
Example-YACC (Yet Another Compiler-Compiler).
3) Syntax-Directed Translation:
These produce routines that walk the parse tree and as a
result generate intermediate code.
Each translation is defined in terms of translations at its
neighbor nodes in the tree.
4) Automatic Code Generators:
It takes a collection of rules to translate intermediate language into machine language. The rules must include sufcient details to handle different possible access methods
for data.
5) Data-Flow Engines:
It does code optimization using data-flow analysis, that is,
the gathering of information about how values are transmitted from one part of a program to each other part.
(b) (i) [Figure: the lexical analyzer reads the source program and passes tokens to the parser on demand; the parser builds the syntax tree; both interact with the symbol table manager]
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which
performs lexical analysis is called a lexical analyzer or scanner.
A lexer often exists as a single function which is called by a
parser or another function.
The role of the lexical analyzer
The lexical analyzer is the first phase of a compiler.
Its main task is to read the input characters and produce as
output a sequence of tokens that the parser uses for syntax
analysis.
Upon receiving a get next token command from the parser,
the lexical analyzer reads input characters until it can identify
the next token.
Issues of lexical analyzer
There are three reasons for separating lexical analysis from parsing:
To make the design simpler.
To improve the efficiency of the compiler.
To enhance compiler portability.
Tokens
A token is a string of characters, categorized according to the
rules as a symbol (e.g., IDENTIFIER, NUMBER, COMMA).
The process of forming tokens from an input stream of characters is called tokenization.
A token can look like anything that is useful for processing an
input text stream or text file. Consider this expression in the C
programming language: sum = 3 + 2;
Lexeme    Token type
sum       Identifier
=         Assignment operator
3         Number
+         Addition operator
2         Number
;         Delimiter
Lexeme:
Collection or group of characters forming tokens is called
Lexeme.
12/13/2012 5:14:33 PM
2.93
Pattern:
A pattern is a description of the form that the lexemes of a token
may take. In the case of a keyword as a token, the pattern is just
the sequence of characters that form the keyword. For identifiers
and some other tokens, the pattern is a more complex structure
that is matched by many strings.
Attributes for Tokens
Some tokens have attributes that can be passed back to the parser.
The lexical analyzer collects information about tokens into their
associated attributes. The attributes influence the translation of
tokens.
i) Constant: value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
Error recovery strategies in lexical analysis:
The following are the error-recovery actions in lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: Deletion of successive characters from
the token until error is resolved.
(b) (ii) The relational operators are <, >, <=, >=, =, and !=.
[Figure: transition diagram for the relational operators, states S0–S11. From the start state S0, < leads to states that return LE on =, NE on >, and LT on any other character (with a retract); = returns EQ; > leads to states that return GE on = and GT on any other character (with a retract).]
12. (a) (i) The different strategies that a parser uses to recover from a syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction
Panic mode recovery:
On discovering an error, the parser discards input symbols one
at a time until a synchronizing token is found. The synchronizing tokens are usually delimiters, such as semicolon or end. It
has the advantage of simplicity and does not go into an infinite
loop. When multiple errors in the same statement are rare, this
method is quite useful.
Phrase level recovery:
On discovering an error, the parser performs local correction on
the remaining input that allows it to continue. Example: Insert a
missing semicolon or delete an extraneous semicolon etc.
Error productions:
The parser is constructed using augmented grammar with error
productions. If an error production is used by the parser, appropriate error diagnostics can be generated to indicate the erroneous constructs recognized by the input.
Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to nd a parse tree for a string y, such that the
number of insertions, deletions and changes of tokens is as small
as possible. However, these methods are in general too costly in
terms of time and space.
(ii) The given grammar is:
G : E → E + T ------ (1)
E → T --------------- (2)
T → T * F ----------- (3)
T → F ---------------- (4)
F → (E) -------------- (5)
F → id --------------- (6)
Step 1 : Convert the given grammar into an augmented grammar.
Augmented grammar:
E′ → E
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
Step 2 : Find the LR(0) items.
I0 : E′ → . E
E → . E + T
E → . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I0 , E )
I1 : E′ → E .
E → E . + T
GOTO ( I0 , T )
I2 : E → T .
T → T . * F
GOTO ( I0 , F )
I3 : T → F .
GOTO ( I0 , ( )
I4 : F → ( . E )
E → . E + T
E → . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I0 , id )
I5 : F → id .
GOTO ( I1 , + )
I6 : E → E + . T
T → . T * F
T → . F
F → . (E)
F → . id
GOTO ( I2 , * )
I7 : T → T * . F
F → . (E)
F → . id
GOTO ( I4 , E )
I8 : F → ( E . )
E → E . + T
GOTO ( I6 , T )
I9 : E → E + T .
T → T . * F
GOTO ( I7 , F )
I10 : T → T * F .
GOTO ( I8 , ) )
I11 : F → ( E ) .
FOLLOW (E′) = { $ }
FOLLOW (E) = { + , ) , $ }
FOLLOW (T) = { + , * , ) , $ }
FOLLOW (F) = { + , * , ) , $ }
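The item sets above can be generated mechanically. A minimal sketch of the LR(0) closure and GOTO operations for this grammar, with E′ as the augmented start symbol and an item represented as (head, body, dot position):

```python
# Grammar as productions; "E'" is the augmented start symbol.
GRAMMAR = {
    "E'": [["E"]],
    "E":  [["E", "+", "T"], ["T"]],
    "T":  [["T", "*", "F"], ["F"]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def closure(items):
    """LR(0) closure: whenever the dot sits before a nonterminal,
    add that nonterminal's productions with the dot at the left end."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    """Move the dot over `symbol` and take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})   # the seven items listed above
```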
SLR parsing table:

State |            ACTION             |  GOTO
      |  id    +    *    (    )    $  | E  T  F
  0   |  s5             s4            | 1  2  3
  1   |       s6                  acc |
  2   |       r2   s7        r2   r2  |
  3   |       r4   r4        r4   r4  |
  4   |  s5             s4            | 8  2  3
  5   |       r6   r6        r6   r6  |
  6   |  s5             s4            |    9  3
  7   |  s5             s4            |       10
  8   |       s6             s11      |
  9   |       r1   s7        r1   r1  |
 10   |       r3   r3        r3   r3  |
 11   |       r5   r5        r5   r5  |
[Figure: activation records for main and factorial. The record for main holds its locals; each activation record for factorial (for example, factorial(3)) holds the return value, the actual parameter, and the dynamic link to its caller's record.]
Calling sequences:
Procedure calls are implemented by what is called a calling sequence, which consists of code that allocates an activation record on the stack and enters information into its fields.
A return sequence is similar code that restores the state of the machine so the calling procedure can continue its execution after the call.
The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).
When designing calling sequences and the layout of activation records, the following principles are helpful:
Values communicated between caller and callee are generally placed at the beginning of the callee's activation record, so they are as close as possible to the caller's activation record.
Fixed-length items are generally placed in the middle. Such items typically include the control link, the access link, and the machine-status fields.
Items whose size may not be known early enough are placed at the end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the end of the fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed offsets, known to the intermediate-code generator, relative to the top-of-stack pointer.
[Figure: division of tasks between caller and callee. The caller's activation record holds parameters and returned values, a control link, links and saved status, and temporaries and local data; the callee's record repeats this layout. Filling the fields through the saved status is the caller's responsibility, and the rest is the callee's; top_sp points to the end of the fixed-length fields of the callee's record.]
The calling sequence and its division between caller and callee are as follows.
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp to its new position.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
The callee places the return value next to the parameters.
Using the information in the machine-status field, the callee restores top_sp and the other registers, and then branches to the return address that the caller placed in the status field.
Although top_sp has been decremented, the caller knows where the return value is, relative to the current value of top_sp; the caller therefore may use that value.
Variable-length data on the stack:
The run-time memory-management system must deal frequently with the allocation of space for objects whose sizes are not known at compile time, but which are local to a procedure and thus may be allocated on the stack.
The reason to prefer placing objects on the stack is that we avoid the expense of garbage-collecting their space.
The same scheme works for objects of any type if they are local to the procedure called and have a size that depends on the parameters of the call.
Suppose procedure p has three local arrays whose sizes cannot be determined at compile time. The storage for these arrays is not part of the activation record for p.
Access to the data is through two pointers, top and top_sp. Here top marks the actual top of the stack; it points to the position at which the next activation record will begin.
The second pointer, top_sp, is used to find the local, fixed-length fields of the top activation record.
The code to reposition top and top_sp can be generated at compile time, in terms of sizes that will become known at run time.
[Figure: access to dynamically sized arrays. The activation record for p contains a control link and pointers to arrays A, B and C, followed by the arrays of p themselves; below them is the activation record for a procedure q called by p, then the arrays of q. top_sp points into the fixed-length part of the top record, while top marks the actual top of the stack.]
Heap Allocation:
The stack-allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects.
Pieces may be deallocated in any order, so over time the heap will consist of alternating areas that are free and in use.
Suppose the record for an activation of procedure r is retained when the activation ends.
Then the record for a new activation q(1, 9) cannot follow the record for s physically.
If the retained activation record for r is deallocated, there will be free space in the heap between the activation records for s and q.
[Figure: heap allocation. The activation tree has s at the root, with children r and q(1, 9). In the heap, the record for s, the retained record for r, and the record for q(1, 9) each carry a control link; the record for r is retained after its activation ends.]
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
Since three fields are used, this intermediate-code format is known as triples.
Indirect Triples:
Another implementation of three-address code is to list pointers to triples, rather than listing the triples themselves. This implementation is called indirect triples.
Example: a := b * -c + b * -c
The three-address code is
t1 := uminus c
t2 := t1 * b
t3 := uminus c
t4 := t3 * b
t5 := t2 + t4
a := t5
Quadruples:

      op      arg1  arg2  result
(0)   uminus  c           t1
(1)   *       t1    b     t2
(2)   uminus  c           t3
(3)   *       t3    b     t4
(4)   +       t2    t4    t5
(5)   :=      t5          a
Triples:

      op      arg1  arg2
(0)   uminus  c
(1)   *       (0)   b
(2)   uminus  c
(3)   *       (2)   b
(4)   +       (1)   (3)
(5)   assign  a     (4)
Indirect triples:

statement          op      arg1  arg2
(0)  (14)    (14)  uminus  c
(1)  (15)    (15)  *       (14)  b
(2)  (16)    (16)  uminus  c
(3)  (17)    (17)  *       (16)  b
(4)  (18)    (18)  +       (15)  (17)
(5)  (19)    (19)  assign  a     (18)
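The formats can be produced mechanically. The sketch below builds the quadruples for a := b * -c + b * -c and then derives the triples from them; the tuple layout and helper names are assumptions of this sketch:

```python
quads = []
ntemps = 0

def newtemp():
    """Return a fresh temporary name t1, t2, ..."""
    global ntemps
    ntemps += 1
    return f"t{ntemps}"

def emit(op, arg1, arg2, result):
    quads.append((op, arg1, arg2, result))

# a := b * -c + b * -c
t1 = newtemp(); emit("uminus", "c", None, t1)
t2 = newtemp(); emit("*", t1, "b", t2)
t3 = newtemp(); emit("uminus", "c", None, t3)
t4 = newtemp(); emit("*", t3, "b", t4)
t5 = newtemp(); emit("+", t2, t4, t5)
emit(":=", t5, None, "a")

# Triples drop the result field: a temporary is referred to by the
# index of the statement that computes it.
pos = {q[3]: i for i, q in enumerate(quads)}

def ref(x):
    return pos.get(x, x)

triples = [("assign", res, ref(a1)) if op == ":=" else (op, ref(a1), ref(a2))
           for op, a1, a2, res in quads]
```

Note that the sketch does not detect that statements (0)-(1) and (2)-(3) are identical; an optimizing translator could reuse t1 and t2 instead of emitting t3 and t4.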
p := lookup(id.name);
if p ≠ nil then
emit( p := E.place )
else error }
E → E1 + E2
{ E.place := newtemp;
emit( E.place := E1.place + E2.place ) }
E → E1 * E2
{ E.place := newtemp;
emit( E.place := E1.place * E2.place ) }
E → - E1
{ E.place := newtemp;
emit( E.place := uminus E1.place ) }
E → ( E1 )
{ E.place := E1.place }
E → id
{ p := lookup(id.name);
if p ≠ nil then
E.place := p
else error }
(b) (i) Boolean expressions have two primary purposes. They are used to compute logical values, but more often they are used as conditional expressions in statements that alter the flow of control, such as if-then-else or while-do statements.
Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.
Here we consider boolean expressions generated by the following grammar:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
Methods of Translating Boolean Expressions:
There are two principal methods of representing the value of a boolean expression:
To encode true and false numerically and to evaluate a boolean expression analogously to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
To implement boolean expressions by flow of control, that is, representing the value of a boolean expression by the position reached in a program. This method is particularly convenient for implementing boolean expressions in flow-of-control statements, such as if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions are evaluated completely from left to right, in a manner similar to arithmetic expressions.
For example, the translation for a or b and not c is the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
A relational expression such as a < b is equivalent to the conditional statement if a < b then 1 else 0, which can be translated into the three-address sequence (again, we arbitrarily start statement numbers at 100):
100 : if a < b goto 103
101 : t := 0
102 : goto 104
103 : t := 1
104 :
Translation scheme using a numerical representation for booleans:
E → E1 or E2
{ E.place := newtemp;
emit( E.place := E1.place or E2.place ) }
E → E1 and E2
{ E.place := newtemp;
emit( E.place := E1.place and E2.place ) }
E → not E1
{ E.place := newtemp;
emit( E.place := not E1.place ) }
E → ( E1 )
{ E.place := E1.place }
E → id1 relop id2
{ E.place := newtemp;
emit( if id1.place relop.op id2.place goto nextstat + 3 );
emit( E.place := 0 );
emit( goto nextstat + 2 );
emit( E.place := 1 ) }
E → true
{ E.place := newtemp;
emit( E.place := 1 ) }
E → false
{ E.place := newtemp;
emit( E.place := 0 ) }
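A minimal sketch of the numerical method, operating on boolean expressions written as nested tuples (the tuple representation and the unbounded pool of temporaries are assumptions of this sketch):

```python
code = []
temps = (f"t{i}" for i in range(1, 100))   # fresh temporaries t1, t2, ...

def gen_bool(expr):
    """Translate a boolean expression, given as nested tuples such as
    ("or", "a", ("not", "c")), into three-address code using 1 for
    true and 0 for false. Returns the place holding the result."""
    if expr == "true":
        t = next(temps); code.append(f"{t} := 1"); return t
    if expr == "false":
        t = next(temps); code.append(f"{t} := 0"); return t
    if isinstance(expr, str):              # a boolean variable
        return expr
    op = expr[0]
    if op == "not":
        a = gen_bool(expr[1])
        t = next(temps); code.append(f"{t} := not {a}"); return t
    a, b = gen_bool(expr[1]), gen_bool(expr[2])   # "and" / "or"
    t = next(temps); code.append(f"{t} := {a} {op} {b}"); return t

# the example from the text: a or b and not c
result = gen_bool(("or", "a", ("and", "b", ("not", "c"))))
```

For the example expression this emits exactly the three-address sequence shown above: t1 := not c, t2 := b and t1, t3 := a or t2.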
(2) E → E1 and M E2
{ backpatch(E1.truelist, M.quad);
E.truelist := E2.truelist;
E.falselist := merge(E1.falselist, E2.falselist); }
(3) E → not E1
{ E.truelist := E1.falselist;
E.falselist := E1.truelist; }
(4) E → ( E1 )
{ E.truelist := E1.truelist;
E.falselist := E1.falselist; }
(6) E → true
{ E.truelist := makelist(nextquad);
emit('goto _') }
(7) E → false
{ E.falselist := makelist(nextquad);
emit('goto _') }
(8) M → ε
{ M.quad := nextquad }
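The three backpatching primitives used in these rules can be sketched as list operations over an array of emitted quadruples; the string encoding of an unfilled jump as "goto _" is an assumption of this sketch:

```python
code = []                 # emitted quadruples; jumps start out as 'goto _'

def nextquad():
    """Index of the next quadruple to be emitted."""
    return len(code)

def emit(instr):
    code.append(instr)

def makelist(i):
    """A new list containing only the quadruple index i."""
    return [i]

def merge(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2

def backpatch(plist, target):
    """Fill `target` in as the jump target of every quadruple on plist."""
    for i in plist:
        code[i] = code[i].replace("_", str(target))

# E -> true  { E.truelist := makelist(nextquad); emit('goto _') }
truelist = makelist(nextquad()); emit("goto _")
# E -> false { E.falselist := makelist(nextquad); emit('goto _') }
falselist = makelist(nextquad()); emit("goto _")
backpatch(merge(truelist, falselist), 100)
```

After backpatching, both previously unfilled jumps read "goto 100".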
14. (a) (i) Write in detail about the issues in the design of a code generator.
The following issues arise during the code-generation phase:
1. Input to the code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order
1. Input to the code generator:
The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
The intermediate representation can be:
a. A linear representation such as postfix notation
b. A three-address representation such as quadruples
c. A virtual-machine representation such as stack-machine code
d. A graphical representation such as syntax trees and DAGs.
Prior to code generation, the source program must have been scanned, parsed and translated into the intermediate representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language
It can be placed in a fixed memory location and executed immediately.
b. Relocatable machine language
It allows subprograms to be compiled separately.
c. Assembly language
Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by the front end and the code generator.
This makes use of the symbol table; that is, a name in a three-address statement refers to a symbol-table entry for the name.
Labels in three-address statements have to be converted to addresses of instructions.
For example, j : goto i generates a jump instruction as follows:
If i < j, the jump is backward, and a jump instruction with target address equal to the location of the code for quadruple i is generated.
If i > j, the jump is forward. We must store on a list for quadruple i the location of the first machine instruction generated for quadruple j. When i is processed, the machine locations of all instructions that jump forward to i are filled in.
4. Instruction selection:
The instruction set of the target machine should be complete and uniform.
Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered.
The quality of the generated code is determined by its speed and size.
5. Register allocation:
Instructions involving register operands are shorter and faster than those involving operands in memory.
Else
{
Generate (MOV operand1, R0);
If (operator = +)
Generate (ADD operand2, R0);
Else if (operator = -)
Generate (SUB operand2, R0);
Else if (operator = *)
Generate (MUL operand2, R0);
Else if (operator = /)
Generate (DIV operand2, R0);
}
The algorithm takes as input a sequence of three-address statements constituting a basic block.
For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y′, a current location of y. Prefer a register for y′ if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y′, L to place a copy of y in L.
3. Generate the instruction OP z′, L where z′ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptors to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
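A minimal sketch of the x := y op z step. Here getreg just hands out a free register, spilling and next-use information are not modeled, and the register names R0 to R2 are assumptions of this sketch:

```python
registers = {}   # register descriptor: register -> variable it holds
addresses = {}   # address descriptor: variable -> set of current locations

def getreg():
    """Hypothetical getreg: return any free register; spilling is not modeled."""
    for r in ("R0", "R1", "R2"):
        if r not in registers:
            return r
    raise RuntimeError("no free register (spilling not modeled)")

def gen(x, y, op, z):
    """Generate code for the three-address statement x := y op z."""
    out = []
    L = getreg()
    # prefer a register copy of y; otherwise use y's memory location
    in_regs = addresses.get(y, {y}) & set(registers)
    y_loc = next(iter(in_regs)) if in_regs else y
    if y_loc != L:
        out.append(f"MOV {y_loc}, {L}")
    out.append(f"{op} {z}, {L}")
    registers[L] = x        # L's descriptor now says it holds x
    addresses[x] = {L}      # x currently lives only in L
    return out
```

Generating code for t1 := a + b followed by t2 := t1 - c illustrates steps 2 and 3: the second statement finds t1 already in R0 and moves it from there rather than from memory.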
15. (a) (i) A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.
Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes.
The transformations:
Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x * x than as a call to an exponentiation routine: x² → x * x.
Use of Machine Idioms:
The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1:
i := i + 1 → i++
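A single peephole pass combining these three transformations might look as follows. The textual statement syntax, with ^ standing for exponentiation, is an assumption of this sketch:

```python
import re

def peephole(instrs):
    """One pass of algebraic simplification, strength reduction,
    and machine-idiom replacement over textual statements."""
    out = []
    for ins in instrs:
        m = re.fullmatch(r"(\w+) := (\w+) ([+*^]) (\w+)", ins)
        if m:
            x, a, op, b = m.groups()
            if op == "+" and b == "0" and a == x:
                continue                          # x := x + 0 is redundant
            if op == "*" and b == "1" and a == x:
                continue                          # x := x * 1 is redundant
            if op == "^" and b == "2":
                out.append(f"{x} := {a} * {a}")   # strength reduction: a^2 -> a*a
                continue
            if op == "+" and b == "1" and a == x:
                out.append(f"{x}++")              # machine idiom: auto-increment
                continue
        out.append(ins)
    return out
```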
i := i - 1 → i--
(b) (i) One application of dominator information is in determining the
loops of a flow graph suitable for improvement.
The properties of loops are:
A loop must have a single entry point, called the header. This entry point dominates all nodes in the loop, or it would not be the sole entry to the loop.
There must be at least one way to iterate the loop, i.e. at least one path back to the header.
One way to find all the loops in a flow graph is to search for edges in the flow graph whose heads dominate their tails. If a → b is an edge, b is the head and a is the tail. These edges are called back edges.
Example:
[Figure: a flow graph with nodes 1 to 10, used to illustrate back edges and the loops they identify.]
Steps 2, 3 and 4: if we now assign the value of the common subexpression 4*k to a new name m, then
m := 4*k
t1 := m
t2 := a[t1]
t5 := m
t6 := a[t5]
Copy propagation:
An assignment of the form a := b is called a copy statement. The idea behind the copy-propagation transformation is to use b for a wherever possible after the copy statement a := b.
Algorithm: Copy propagation.
Input: A flow graph G, with ud-chains giving the definitions reaching block B.
Output: The graph after applying the copy-propagation transformation.
Method: For each copy s : x := y, do the following:
1. Determine those uses of x that are reached by this definition of x, namely s : x := y.
2. Determine whether, for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B], then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
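Within a single basic block the algorithm degenerates to one linear scan. The sketch below assumes statements are strings of the form "lhs := rhs" and that an eliminated copy's target is dead at the end of the block; it is a simplification of the global ud-chain method, not a replacement for it:

```python
def copy_propagate(block):
    """Copy propagation within one basic block: after the copy x := y,
    later uses of x become y until x or y is redefined; the copy itself
    is then removed (assuming x is dead at the end of the block)."""
    copies, out = {}, []
    for stmt in block:
        lhs, rhs = stmt.split(" := ")
        tokens = [copies.get(t, t) for t in rhs.split()]
        # a redefinition of a name kills copies that mention it
        copies = {x: y for x, y in copies.items() if lhs not in (x, y)}
        if lhs.isidentifier() and len(tokens) == 1 and tokens[0].isidentifier():
            copies[lhs] = tokens[0]     # record the copy and drop it
            continue
        out.append(f"{lhs} := {' '.join(tokens)}")
    return out
```

Applied to the block in the example that follows (x := t3; a[t1] := t2; a[t4] := x; y := x + 3; a[t5] := y), it replaces x by t3 and eliminates the copy statement.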
Steps 1 and 2:
x := t3          (this is a copy statement)
a[t1] := t2
a[t4] := x       (use)
y := x + 3
a[t5] := y       (use)
Since the values of t3 and x are not altered along the path from its definition, we replace x by t3 and then eliminate the copy statement:
x := t3                          a[t1] := t2
a[t1] := t2     eliminating      a[t4] := t3
a[t4] := t3     the copy   →     y := t3 + 3
y := t3 + 3     statement        a[t5] := y
a[t5] := y
PART B (5 × 16 = 80 Marks)
11. (a) (i) Discuss the input-buffering scheme in a lexical analyzer.
(ii) Construct an NFA using Thompson's construction algorithm for the regular expression (a|b)* abb (a|b) and convert it into a DFA.
Or
(b) (i) Illustrate the compiler's internal representation of the changes in the source program, as translation progresses, by considering the translation of the statement A := B + C * 50.
(ii) Construct a DFA directly from the regular expression (a|b)* abb, without constructing an NFA.
12. (a) (i) Give the definitions of the FIRST(X) and FOLLOW(A) procedures used in constructing a predictive parser.
(ii) What is an operator grammar? Draw the precedence graph for the following table.
[Operator-precedence table over the terminals a, (, ), ',' and $, giving the <· and ·> relations between each pair.]
Or
(b) (i) Write a note on error recovery in predictive parsing.
(ii) Write the LR parsing algorithm. Check whether the following grammar is SLR(1) or not. Justify the answer with reasons.
S → L = R | R
L → *R | id
R → L
13. (a) (i) What are the various data structures used for symbol-table construction? Explain any one in detail.
(ii) Let A be a 10 × 20 array with low1 = low2 = 1. Let w = 4. Draw an annotated parse tree for the assignment statement X := A[y, z]. Give the sequence of three-address statements generated.
Or
(b) How would you generate the intermediate code for the flow-of-control statements? Explain with examples.
Solutions
PART A
1. In a language-processing system, all preprocessed source programs to be used as sources for generating an object program are scanned, checked for errors, and the optimized source programs are output in units of translation.
2. The error recovery actions are
a. Deleting an extraneous character
b. Inserting a missing character
c. Replacing an incorrect character by a correct character
d. Transposing two adjacent characters
3. S → (L) | a
L → S L′
L′ → , S L′ | ε
4. CLR stands for Canonical LR.
A CLR grammar is one whose CLR parsing table has no multiply defined entries. A grammar for which a CLR parser can be constructed is said to be a CLR grammar.
5. t1 = b+c
t2 = at1
6. The three functions are
a. makelist(i)
b. merge(p1, p2)
c. Backpatch(p, i)
7. Deducing at compile time that the value of an expression is a constant, and using that constant instead, is known as constant folding.
8. A directed acyclic graph (DAG) gives a picture of how the value computed by each statement in a basic block is used in subsequent statements of the block. Applications of DAGs include:
(i) Detecting common subexpressions.
(ii) Determining which identifiers have their values used in the block.
(iii) Determining which statements compute values that are used outside the block.
(iv) Reconstructing a simplified list of quadruples.
PART B
11. (a) (i) This technique addresses the efficiency issues concerned with the buffering of input. For many source languages, there are times when the lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match can be announced. Since a large amount of time can be consumed moving characters, specialized buffering techniques have been developed to reduce the amount of overhead required to process an input character.
The principle of a buffering scheme is outlined as follows. Consider a buffer divided into two N-character halves, holding, say, the input E = M * C * * 2 followed by eof:
: : : E : = : M : * : C : * : * : 2 : : eof : : :
with one pointer marking the lexeme beginning and a forward pointer doing the lookahead.
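The two-buffer scheme with sentinels can be sketched as follows. The tiny half size N and the use of the NUL character as sentinel are assumptions of this sketch; real lexers use halves of a disk-block size:

```python
EOF = "\0"      # sentinel stored at the end of each buffer half
N = 4           # size of one buffer half (tiny, for illustration)

class TwoBufferReader:
    """Sketch of the two-buffer scheme: the scanner tests one sentinel
    per character instead of checking both a buffer bound and end of input."""
    def __init__(self, text):
        self.text = text
        self.pos = 0                 # position of the next reload
        self.buf = self._reload()
        self.forward = 0

    def _reload(self):
        half = self.text[self.pos:self.pos + N]
        self.pos += len(half)
        return half + EOF            # sentinel marks the end of the half

    def next_char(self):
        c = self.buf[self.forward]
        self.forward += 1
        if c == EOF:                 # sentinel: reload, or report end of input
            if self.pos >= len(self.text):
                return EOF           # true end of input
            self.buf = self._reload()
            self.forward = 1
            return self.buf[0]
        return c
```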
[Figure: Thompson-construction NFA fragments for a|b, abb and (a|b)*, combined into the NFA for the full regular expression, with numbered states.]
ε-closure(0) = {0, 1, 2, 4, 7} = A
a-trans on A = {3, 8}; b-trans on A = {5}
ε-closure({3, 8}) = {0, 1, 2, 3, 4, 6, 7, 8} = B
a-trans on B = {3, 8}; b-trans on B = {5, 9}
ε-closure({5}) = {1, 2, 3, 4, 5, 7} = C
a-trans on C = {3, 8}; b-trans on C = {5}
ε-closure({5, 9}) = {1, 2, 3, 4, 5, 6, 7, 9} = D
a-trans on D = {3, 8}; b-trans on D = {5, 10}
ε-closure({5, 10}) = {1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 17} = E
a-trans on E = {3, 8, 13}; b-trans on E = {5, 10}
ε-closure({3, 8, 13}) = {1, 2, 3, 4, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17} = F
a-trans on F = {3, 8, 13}; b-trans on F = {5, 9, 15}
ε-closure({5, 9, 15}) = {1, 2, 3, 4, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17} = G
a-trans on G = {3, 8, 13}; b-trans on G = {5, 15}
ε-closure({5, 15}) = {1, 2, 3, 4, 6, 7, 11, 12, 13, 14, 15, 16, 17} = H
a-trans on H = {3, 8, 13}; b-trans on H = {5, 15}
The equivalent DFA transition table maps each of the states A–H on inputs a and b to the states found above.
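The trace above is an instance of the general subset construction, which can be sketched with the NFA given as move and ε-transition tables:

```python
def epsilon_closure(states, eps):
    """ε-closure: all states reachable from `states` via ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, moves, eps, alphabet):
    """Build DFA states (sets of NFA states) from the move table
    (state, symbol) -> states and the ε-transition table."""
    start_state = epsilon_closure({start}, eps)
    dfa, worklist = {}, [start_state]
    while worklist:
        S = worklist.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            moved = {t for s in S for t in moves.get((s, a), ())}
            if moved:
                T = epsilon_closure(moved, eps)
                dfa[S][a] = T
                worklist.append(T)
    return start_state, dfa
```

On a tiny NFA with ε-edges 0 → 1 and 0 → 2 and moves 1 →a 3, 2 →b 3, the start state of the DFA is {0, 1, 2}, and both inputs lead to {3}.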
A := B + C * 50
Lexical analyzer: the token stream A, :=, B, +, C, *, 50 (with A, B, C entered in the symbol table as id1, id2, id3).
Syntax analyzer, semantic analyzer, intermediate-code generation and code optimization follow, as shown in the figure.
Code generation:
MOVF id3, R1
MULF #50.0, R1
MOVF id2, R2
ADDF R2, R1
MOVF R1, id1
11. (b) (ii) The syntax tree for (a|b)* abb # has concatenation nodes along its spine: the leaves a (position 1) and b (position 2) sit under the * node, followed by the leaves a (position 3), b (position 4), b (position 5) and the endmarker # (position 6).
The firstpos and lastpos sets are computed bottom-up over the tree; for each leaf, firstpos = lastpos = the set containing its own position.
followpos:
1 : {1, 2, 3}
2 : {1, 2, 3}
3 : {4}
4 : {5}
5 : {6}
6 : ∅
[Figure: the resulting DFA, with start state {1, 2, 3} and transitions on a and b.]
12. (a) (i) The construction of a predictive parser is aided by two functions, FIRST and FOLLOW, associated with a grammar.
If X is any string of grammar symbols, then FIRST(X) is the set of terminals that begin the strings derived from X. If X ⇒* ε, then ε is also in FIRST(X).
FOLLOW(A) is defined, for a nonterminal A, as the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
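FIRST can be computed iteratively to a fixed point; a sketch, representing ε by the string "ε" and a grammar as a map from nonterminals to lists of right-hand sides:

```python
EPS = "ε"

def first_sets(grammar, nonterms):
    """Iteratively compute FIRST(A) for every nonterminal A."""
    first = {A: set() for A in nonterms}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for X in rhs:
                    # FIRST of a terminal is just itself
                    f = first[X] if X in nonterms else {X}
                    before = len(first[A])
                    first[A] |= f - {EPS}
                    changed |= len(first[A]) != before
                    if EPS not in f:
                        break            # X cannot vanish: stop here
                else:
                    # every symbol in rhs can derive ε, so A can too
                    if EPS not in first[A]:
                        first[A].add(EPS)
                        changed = True
    return first
```

For the standard expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id, this yields FIRST(E) = { (, id } and FIRST(E′) = { +, ε }.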
12. (a) (ii)
[Operator-precedence table over the terminals, giving the <· and ·> relations; the precedence graph is drawn from it.]
E → E A E | (E) | -E | id
A → + | - | * | / | ↑
is not an operator grammar, because the right side E A E has three consecutive nonterminals. If we substitute for A, we obtain the operator grammar
E → E + E | E - E | E * E | E / E | E ↑ E | (E) | -E | id
[Precedence-function graph with f-nodes fa, f(, f), f, and f$ linked to g-nodes ga, g(, g), g, and g$.]
12. (b) (i) An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when a nonterminal A is on top of the stack, a is the next input symbol, and the parsing-table entry M[A, a] is empty.
Panic-mode recovery is based on the idea of skipping input symbols until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set. The sets should be chosen so that the parser recovers quickly from errors that are likely to occur.
Phrase-level recovery can be implemented in a predictive parser by filling the blank entries in the predictive parsing table with pointers to error-handling routines. These routines can insert, modify, or delete symbols in the input.
12. (b) (ii) LR-parsing algorithm.
INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the parsing program.
I0: S′ → .S
S → .L = R
S → .R
L → .*R
L → .id
R → .L
I1: S′ → S.
I2: S → L. = R
R → L.
I3: S → R.
I4: L → *.R
R → .L
L → .*R
L → .id
I5: L → id.
I6: S → L = .R
R → .L
L → .*R
L → .id
I7: L → *R.
I8: R → L.
I9: S → L = R.
The SLR parsing table is

State |     ACTION: =    *    id    $    | GOTO: S  L  R
  0   |            s4   s5               | 1  2  3
  1   |                            acc   |
  2   | s6, r5                     r5    |
  3   |                            r2    |
  4   |            s4   s5               |    8  7
  5   | r4                         r4    |
  6   |            s4   s5               |    8  9
  7   | r3                         r3    |
  8   | r5                         r5    |
  9   |                            r1    |

The entry for state 2 on '=' contains both a shift (s6) and a reduce (r5), a shift/reduce conflict; hence the grammar is not SLR(1).
13. (a) (i) A data structure called a symbol table is generally used to store information about various source-language constructs. The information is collected by the analysis phases of the compiler and used by the synthesis phases to generate the target code.
The data structure for one particular implementation of a symbol table is an array symtable of records, each with a lexptr field, a token field and attributes, alongside a character array lexemes holding the strings div EOS mod EOS count EOS i EOS.
A fixed amount of space per entry may not be large enough to hold a very long identifier and may be wastefully large for a short identifier.
In this scheme, the separate array lexemes holds the character string forming each identifier. Each string is terminated by an end-of-string character (EOS), which may not appear in identifiers. Each entry in the symbol-table array symtable is a record consisting of two fields:
(1) lexptr, pointing to the beginning of the lexeme
(2) token.
In this representation, the 0th entry is left empty, because lookup returns 0 to indicate that there is no entry for a string. The 1st and 2nd entries are for the keywords div and mod. The 3rd and 4th entries are for the identifiers count and i.
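This layout can be sketched directly; using a Python string for the lexemes character array and a list of (lexptr, token) pairs for symtable is an assumption of the sketch:

```python
EOS = "\0"                 # end-of-string marker in the lexemes array
lexemes = EOS              # character array holding all lexemes
symtable = [None]          # entry 0 is unused: lookup returns 0 for "absent"

def insert(lexeme, token):
    """Append the lexeme to the character array, add a (lexptr, token)
    record, and return the index of the new entry."""
    global lexemes
    lexptr = len(lexemes)
    lexemes += lexeme + EOS
    symtable.append((lexptr, token))
    return len(symtable) - 1

def lookup(name):
    """Return the index of name's entry, or 0 if it is not present."""
    for i, (lexptr, _) in enumerate(symtable[1:], start=1):
        end = lexemes.index(EOS, lexptr)
        if lexemes[lexptr:end] == name:
            return i
    return 0

insert("div", "div")       # entry 1: keyword div
insert("mod", "mod")       # entry 2: keyword mod
```

Inserting the identifiers count and i afterwards gives them entries 3 and 4, matching the figure's description.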
13. (a) (ii) The annotated parse tree for the assignment statement X := A[y, z] carries the attributes:
at the root: L.place = x, L.offset = null, with E.place = t4 for the right-hand side;
for A[y, z]: L.place = t2, L.offset = t3;
Elist.place = t1, Elist.ndim = 2, Elist.array = A, built from Elist.place = t1, Elist.ndim = 1, Elist.array = A;
for the subscripts: E.place = y (from L.place = y, L.offset = null) and E.place = z (from L.place = z, L.offset = null).
t3 = 4 * t1
t4 = t2[t3]
x = t4
13. (b)
[Figure: DAG for the code segment, with nodes for <=, + and the [] (indexing) operators over prod, i, the constant 20, and the temporaries t1 to t7.]
A compiler optimization must preserve the semantics of the original program. A transformation of a program is local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global. Many transformations can be performed at both the local and global levels.
(1) Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes. Common subexpression elimination, copy propagation, dead-code elimination, and constant folding are common examples of function-preserving transformations.
a. Common Subexpression Elimination
An occurrence of an expression E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example, consider the following block of statements:
t1 = 4 * i
t2 = a[t1]
t3 = 4 * j
t4 = 4 * i
t5 = n
t6 = b[t4] + t5
The above code after optimization using common subexpression elimination is
t1 = 4 * i
t2 = a[t1]
t3 = 4 * j
t5 = n
t6 = b[t1] + t5
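Local common subexpression elimination can be sketched as one scan over the block. The single-operator tuple representation (target, op, arg1, arg2) is an assumption of this sketch, so t6 = b[t4] + t5 is shown only as the indexing step t6 = b[t4]:

```python
def eliminate_common_subexpressions(block):
    """Local CSE over statements (target, op, arg1, arg2): a repeated
    right-hand side whose arguments are unchanged reuses the earlier
    target instead of being recomputed."""
    available, out, replace = {}, [], {}
    for target, op, a1, a2 in block:
        a1, a2 = replace.get(a1, a1), replace.get(a2, a2)
        key = (op, a1, a2)
        if key in available:
            replace[target] = available[key]   # reuse the earlier value
            continue
        # assigning target invalidates expressions that mention it
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        available[key] = target
        out.append((target, op, a1, a2))
    return out, replace
```

On the example block, the recomputation t4 = 4 * i is dropped and the later use of t4 is rewritten to t1.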
b. Copy Propagation
Using one variable instead of another is called copy propagation. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f = g. For example, consider the following block of statements:
x = t3
a[t6] = t5
a[t4] = x
Copy propagation yields
x = t3
a[t6] = t5
a[t4] = t3
while(i<=t)
{
sum = sum + a[i]
}
b. Induction-variable elimination
Consider the block of code:
j = j - 1
t4 = 4 * j
t5 = a[t4]
if t5 > v goto …
In the above, the values of j and t4 remain in lock step: every time the value of j decreases by 1, t4 decreases by 4, because 4 * j is assigned to t4. Such identifiers are called induction variables.
When there are two or more induction variables in a loop, it may be possible to get rid of all but one by the process of induction-variable elimination.
c. Reduction in strength
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. For example, x² is invariably cheaper to implement as x * x than as a call to an exponentiation routine.
PART B (5 × 16 = 80 Marks)
11. (a) (i) Explain the phases of a compiler, with a neat schematic.
(ii) Write short notes on compiler-construction tools.
Or
(b) (i) Explain the grouping of phases.
(ii) Explain the specification of tokens.
12. (a) Find the SLR parsing table for the given grammar and parse the sentence (a + b) * c:
E → E + E | E * E | (E) | id
Or
(b) Find the predictive parser for the given grammar and parse the sentence (a + b) * c:
E → E + T | T, T → T * F | F, F → (E) | id
13. (a) Generate intermediate code for the following code segments, along with the required syntax-directed translation scheme:
(i) if (a > b) x = a + b
else
x = a - b
where a and x are of real type and b is of int type
(ii) int a, b;
float c;
a = 10;
switch (a)
{ case 10: c = 1;
case 20: c = 2;
}
Or
(b) (i) Generate intermediate code for the following code segment, along with the required syntax-directed translation scheme:
i = 1; s = 0;
while (i <= 10)
s = s + a[i][i][i]
i = i + 1
(ii) Write short notes on back-patching.
14. (a) (i) Explain the various issues in the design of a code generator.
(ii) Explain the code-generation phase with a simple code-generation algorithm.
Or
(b) (i) Generate the DAG representation of the following code and list the applications of the DAG representation:
i = 1; s = 0;
while (i <= 10)
s = s + a[i][i]
i = i + 1
(ii) Write short notes on next-use information, with a suitable example.
15. (a) (i) Explain the principal sources of optimization.
(ii) Write short notes on storage organization and parameter passing.
Or
(b) (i) Optimize the following code using various optimization techniques:
i = 1; s = 0;
for (i = 1; i <= 3; i++)
for (j = 1; j <= 3; j++)
c[i][j] = c[i][j] + a[i][j] + b[i][j]
(ii) Write short notes on access to non-local names.
Solutions
PART A
1. [Table comparing a compiler and an interpreter.]
[Parse tree with + at the root, E nodes and id leaves.]
4. E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → (E)
F → id
5. Examination of the entire program to suggest optimizations is called global data-flow analysis. In data-flow analysis, the analysis is made on the flow of data: it determines the information regarding the definition and use of data in the program.
6. Refer Nov/Dec 2009 - Q. No. 5.
7. Left p1 = add
t1 = a + b
c = t1
Return c
PART B
11. (a) (i) Refer Nov/Dec 2009 - 11(a) (i).
11. (a) (ii) Refer Nov/Dec 2009 - 11(a) (ii).
11. (b) (i) The phases of a compiler can be grouped together to form a front end and a back end.
Front end: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer
Back end: Code Optimizer, Code Generator
(Definitions of postfix of S and suffix of S — table body lost.)
Operation                          Definition
Union of L and M                   L ∪ M = { s | s is in L or s is in M }
Concatenation of L and M (LM)      LM = { st | s is in L and t is in M }
Kleene closure of L                L* = ∪ (i = 0 to ∞) L^i
12. (a)
Solution
E -> E + E
E -> E * E
E -> (E)
E -> id
Step 1:
The canonical collection of sets of LR(0) items for the augmented grammar is
I0: E' -> .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I1: goto(I0, E)
    E' -> E.
    E -> E. + E
    E -> E. * E
I2: goto(I0, ()
    E -> (.E)
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I3: goto(I0, id)
    E -> id.
I4: goto(I1, +)
    E -> E + .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I5: goto(I1, *)
    E -> E * .E
    E -> .E + E
    E -> .E * E
    E -> .(E)
    E -> .id
I6: goto(I2, E)
    E -> (E.)
    E -> E. + E
    E -> E. * E
I7: goto(I4, E)
    E -> E + E.
    E -> E. + E
    E -> E. * E
I8: goto(I5, E)
    E -> E * E.
    E -> E. + E
    E -> E. * E
I9: goto(I6, ))
    E -> (E).
Step 2:
The SLR parsing table (productions: 1: E -> E + E, 2: E -> E * E, 3: E -> (E), 4: E -> id; the shift/reduce conflicts of this ambiguous grammar are resolved with * taking precedence over +, both left-associative):

State |  id    +    *    (    )    $    | E
  0   |  s3              s2             | 1
  1   |       s4   s5              acc  |
  2   |  s3              s2             | 6
  3   |       r4   r4         r4   r4   |
  4   |  s3              s2             | 7
  5   |  s3              s2             | 8
  6   |       s4   s5         s9        |
  7   |       r1   s5         r1   r1   |
  8   |       r2   r2         r2   r2   |
  9   |       r3   r3         r3   r3   |
Step 3:
Parsing of the input string (a + b)*c (a, b and c are instances of id):

Stack               Input         Action
0                   (a + b)*c $   Shift
0 ( 2               a + b)*c $    Shift
0 ( 2 a 3           + b)*c $      Reduce by E -> id
0 ( 2 E 6           + b)*c $      Shift
0 ( 2 E 6 + 4       b)*c $        Shift
0 ( 2 E 6 + 4 b 3   )*c $         Reduce by E -> id
0 ( 2 E 6 + 4 E 7   )*c $         Reduce by E -> E + E
0 ( 2 E 6           )*c $         Shift
0 ( 2 E 6 ) 9       *c $          Reduce by E -> (E)
0 E 1               *c $          Shift
0 E 1 * 5           c $           Shift
0 E 1 * 5 c 3       $             Reduce by E -> id
0 E 1 * 5 E 8       $             Reduce by E -> E * E
0 E 1               $             Accept
12. (b)
13. (a) (i)
Solution:
100: if a > b goto 104
101: t1 = a - b
102: x = t1
103: goto 106
104: t2 = a + b
105: x = t2
106: ...
(4) nextquad: the index of the next quadruple to be generated (used together with the back-patching functions makelist, merge and backpatch).
DAG representation (figure): an assignment node '=' for s, a comparison node '<=' over i and 10, and an indexing node '[][]' for a[i][i].
Applications of DAG
(1) Common subexpressions can be detected automatically.
(2) Identifiers which have their values used in the block can be determined.
(3) The statements which compute values that could be used outside the block can be determined.
(4) Bayesian networks.
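As a sketch of application (1), DAG construction can be simulated in Python: identical (op, left, right) triples are shared, which is how common subexpressions are detected automatically. The instruction format and names here are illustrative, not from the text.

```python
# Build a DAG for a basic block of statements of the form: target = x op y.
# Reusing an existing (op, left, right) node detects a common subexpression.

def build_dag(block):
    nodes = {}     # (op, left-node, right-node) -> node id
    var_node = {}  # variable name -> node id currently holding its value
    reused = []    # targets whose value was already computed (CSE hits)

    def leaf(name):
        # Initial value of a name is a leaf node, created on first use.
        key = ("leaf", name, None)
        return nodes.setdefault(key, len(nodes))

    for target, op, x, y in block:
        key = (op, var_node.get(x, leaf(x)), var_node.get(y, leaf(y)))
        if key in nodes:
            reused.append(target)          # common subexpression found
        node = nodes.setdefault(key, len(nodes))
        var_node[target] = node            # target now labels this node
    return var_node, reused

block = [("t1", "+", "a", "b"),
         ("t2", "+", "a", "b"),   # same DAG node as t1
         ("t3", "*", "t1", "c")]
var_node, reused = build_dag(block)
assert var_node["t1"] == var_node["t2"]
assert reused == ["t2"]
```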
14. (b) (ii) If the name in a register is no longer needed, then the register can be assigned to some other name. The idea of keeping a name in storage only if it will be used subsequently can be applied in a number of contexts.
The use of a name in a three-address statement is defined as follows. Suppose three-address statement i assigns a value to x. If statement j has x as an operand, and control can flow from i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at i.
The algorithm to determine next uses makes a backward pass over each basic block. Having found the end of the basic block, we scan backwards to the beginning, recording for each name x whether x has a next use in the block and, if not, whether it is live on exit from the block. If data-flow analysis has been done, we know which names are live on exit from each block. If no live-variable analysis has been done, it is assumed that all non-temporary variables are live on exit. If the algorithms generating intermediate code or optimizing the code permit certain temporaries to be used across blocks, these too must be considered live.
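The backward scan described above can be sketched in Python. The statement format and the convention that temporaries start with "t" are illustrative assumptions for this sketch.

```python
# Backward next-use scan over one basic block.
# block: list of (result, op1, op2) triples, e.g. ("t1", "a", "b") for t1 = a + b.

def next_use_info(block):
    # Initialise: non-temporary names are assumed live on exit, temporaries dead.
    status = {}
    for res, a, b in block:
        for n in (res, a, b):
            if n is not None:
                status[n] = {"live": not n.startswith("t"), "next_use": None}

    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):        # backward pass
        res, a, b = block[i]
        # Record the current status of i's names (their use AFTER statement i).
        info[i] = {n: dict(status[n]) for n in (res, a, b) if n is not None}
        if res is not None:                        # x assigned at i: dead before i
            status[res] = {"live": False, "next_use": None}
        for n in (a, b):                           # operands: next use is i
            if n is not None:
                status[n] = {"live": True, "next_use": i}
    return info

block = [("t1", "a", "b"),     # t1 = a + b
         ("x",  "t1", "c"),    # x  = t1 * c
         ("y",  "x",  "a")]    # y  = x - a
info = next_use_info(block)
assert info[0]["t1"]["next_use"] == 1   # t1 is next used by statement 1
assert info[0]["a"]["next_use"] == 2    # a is next used by statement 2
```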
a) Storage organization
Run-time memory is typically subdivided into the generated code area, static data, stack and heap.
b) Activation Record
Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record or frame. The activation record of a procedure is pushed on the runtime stack when the procedure is called and popped off when control returns to the caller. The fields in the activation record are shown as
Returned Value
Actual Parameters
Control Link (optional)
Access Link (optional)
Saved Machine Status
Local Variables
Temporaries
The purpose of the fields of an activation record is as follows
(1) Temporary values, such as those arising in the evaluation of expressions, are stored in the field for temporaries.
(2) The field for local data holds data that is local to an execution of a procedure.
(3) Saved machine status holds information about the state of the machine just before the procedure is called.
(4) The optional access link refers to non-local data held in other activation records.
(5) The optional control link points to the activation record of the caller.
(6) Actual parameters is the field used by the calling procedure to supply parameters to the called procedure.
(7) Returned value is used to store the result of the function call.
The size of each of these fields can be determined at the time a procedure is called.
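A toy sketch of this stack discipline: each call pushes a frame carrying the fields listed above, and the frame is popped when control returns. Field names follow the list in the text; the values and the call/ret helpers are illustrative.

```python
# Minimal model of an activation-record (frame) stack.

class Frame:
    def __init__(self, actuals, access_link=None, control_link=None):
        self.returned_value = None
        self.actual_parameters = actuals
        self.control_link = control_link      # caller's frame
        self.access_link = access_link        # frame holding non-local data
        self.saved_machine_status = {"pc": None}
        self.locals = {}
        self.temporaries = {}

stack = []

def call(actuals):
    # Push a new activation record; its control link is the caller's frame.
    frame = Frame(actuals, control_link=stack[-1] if stack else None)
    stack.append(frame)
    return frame

def ret(value):
    # Pop the activation record of the returning procedure.
    frame = stack.pop()
    frame.returned_value = value
    return value

main = call([])
f = call([10])            # main calls f(10)
assert f.control_link is main
assert ret(42) == 42      # f returns; its frame is popped
assert stack[-1] is main
```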
c) Compile time layout of local data
The amount of storage needed for a name is determined from its type. An elementary data type such as character, integer or real can usually be stored in an integral number of bytes. Storage for an aggregate such as an array or record must be large enough to hold all components of the aggregate.
The methods of parameter passing are
a. Call by value
b. Call by reference
c. Copy restore
d. Call by name
a. Call by value
It is the simplest method of passing parameters. The actual parameters are evaluated and their r-values are passed to the called procedure. Call by value can be implemented as follows:
i) A formal parameter is treated like a local name, so the storage for the formals is in the activation record of the called procedure.
ii) The caller evaluates the actual parameters and places their r-values in the storage for the formals.
In call by value, operations on the formal parameters do not affect values in the activation record of the caller.
b. Call by Reference
When parameters are passed by reference, the caller passes to the called procedure a pointer to the storage address of each actual parameter.
i) If the actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
ii) If the actual parameter is an expression with no l-value, then the expression is evaluated in a new location and the address of that location is passed.
c. Copy Restore
A hybrid between call by value and call by reference is copy restore. It is also known as copy-in copy-out or value-result.
i) The calling procedure calculates the value of the actual parameter, and it is then copied to the activation record of the called procedure.
ii) During execution of the called procedure, the actual parameter value is not affected.
iii) If the actual parameter has an l-value, then at return the value of the formal parameter is copied back to the actual parameter.
d. Call by name
Call by name is traditionally defined by the copy rule, which is:
i) The procedure is treated like a macro: the procedure body is substituted for the call.
ii) The actual parameters are surrounded by parentheses to preserve their integrity.
iii) The local names of the called procedure are kept distinct from the names of the calling procedure.
15. (b) (i) Three-address code:
i = 1
s = 0
t1 = 4*i
t2 = 4*j
t3 = addr(a) - 4
t4 = t3[t1][t2]
t5 = addr(b) - 4
t6 = t5[t1][t2]
t7 = t4 + t6
t8 = addr(c) - 4
t9 = t8[t1][t2]
t10 = t9 + t7
i = i + 1
j = j + 1
if (i <= 3) goto B2
if (j <= 3) goto B2
Unoptimized code:
B1: i = 1
    s = 0
B2: t1 = 4*i
    t2 = 4*j
    t3 = addr(a) - 4
    t4 = t3[t1][t2]
    t5 = addr(b) - 4
    t6 = t5[t1][t2]
    t7 = t4 + t6
    t8 = addr(c) - 4
    t9 = t8[t1][t2]
    t10 = t9 + t7
    i = i + 1
    j = j + 1
    if (i <= 3) goto B2
    if (j <= 3) goto B2
Optimized code (the loop-invariant computations t3, t5 and t8 are moved out of the loop):
B1: t3 = addr(a) - 4
    t5 = addr(b) - 4
    t8 = addr(c) - 4
B2: t1 = 4*i
    t2 = 4*j
    t4 = t3[t1][t2]
    t6 = t5[t1][t2]
    t7 = t4 + t6
    t9 = t8[t1][t2]
    t10 = t9 + t7
    i = i + 1
    j = j + 1
    if (i <= 3) goto B2
    if (j <= 3) goto B2
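The hoisting applied above can be sketched as a single pass over the loop body: a statement is moved to the preheader when none of its operands is assigned inside the loop. The (target, op1, op2) instruction format and names are illustrative, and this single-pass version deliberately ignores subtleties such as loops that may execute zero times.

```python
# One-pass loop-invariant code motion over a loop body.

def hoist(loop):
    assigned = {t for t, _, _ in loop}     # names defined inside the loop
    pre, body = [], []
    for stmt in loop:
        t, a, b = stmt
        if all(x not in assigned for x in (a, b)):
            pre.append(stmt)               # invariant: move to the preheader
            assigned.discard(t)            # t is now computed outside the loop
        else:
            body.append(stmt)
    return pre, body

loop = [("t1", "4", "i"),          # depends on i, stays in the loop
        ("t3", "addr_a", "4"),     # operands never assigned in the loop: hoist
        ("t4", "t3", "t1"),        # still depends on t1, stays
        ("i",  "i", "1")]          # induction variable update, stays
pre, body = hoist(loop)
assert pre == [("t3", "addr_a", "4")]
```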
15. (b) (ii) For languages that do not allow nested procedure declarations, allocation of storage for variables and access to those variables is simple:
1. Global variables are allocated static storage. The locations of these variables remain fixed and are known at compile time. So, to access any variable that is not local to the currently executing procedure, we simply use the statically determined address.
2. Any other name must be local to the activation at the top of the stack. We may access these variables through the top-sp pointer of the stack.
An important benefit of static allocation for globals is that declared procedures may be passed as parameters or returned as results, with no substantial change in the data-access strategy. With the C static-scoping rule, and without nested procedures, any name nonlocal to one procedure is nonlocal to all procedures, regardless of how they are activated. Similarly, if a procedure is returned as a result, then any nonlocal name refers to the storage statically allocated for it.
The scope of a declaration in a block-structured language is given by the most closely nested rule:
1. The scope of a declaration in a block B includes B.
2. If the name X is not declared in a block B, then an occurrence of X in B is in the scope of a declaration of X in an enclosing block B1 such that
a. B1 has a declaration of X, and
b. B1 is more closely nested around B than any other block with a declaration of X.
PART B (5 × 16 = 80 Marks)
11. (a) (i) What are the various phases of a compiler? Explain each phase in detail. Write down the output of each phase for the expression a = b + c * 60.
(ii) Briefly explain compiler construction tools.
Or
(b) Prove that the following two regular expressions are equivalent by showing that their minimum-state DFAs are the same: (i) (a|b)*
(ii) (a*|b*)*
12. (a) (i) Write down the necessary algorithms for finding FIRST and FOLLOW.
(ii) Give the algorithm for constructing the SLR parsing table.
Or
(b) Show that the following grammar is LALR but not SLR:
S -> L = R | R, L -> *R | id, R -> L
13. (a) What is three-address code? What are its types? How is it implemented?
Or
(b) How would you generate the intermediate code for flow-of-control statements? Explain with examples.
14. (a) Discuss the runtime storage management of a code generator.
Or
(b)
(i) Generate code for the following statements for the target machine:
(1) x = x + 1
(2) x = a + b + c
(3) x = a1/(bc)d*(e + f)
(ii) Explain the transformations on basic blocks.
Solutions
PART A
1. A compiler is a program that reads a program written in one language (a high-level language) and translates it into an equivalent program in another language (machine language).
Source program → Compiler → Target program
(the compiler also reports error messages)
8. (figure: syntax tree with a uminus node)
PART B
11. (a) (i) A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language. A compiler operates in phases, each of which transforms the source program from one representation to another. A typical decomposition of a compiler is shown as
Source program
↓ Lexical analyzer
↓ Syntax analyzer
↓ Semantic analyzer
↓ Intermediate code generator
↓ Code optimizer
↓ Code generator
Target program
(The symbol table management and error handler routines interact with all phases.)
11. (a) (ii) Writing a compiler is a difficult and time-consuming task. There are some specialized tools that can be used in the implementation of the various phases of a compiler. These tools are often referred to as compiler-compilers, compiler generators or translator writing systems. Some of the useful compiler construction tools are
a. Parser Generator
b. Scanner Generator
c. Syntax Directed Translation Engine
d. Automatic Code Generator
e. Data Flow Engines
a. Parser Generator
These produce syntax analyzers. Here the input is given in the form of context-free grammars. Many parser generators utilize powerful parsing algorithms that are too complex to be carried out by hand. UNIX has a parser generator tool called YACC.
b. Scanner Generator
These automatically generate lexical analyzers, normally from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is in effect a finite automaton.
c. Syntax Directed Translation Engines
Using this tool, the intermediate code is generated by scanning the parse tree completely. The translation is done for each node of the tree, and each translation is defined in terms of the translations at its neighbor nodes in the tree.
d. Automatic Code Generator
This tool takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine. A template-matching technique is used: the intermediate code statements are replaced by templates that represent sequences of machine instructions.
e. Data Flow Engines
Data-flow analysis is required to perform good code optimization. Data-flow analysis involves gathering information about how values are transmitted from one part of a program to each other part.
11. (b)
push the initial state s0 onto the stack; set ip to the first input symbol
repeat forever {
  s = state on top of stack; a = *ip
  case action[s,a] of {
    SHIFT s': { push(a); push(s'); advance ip }
    REDUCE A -> beta: {
      pop 2*|beta| symbols; s' = new state on top
      push A
      push goto(s', A)
    }
    ACCEPT: return 0 /* success */
    ERROR: { error("syntax error", s, a); halt }
  }
}
Constructing an SLR Parsing Table
Given a grammar G, construct the augmented grammar G' by adding the production S' -> S. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
State i is constructed from Ii, with the parsing actions determined as follows:
Step 1: If [A -> α.aβ] is in Ii, where a is a terminal, and goto(Ii,a) = Ij, then set action[i,a] = shift j.
Step 2: If [A -> α.] is in Ii, then set action[i,a] to reduce A -> α for all a in FOLLOW(A), where A != S'.
Step 3: If [S' -> S.] is in Ii, then set action[i,$] to accept.
Step 4: The goto transitions are constructed as follows: for all non-terminals A, if goto(Ii, A) = Ij, then goto[i,A] = j.
All entries not defined by these steps are made error. If there are any multiply defined entries, the grammar is not SLR.
The initial state of the parser is the one constructed from the set of items containing [S' -> .S].
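The driver loop above is table-driven, so it can be exercised with any hand-written action/goto table. Below is a sketch in Python using a tiny made-up grammar, S -> (S) | x, whose SLR table was worked out by hand for this example; the table encoding is an assumption of the sketch.

```python
# Table-driven LR parsing for S -> (S) | x.
# ACTION maps (state, terminal) to ("s", j) shift, ("r", lhs, n) reduce
# by a production with n right-hand-side symbols, or ("acc",) accept.

ACTION = {
    (0, "("): ("s", 1), (0, "x"): ("s", 2),
    (1, "("): ("s", 1), (1, "x"): ("s", 2),
    (2, ")"): ("r", "S", 1), (2, "$"): ("r", "S", 1),   # S -> x
    (3, ")"): ("s", 4),
    (4, ")"): ("r", "S", 3), (4, "$"): ("r", "S", 3),   # S -> (S)
    (5, "$"): ("acc",),
}
GOTO = {(0, "S"): 5, (1, "S"): 3}

def parse(tokens):
    stack, i = [0], 0
    while True:
        s, a = stack[-1], tokens[i]
        act = ACTION.get((s, a))
        if act is None:
            return False                         # blank entry: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":                        # shift: push symbol and state
            stack += [a, act[1]]
            i += 1
        else:                                    # reduce A -> beta
            _, lhs, n = act
            del stack[len(stack) - 2 * n:]       # pop 2*|beta| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]

assert parse(["(", "x", ")", "$"])
assert not parse(["(", ")", "$"])
```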
12. (b) S -> L = R | R
The augmented grammar is
S' -> S
S -> L = R
S -> R
L -> *R
L -> id
R -> L
I13: goto(I11, R): L -> *R., $
In the SLR(1) table, the state containing the items S -> L.= R and R -> L. has a shift/reduce conflict on '=': the parser can shift '=', but '=' is also in FOLLOW(R) (because of S -> L = R and R -> L), so it can equally reduce by R -> L. Hence the grammar is not SLR(1).
In the LALR(1) table constructed from the LR(1) items, the reduction R -> L in that state carries lookahead $ only, so the entry for '=' holds just the shift and no conflict arises. The table has no multiply defined entries, so the grammar is LALR(1) but not SLR(1).
13. (a)
Quadruples for a := b * - c + b * - c:
       Op      Arg1   Arg2   Result
(0)    uminus  c             t1
(1)    *       b      t1     t2
(2)    uminus  c             t3
(3)    *       b      t3     t4
(4)    +       t2     t4     t5
(5)    :=      t5            a
Triples
To avoid entering temporary names into the symbol table, we may refer to a temporary value by the position of the statement that computes it. In this case, three-address statements can be represented by records with only three fields, namely op, arg1 and arg2. The fields arg1 and arg2 are either pointers to the symbol table or pointers into the triple structure: they refer to the symbol table for user-defined names or constants, and to the triple structure for temporary values.
       Op      Arg1   Arg2
(0)    uminus  c
(1)    *       b      (0)
(2)    uminus  c
(3)    *       b      (2)
(4)    +       (1)    (3)
(5)    assign  a      (4)
Indirect Triples
It has been considered as listing pointers to triples, rather than listing the triples themselves. The above three-address code is represented as
S.No   Statement
(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)
S.No   Op      Arg1   Arg2
(14)   uminus  c
(15)   *       b      (14)
(16)   uminus  c
(17)   *       b      (16)
(18)   +       (15)   (17)
(19)   assign  a      (18)
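The three representations can be sketched as plain Python data for the same statement, a := b * - c + b * - c; the field layout follows the tables above, and the tuple encoding is an illustrative choice.

```python
# Quadruples: four fields; temporaries are explicit names.
quads = [("uminus", "c",  None, "t1"),
         ("*",      "b",  "t1", "t2"),
         ("uminus", "c",  None, "t3"),
         ("*",      "b",  "t3", "t4"),
         ("+",      "t2", "t4", "t5"),
         (":=",     "t5", None, "a")]

# Triples: the result field disappears; a temporary is referred to by
# the position (i,) of the statement that computes it.
triples = [("uminus", "c",   None),
           ("*",      "b",   (0,)),
           ("uminus", "c",   None),
           ("*",      "b",   (2,)),
           ("+",      (1,),  (3,)),
           ("assign", "a",   (4,))]

# Indirect triples: an extra statement list gives the execution order, so
# an optimizer can reorder statements without renumbering the triples.
stmt_list = [14, 15, 16, 17, 18, 19]   # pointers into a triple store

assert len(quads[0]) == 4 and len(triples[0]) == 3
assert triples[4] == ("+", (1,), (3,))   # arg fields point at triples
```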
13. (b)
14. (a)
Production                 Semantic Rules
S -> if E then S1          {E.true := newlabel;
                            E.false := S.next;
                            S1.next := S.next;
                            S.code := E.code || gen(E.true ':') || S1.code}
S -> if E then S1 else S2  {E.true := newlabel;
                            E.false := newlabel;
                            S1.next := S.next;
                            S2.next := S.next;
                            S.code := E.code || gen(E.true ':') || S1.code ||
                                      gen('goto' S.next) || gen(E.false ':') || S2.code}
S -> while E do S1         {S.begin := newlabel;
                            E.true := newlabel;
                            E.false := S.next;
                            S1.next := S.begin;
                            S.code := gen(S.begin ':') || E.code || gen(E.true ':') ||
                                      S1.code || gen('goto' S.begin)}
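The while rule above can be sketched as a small code generator. The newlabel counter and the way E.code and S1.code are passed in as callables are illustrative choices for this sketch, not part of the original scheme.

```python
# Code generation for S -> while E do S1, following the semantic rule:
# S.begin and E.true are fresh labels; E.false = S.next; S1.next = S.begin.

label_count = 0
def newlabel():
    global label_count
    label_count += 1
    return f"L{label_count}"

def gen_while(E_code, S1_code, S_next):
    begin, true = newlabel(), newlabel()
    code  = [f"{begin}:"]              # gen(S.begin ':')
    code += E_code(true, S_next)       # E jumps to E.true or E.false (= S.next)
    code += [f"{true}:"]               # gen(E.true ':')
    code += S1_code(begin)             # S1 with S1.next = S.begin
    code += [f"goto {begin}"]          # gen('goto' S.begin)
    return code

# while (a < b) do a := a + 1, with S.next = Lend
E_code  = lambda t, f: [f"if a < b goto {t}", f"goto {f}"]
S1_code = lambda nxt: ["a = a + 1"]
code = gen_while(E_code, S1_code, "Lend")
assert code[0] == "L1:" and code[-1] == "goto L1"
assert "goto Lend" in code
```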
(Flow-graph fragment from the figure: d4: i := i + 1 in B2; d5: j := j - 1 in B3; d6: a := u2 in B6; blocks B2, B3, B4 and B6.)
B2: t1 := 4 * i
    t2 := addr(A) - 4
    t3 := t2[t1]
    t4 := addr(B) - 4
    t5 := t4[t1]
    t6 := t3 * t5
    sum := sum + t6
    i := i + 1
    if i <= n goto B2
1.  sum := 0
2.  i := 1
3.  if i > n goto 15
4.  t1 = addr(a) - 4
5.  t2 = i * 4
6.  t3 = t1[t2]
7.  t4 = addr(a) - 4
8.  t5 = i * 4
9.  t6 = t4[t5]
10. t7 = t3 * t6
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. ...
9.  t6 = t4[t5]
10. t7 = t3 * t6
10a. t7 = t3 * t3
11. sum = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
B1: sum := 0
    i := 1
    t2 := addr(A) - 4
    t4 := addr(B) - 4
B2: t1 := 4 * i
    t3 := t2[t1]
    t5 := t4[t1]
    t6 := t3 * t5
    sum := sum + t6
    i := i + 1
    if i <= n goto B2
1.  sum := 0
2.  i := 1
2a. t1 = addr(a) - 4
2b. t2 = i * 4
3.  if i > n goto 15
5.  t2 = i * 4
6.  t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
11b. t2 = t2 + 4
13. i = i + 1
14. goto 3
15. ...
PART B (5 × 16 = 80 Marks)
11. (a)
(i) Explain the need for dividing the compilation process into various phases and explain their functions.
(ii) Explain how an abstract stack machine can be used as a translator.
Or
Solutions
PART A
1. The issues in the design of lexical analysis are
a. Simple design
b. Compiler efficiency is improved
c. Compiler portability is enhanced.
2. (figure: transition diagram with states A, B and an ε-edge)
For example, a language may allow the declaration of data items anywhere in the program; it may not be necessary for the declaration to precede the first use of the data item.
(ii) Sufficient core memory may not be available to accommodate a single-pass compiler.
(iii) A multipass structure may be required to satisfy the primary aims of the compiler: to generate a highly efficient target code or to occupy the minimum possible storage space.
6. Refer Nov/Dec 2009 - Q. No. 5.
7. The issues are
a. Input to the code generator
b. Target program
c. Memory management
d. Instruction selection
e. Register allocation
f. Choice of evaluation order
8. The rst statement in a basic block is a leader.
Any statement which is the target of a conditional or unconditional
goto is a leader.
Any statement which immediately follows a conditional goto is a
leader.
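The three leader rules above translate directly into a partitioning routine; the string instruction format and numeric jump targets below are made-up conventions for this sketch.

```python
# Partition a sequence of three-address statements into basic blocks.

def basic_blocks(code):
    leaders = {0}                                   # rule 1: first statement
    for i, instr in enumerate(code):
        if "goto" in instr:
            target = int(instr.split("goto")[1])
            leaders.add(target)                     # rule 2: jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3: statement after a goto
    cuts = sorted(leaders) + [len(code)]
    return [code[cuts[k]:cuts[k + 1]] for k in range(len(cuts) - 1)]

code = ["i = 1",              # 0
        "t1 = 4 * i",         # 1  target of the goto below: leader
        "s = s + t1",         # 2
        "i = i + 1",          # 3
        "if i <= 10 goto 1",  # 4
        "s = s * 2"]          # 5  follows a conditional goto: leader
blocks = basic_blocks(code)
assert [len(b) for b in blocks] == [1, 4, 1]
```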
9. The size of the data object and constraints on its position in memory must be known at compile time.
Recursive procedures are restricted.
Data structures cannot be created dynamically.
10. A hybrid between call by value and call by reference is copy-restore linkage. It is also called copy-in copy-out or value-result.
PART B
11. (a) (i) Refer Nov/Dec 2009 - 11(a)(i).
11. (a) (ii) The abstract machine code for an expression simulates a stack evaluation of the postfix representation of the expression. Expression evaluation proceeds by processing the postfix representation from left to right:
Evaluation
(1) Push each operand onto the stack when encountered.
(2) Evaluate a k-ary operator by using the value located k-1 positions below the top of the stack as the leftmost operand, and so on, until the value on top of the stack is used as the rightmost operand.
(3) After the evaluation, all k operands are popped from the stack and the result is pushed onto the stack.
Example
Stmt -> id = expr { stmt.t := expr.t || 'istore a' }
Applied to a = 3 - b * c this gives
bipush 3
iload b
iload c
imul
isub
istore a
Java Virtual Machine
Similar to the abstract stack machine, the Java virtual machine is an abstract processor architecture that defines the behavior of Java bytecode programs. The stack in the JVM is referred to as the operand stack or value stack. Operands are fetched from the stack and the result is pushed back onto the stack.
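The stack evaluation described above can be sketched in Python for the postfix form of 3 - b * c (i.e. 3 b c * -), with illustrative values supplied for b and c.

```python
# Evaluate a postfix expression with a value stack, as the abstract
# stack machine does: operands are pushed; a binary operator pops its
# rightmost operand first (it is on top), then the leftmost.

def eval_postfix(postfix, env):
    stack = []
    for sym in postfix:
        if sym in ("+", "-", "*", "/"):
            right = stack.pop()                  # topmost = rightmost operand
            left = stack.pop()
            stack.append({"+": left + right, "-": left - right,
                          "*": left * right, "/": left // right}[sym])
        else:                                    # operand: push its value
            stack.append(env[sym] if isinstance(sym, str) else sym)
    return stack.pop()

env = {"b": 4, "c": 5}
result = eval_postfix([3, "b", "c", "*", "-"], env)   # 3 - 4 * 5
assert result == -17
```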
11. (b) Associating attributes with the grammar symbols is called translation. When we associate semantic rules with productions, we use two notations:
(1) Syntax-directed definitions
(2) Translation schemes
(1) Syntax-Directed Definitions
These give high-level specifications for translations. They hide many implementation details, such as the order of evaluation of semantic actions: we associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.
(2) Translation Schemes
These indicate the order of evaluation of the semantic actions associated with a production rule. In other words, translation schemes give some information about implementation details.
Attributes
(i) place: refers to the location used to store the value of a symbol.
(ii) code: refers to the expression or combination of expressions in the form of three-address code.
(iii) value: refers to the value of a symbol.
(iv) newtemp: returns a sequence of distinct names t1, t2, ... in response to successive calls.
(v) gen: is used for generating three-address statements.
The syntax-directed definition for associating a type with an expression is
Production       Semantic Rule
E -> literal     E.type := char
E -> num         E.type := int
E -> id          E.type := lookup(id.entry)
E -> E1 mod E2   E.type := if E1.type = int and E2.type = int then int else type_error
E -> E1[E2]      E.type := if E2.type = int and E1.type = array(s,t) then t else type_error
E -> E1^         E.type := if E1.type = pointer(t) then t else type_error
12. (a)
Grammar: S -> AS | b, A -> SA | a
Step 2: Canonical collection of LR(0) items
I0: S' -> .S, S -> .AS, S -> .b, A -> .SA, A -> .a
I1: goto(I0, S): S' -> S., A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
I2: goto(I0, A): S -> A.S, S -> .AS, S -> .b, A -> .SA, A -> .a
I3: goto(I0, a): A -> a.
I4: goto(I0, b): S -> b.
I5: goto(I1, A): A -> SA., S -> A.S, S -> .AS, S -> .b, A -> .SA, A -> .a
I6: goto(I2, S): S -> AS., A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
I7: goto(I1, S): A -> S.A, A -> .SA, A -> .a, S -> .AS, S -> .b
Parsing Table (productions: 1: S -> AS, 2: S -> b, 3: A -> SA, 4: A -> a)
State |  a        b        $       | S  A
  0   |  s3       s4               | 1  2
  1   |  s3       s4       accept  | 7  5
  2   |  s3       s4               | 6  2
  3   |  r4       r4       r4      |
  4   |  r2       r2       r2      |
  5   |  s3, r3   s4, r3   r3      | 6  2
  6   |  s3, r1   s4, r1   r1      | 7  5
  7   |  s3       s4               | 7  5
The multiply defined entries (shift/reduce conflicts) in states 5 and 6 show that the grammar is not SLR(1).
Predictive parsing table for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id:
      id          +            *            (          )         $
E     E -> TE'                              E -> TE'
E'                E' -> +TE'                           E' -> ε   E' -> ε
T     T -> FT'                              T -> FT'
T'                T' -> ε      T' -> *FT'              T' -> ε   T' -> ε
F     F -> id                               F -> (E)
Parsing of the input string id + id*id:
Stack        Input          Action
$E           id + id*id$    Push E -> TE'
$E'T         id + id*id$    Push T -> FT'
$E'T'F       id + id*id$    Push F -> id
$E'T'id      id + id*id$    Pop id
$E'T'        + id*id$       Push T' -> ε
$E'          + id*id$       Push E' -> +TE'
$E'T+        + id*id$       Pop +
$E'T         id*id$         Push T -> FT'
$E'T'F       id*id$         Push F -> id
$E'T'id      id*id$         Pop id
$E'T'        *id$           Push T' -> *FT'
$E'T'F*      *id$           Pop *
$E'T'F       id$            Push F -> id
$E'T'id      id$            Pop id
$E'T'        $              Push T' -> ε
$E'          $              Push E' -> ε
$            $              Success
13. (a)
13. (b)
As the sequence of declarations in a procedure or block is examined, we can lay out storage for the names local to the procedure. For each local name, we create a symbol table entry with information such as the type and the relative address of the storage for the name. The relative address consists of an offset from the base of the static data area or from the field for local data in the activation record.
The syntax of languages such as C, Pascal and FORTRAN allows all the declarations in a single procedure to be processed as a group. In this case a global variable, say offset, can keep track of the next available relative address.
For example, in the translation scheme, the non-terminal P generates a sequence of declarations of the form id : T. Before the first declaration is considered, offset is set to 0.
The procedure enter(name, type, offset) creates a symbol table entry for name, giving it the type type and relative address offset in its data area. We use the synthesized attributes type and width for the non-terminal T to indicate the type and the width (number of memory units) taken by objects of that type. A synthesized translation is one where the translation depends on the translations of the children.
The types and relative addresses of declared names are given as
Sl.No  Production              Semantic rules
1.     P -> D                  {offset := 0}
2.     D -> D ; D
3.     D -> id : T             {enter(id.name, T.type, offset);
                                offset := offset + T.width}
4.     T -> integer            {T.type := integer; T.width := 4}
5.     T -> real               {T.type := real; T.width := 8}
6.     T -> array[num] of T1   {T.type := array(num.val, T1.type);
                                T.width := num.val × T1.width}
7.     T -> ^T1                {T.type := pointer(T1.type); T.width := 4}
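The offset bookkeeping in the table above can be sketched for a flat sequence of declarations; the widths follow the table (integer 4, real 8), and the return format is an illustrative choice.

```python
# Compile-time layout of local data: assign each declared name a
# relative address (offset) and advance offset by the type's width.

WIDTH = {"integer": 4, "real": 8}

def layout(decls):
    """decls: list of (name, type). Returns {name: (type, offset)}."""
    table, offset = {}, 0            # offset := 0 before the first declaration
    for name, ty in decls:
        table[name] = (ty, offset)   # enter(name, type, offset)
        offset += WIDTH[ty]          # offset := offset + T.width
    return table

table = layout([("x", "integer"), ("y", "real"), ("z", "integer")])
assert table["x"] == ("integer", 0)
assert table["y"] == ("real", 4)
assert table["z"] == ("integer", 12)
```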
Production       Semantic rule
E -> literal     E.type := char
E -> num         E.type := int
E -> id          E.type := lookup(id.entry)
E -> E1 mod E2   E.type := if E1.type = int and E2.type = int then int else type_error
E -> E1[E2]      E.type := if E2.type = int and E1.type = array(s,t) then t else type_error
E -> E1^         E.type := if E1.type = pointer(t) then t else type_error
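The rules in the table can be sketched as a recursive function over expression tuples. The node shapes, the env dictionary standing in for lookup(id.entry), and the tuple encoding of array/pointer types are all illustrative assumptions.

```python
# Type checking following the table above.
# Types: "char", "int", "real", ("array", size, elem_type), ("pointer", t).

def type_of(e, env):
    kind = e[0]
    if kind == "literal": return "char"
    if kind == "num":     return "int"
    if kind == "id":      return env[e[1]]                 # lookup(id.entry)
    if kind == "mod":                                      # E1 mod E2
        ok = type_of(e[1], env) == "int" and type_of(e[2], env) == "int"
        return "int" if ok else "type_error"
    if kind == "index":                                    # E1[E2]
        t1, t2 = type_of(e[1], env), type_of(e[2], env)
        if t2 == "int" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                                   # element type t
        return "type_error"
    if kind == "deref":                                    # E1^
        t1 = type_of(e[1], env)
        return t1[1] if isinstance(t1, tuple) and t1[0] == "pointer" else "type_error"

env = {"a": ("array", 10, "int"), "p": ("pointer", "real"), "i": "int"}
assert type_of(("index", ("id", "a"), ("id", "i")), env) == "int"
assert type_of(("deref", ("id", "p")), env) == "real"
assert type_of(("mod", ("id", "i"), ("literal",)), env) == "type_error"
```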
14. (a)
1. Peephole Optimization
Peephole optimization is a technique for improving the quality of the target code; the technique can also be applied directly after intermediate code generation to improve the intermediate representation. It examines a limited range of instructions and replaces them by a shorter or faster sequence. It is a local code-improvement technique.
Characteristic peephole optimizations are
(1) Redundant instruction elimination
(2) Flow-of-control optimizations
(3) Algebraic simplifications
(4) Use of machine idioms
(5) Reduction in strength
(6) Elimination of unreachable code
For example, in the sequence
K = 0;
if K = 0 goto L1;
K = K + 1;
L1:
the conditional jump is always taken, so the statement K = K + 1 is unreachable and can be eliminated.
Similarly, a jump to a jump can be shortened: if the target of a goto is the statement L1: goto L2;, the original jump can be replaced by a direct goto L2.
Algebraic Simplifications
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. For example, statements such as
X = X + 0
or
X = X * 1
are often produced by intermediate code generation algorithms, and they can be eliminated easily through peephole optimization.
Use of Machine Idioms
The target machine may have hardware instructions to implement certain operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly. For example, some machines have auto-decrement addressing modes; the use of these modes greatly improves the quality of code.
Reduction in Strength
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. For example, X^2 is invariably cheaper to implement as X * X than as a call to an exponentiation routine.
2. Issues in Code Generation
The issues in the design of a code generator are
a. Input to the code generator
b. Target program
c. Memory management
d. Instruction selection
e. Register allocation
a. Input to the code generator
The input to the code generator consists of an intermediate representation of the source program, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation. The intermediate code may be in any form, such as three-address code, quadruples, triples or postfix notation, or it may be represented using graphical representations such as syntax trees or directed acyclic graphs.
b. Target program
The output of the code generator is a target program. The output may take on a variety of forms:
(i) Absolute machine language
(ii) Relocatable machine language
(iii) Assembly language
c. Memory management
Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator. Symbol table entries were created as the declarations in a procedure were examined. The type in a declaration determines the width. From the symbol table information, a relative address can be determined for a name in the data area for the procedure. If machine code is being generated, labels in three-address statements have to be converted to addresses of instructions.
d. Instruction selection
The uniformity and completeness of the instruction set is an important factor for the code generator. The selection of instructions depends upon the instruction set of the target machine. Instruction speed and machine idioms are two important factors in the selection of instructions. If we do not care about the efficiency of the target program, instruction selection is straightforward.
e. Register allocation
Instructions involving register operands are usually shorter and faster than those involving operands in memory. Hence efficient utilization of registers is important in generating good code. The use of registers is subdivided into two subproblems: register allocation, in which we select the set of names that will reside in registers, and register assignment, in which we pick the specific register each such name will reside in.
15. (b)
PART B (5 × 16 = 80 Marks)
11. (a)
(i) Explain in detail the role of the lexical analyzer with the possible error recovery actions.
(ii) What is a compiler? Explain the various phases of a compiler in detail, with a neat sketch.
Or
(b)
(b)
13. (a)
(b)
14. (a)
(b)
15. (a)
(b)
PART B (5 × 16 = 80 Marks)
11. (a) (i) Write about the phases of the compiler; assume an input and show the output of the various phases.