Compiler Key3


NOV/DEC-'07/CS1352-Answer Key

7. What are basic blocks and flow graphs?


A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end, without halting or
branching except at the end.
A flow graph is a directed graph in which flow-of-control information is added to
the basic blocks.
· The nodes of the flow graph are the basic blocks.
· The block whose leader is the first statement is called the initial block.
· There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in
some execution sequence. We then say that B1 is a predecessor of B2 and B2 is a
successor of B1.
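
The definitions above can be sketched in code. The following is a minimal illustration, not part of the original answer: it assumes a hypothetical instruction format of (op, target) pairs, where target is the index of the jump destination for "goto" and "if" and None otherwise. Leaders are the first statement, every jump target, and every statement that follows a jump.

```python
def find_leaders(code):
    """Return the sorted indices of the statements that start a basic block."""
    leaders = {0}                          # the first statement is a leader
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if"):
            leaders.add(target)            # the target of a jump is a leader
            if i + 1 < len(code):
                leaders.add(i + 1)         # so is the statement after a jump
    return sorted(leaders)


def build_flow_graph(code):
    """Return (blocks, edges); each block is a half-open range (start, end)."""
    leaders = find_leaders(code)
    blocks = list(zip(leaders, leaders[1:] + [len(code)]))
    block_of = {start: b for b, (start, _) in enumerate(blocks)}
    edges = set()
    for b, (_, end) in enumerate(blocks):
        op, target = code[end - 1]         # last statement of the block
        if op in ("goto", "if"):
            edges.add((b, block_of[target]))
        if op != "goto" and end < len(code):
            edges.add((b, b + 1))          # control can fall through
    return blocks, sorted(edges)


# Hypothetical three-address program used only for illustration.
code = [
    ("assign", None),   # 0: i = 1
    ("assign", None),   # 1: t = i * 8
    ("assign", None),   # 2: i = i + 1
    ("if", 1),          # 3: if i <= 10 goto 1
    ("assign", None),   # 4: x = t
]
blocks, edges = build_flow_graph(code)
```

Here the program splits into three blocks, and the loop shows up as an edge from the second block to itself.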

8. What are the limitations of static allocation?


o The size of a data object and constraints on its position in memory must be
known at compile time
o Recursive procedures are restricted, because all activations of a procedure use
the same bindings for local names.
o Data structures cannot be created dynamically, since there is no mechanism
for storage allocation at run time.

9. Define activation tree.


Each execution of a procedure is referred to as an activation of the procedure.
An activation tree depicts the way control enters and leaves activations. In it, each node
represents an activation of a procedure; the root represents the activation of the main
program; the node for a is the parent of the node for b iff control flows from activation a
to b; and the node for a is to the left of the node for b iff the lifetime of a occurs before
the lifetime of b.
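
As an illustration only, an activation tree can be recorded by tracing calls at run time. The sketch below assumes a hypothetical ActivationTracer decorator (not something the original answer defines): each call creates a node, attaches it to the node of its caller, and children are appended in the order their lifetimes begin.

```python
class ActivationTracer:
    """Records one (name, children) node per procedure activation."""

    def __init__(self):
        self.root = None
        self._stack = []                   # activations that are still live

    def __call__(self, fn):
        def wrapper(*args):
            name = f"{fn.__name__}({', '.join(map(str, args))})"
            node = (name, [])
            if self._stack:
                self._stack[-1][1].append(node)   # child of the caller
            else:
                self.root = node                  # activation of the main program
            self._stack.append(node)
            try:
                return fn(*args)
            finally:
                self._stack.pop()                 # control leaves the activation
        return wrapper


tracer = ActivationTracer()


@tracer
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@tracer
def main():
    return fib(3)


result = main()
```

After the run, tracer.root holds the tree rooted at main(), with fib(2) to the left of the second fib(1) because its lifetime ends before that call begins.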

10. What is inline expansion?


Here, the body of the procedure is substituted for the call in the caller, with
the actual parameters literally substituted for the formals. i.e. the procedure is treated
as if it were a macro.
PART – B

11. a. i. Explain in detail about the role of lexical analyzer with the possible error
recovery actions. (6)
Few errors are discernible at the lexical level alone, because a lexical analyzer has a
very localized view of a source program. The simplest recovery strategy is “panic mode”
recovery: delete successive characters from the remaining input until the lexical
analyzer can find a well-formed token. Other possible error recovery actions are:
o Deleting an extraneous character
o Inserting a missing character
o Replacing an incorrect character by a correct character
o Transposing two adjacent characters


ii. What is a compiler? Explain the various phases of compiler in detail, with a neat
sketch. (10)
A compiler is a program that translates a program written in a source language into an
equivalent program in a target language. Because compilation is complex, it is customary,
from both the logical and the implementation point of view, to partition the process
into several phases. A phase is a logically cohesive operation that takes as input one
representation of the source program and produces as output another representation. (2)
The source program is a stream of characters, e.g. pos = init + rate * 60 (6)
– Lexical analysis: groups characters into indivisible units called tokens and
generates the token stream: id1 = id2 + id3 * const
• The information about the identifiers must be stored somewhere (the symbol
table).
– Syntax analysis: checks whether the token stream meets the grammatical
specification of the language and generates the syntax tree.
– Semantic analysis: checks whether the program is meaningful (e.g. if pos is a record
and init and rate are integers, then the assignment does not make sense).
Syntax tree after syntax analysis:
:=(id1, +(id2, *(id3, 60)))
Syntax tree after semantic analysis (the integer constant is converted to real):
:=(id1, +(id2, *(id3, inttoreal(60))))


– Intermediate code generation: intermediate code is both close to the
final machine code and easy to manipulate (for optimization). One example is
three-address code:
dst = op1 op op2
• The three-address code for the assignment statement:
temp1 = inttoreal(60);
temp2 = id3 * temp1;
temp3 = id2 + temp2;
id1 = temp3
– Code optimization: produces better but semantically equivalent code.
temp1 = id3 * 60.0
id1 = id2 + temp1
– Code generation: generates assembly
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
– Symbol table creation / maintenance
• Contains information (storage, type, scope, arguments) on each “meaningful” token, typically
identifiers
• The data structure is created / initialized during lexical analysis
• It is used / updated during the later analysis and synthesis phases


– Error handling
• Errors of different kinds are detected in all phases
• Each phase must know how to deal with an error so that compilation
can proceed and further errors can be detected
[Figure: phases of a compiler. Source Program -> 1 Lexical Analyzer -> 2 Syntax Analyzer -> 3 Semantic Analyzer -> 4 Intermediate Code Generator -> 5 Code Optimizer -> 6 Code Generator -> Target Program. The Symbol-table Manager and the Error Handler interact with all six phases.] (2)

(OR)

b. i. Give the minimized DFA for the following expression (a|b)*abb. (10)

Syntax tree for (a|b)*abb#:


Calculation of firstpos, lastpos and nullable for nodes in syntax tree:

Calculation of followpos:

Node followpos
1 {1, 2, 3}
2 {1, 2, 3}
3 {4}
4 {5}
5 {6}
6 -

Now, the start state of the DFA is firstpos of the root.
So, A = {1, 2, 3}
Consider the input symbol ‘a’:
Positions 1 and 3 are for ‘a’ in A
So, let B = followpos(1) U followpos(3)
= {1, 2, 3} U {4} = {1, 2, 3, 4}
DTrans[A, a] = B
Consider the input symbol ‘b’:
Position 2 is for ‘b’ in A
So, followpos(2) = {1, 2, 3} = A
DTrans[A, b] = A


Now continue with B,
Consider the input symbol ‘a’:
Positions 1 and 3 are for ‘a’ in B
So, followpos(1) U followpos(3)
= {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[B, a] = B
Consider the input symbol ‘b’:
Positions 2 and 4 are for ‘b’ in B
So, followpos(2) U followpos(4)
= {1, 2, 3} U {5} = {1, 2, 3, 5} = C
DTrans[B, b] = C

Now continue with C,
Consider the input symbol ‘a’:
Positions 1 and 3 are for ‘a’ in C
So, followpos(1) U followpos(3)
= {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[C, a] = B
Consider the input symbol ‘b’:
Positions 2 and 5 are for ‘b’ in C
So, followpos(2) U followpos(5)
= {1, 2, 3} U {6} = {1, 2, 3, 6} = D
DTrans[C, b] = D

Now continue with D,


Consider the input symbol ‘a’:
Position 1 and 3 are for ‘a’ in D
So, followpos(1) U followpos(3)
= {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[D, a] = B
Consider the input symbol ‘b’:
Position 2 is for ‘b’ in D
So, followpos(2) = {1, 2, 3} = A
DTrans[D, b] = A

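Position 6, which is associated with the end marker #, is in D. So, D is the final state.
This four-state DFA is already minimal.
DFA:
[Figure: DFA with start state A and final state D. Edges: A -a-> B, A -b-> A, B -a-> B, B -b-> C, C -a-> B, C -b-> D, D -a-> B, D -b-> A.]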


Transition table:

         Input symbol
State    a    b
A        B    A
B        B    C
C        B    D
D        B    A
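
The construction can be checked mechanically. The following is a minimal sketch, assuming the followpos sets from the table in this answer and a hypothetical symbol_at map from each position to its symbol; the function names are illustrative. Each DFA state is a set of positions, and the transition on a symbol is the union of followpos(p) over the positions p in the state that carry that symbol.

```python
def build_dfa(followpos, symbol_at, firstpos_root, end_pos, alphabet):
    """Build DFA states and transitions directly from followpos."""
    start = frozenset(firstpos_root)
    states, seen, work, trans = [start], {start}, [start], {}
    while work:
        state = work.pop()
        for a in alphabet:
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at[p] == a))
            if not target:
                continue                   # no transition on this symbol
            if target not in seen:
                seen.add(target)
                states.append(target)
                work.append(target)
            trans[(state, a)] = target
    accepting = {s for s in states if end_pos in s}
    return start, states, trans, accepting


def accepts(text, start, trans, accepting):
    """Run the DFA on text; a missing transition rejects."""
    state = start
    for ch in text:
        state = trans.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Positions for (a|b)*abb#: 1=a, 2=b, 3=a, 4=b, 5=b, 6=#
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START, STATES, TRANS, ACCEPTING = build_dfa(FOLLOWPOS, SYMBOL_AT, {1, 2, 3}, 6, "ab")
```

The states are discovered in the order A, B, C, D, so STATES can be unpacked directly into those four names.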

ii. Draw the transition diagram for unsigned numbers. (6)

12. a. i. Explain the role of parser in detail. (4)

Parser obtains a string of tokens from the lexical analyzer and verifies that the string
can be generated by the grammar for the source language. It can report any syntax error
in an intelligible fashion.
Errors can be lexical, syntactic, semantic or logical. The error handler in a parser has
simple-to-state goals:
· It should report the presence of errors clearly and accurately
· It should recover from each error quickly enough to be able to detect subsequent
errors
· It should not significantly slow down the processing of correct programs


ii. Construct predictive parsing table for the grammar
E->E+T | T, T->T*F | F, F->(E)|id (12)
Eliminating left recursion: (2)
E->TE’
E’->+TE’ | ε
T->FT’
T’->*FT’ | ε
F-> (E) | id
Calculation of First: (2)
First (E) = First (T) = First (F) = {(, id}
First (E’) = {+, ε}
First (T’) = {*, ε}
Calculation of Follow: (2)
Follow (E) = Follow (E’) = {), $}
Follow (T) = Follow (T’) = {+, ), $}
Follow (F) = {+, *, ), $}
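
The First and Follow sets can be recomputed with a small fixed-point iteration. This is a sketch under stated assumptions: the grammar encoding (a dict from non-terminal to lists of symbols) and the function names are illustrative, and "ε" stands for the empty string.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_seq(seq, first):
    """First set of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}   # a terminal is its own First
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                                    # every symbol can vanish
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]         # what follows head follows sym
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```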

Predictive parsing table: (6)

Non-terminal | id      | +         | *         | (       | )      | $
E            | E->TE’  |           |           | E->TE’  |        |
E’           |         | E’->+TE’  |           |         | E’->ε  | E’->ε
T            | T->FT’  |           |           | T->FT’  |        |
T’           |         | T’->ε     | T’->*FT’  |         | T’->ε  | T’->ε
F            | F->id   |           |           | F->(E)  |        |
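
To show how such a table is used, here is a minimal sketch of a table-driven predictive parser. The dictionary encoding of the table and the parse function are assumptions made for illustration; an empty list stands for an ε-production, and a missing entry is a syntax error.

```python
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                     # start symbol on top of the end marker
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            derivation.append((top, body))
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                       # match a terminal (or the end marker)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return derivation


steps = parse(["id", "+", "id", "*", "id"])
```

For id + id * id the parser applies eleven productions, starting with E->TE’ and ending with E’->ε.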

(OR)

b. i. Give the LALR parsing table for the grammar (12)
S-> L=R | R
L->*R | id
R->L
Given grammar:
1. S->L=R
2. S->R
3. L->*R
4. L->id
5. R->L
Augmented grammar:
S’->S
S->L=R
S->R
L->*R
L->id
R->L


Canonical collection of LR(1) items:
I0: S’->.S, $
S->.L=R, $
S->.R, $
L->.*R, =/$
L->.id, =/$
R->.L, $
I1: goto(I0, S)
S’->S., $
I2: goto(I0, L)
S->L.=R, $
R->L., $
I3: goto(I0, R)
S->R., $
I4: goto(I0, *)
L->*.R, =/$
R->.L, =/$
L->.*R, =/$
L->.id, =/$
I5: goto(I0, id)
L->id., =/$
I6: goto(I2, =)
S->L=.R, $
R->.L, $
L->.*R, $
L->.id, $
I7: goto(I4, R)
L->*R., =/$
I8: goto(I4, L)
R->L., =/$
goto(I4, *)=I4
goto(I4, id)=I5
I9: goto(I6, R)
S->L=R., $
I10: goto(I6, L)
R->L., $
I11: goto(I6, *)
L->*.R, $
R->.L, $
L->.*R, $
L->.id, $
I12: goto(I6, id)
L->id., $
I13: goto(I11, R)
L->*R., $
goto(I11, L)=I10
goto(I11, *)=I11
goto(I11, id)=I12

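LR (1) table construction:

      | action                | goto
State | =   | *   | id  | $   | S | L  | R
0     |     | s4  | s5  |     | 1 | 2  | 3
1     |     |     |     | acc |   |    |
2     | s6  |     |     | r5  |   |    |
3     |     |     |     | r2  |   |    |
4     |     | s4  | s5  |     |   | 8  | 7
5     | r4  |     |     | r4  |   |    |
6     |     | s11 | s12 |     |   | 10 | 9
7     | r3  |     |     | r3  |   |    |
8     | r5  |     |     | r5  |   |    |
9     |     |     |     | r1  |   |    |
10    |     |     |     | r5  |   |    |
11    |     | s11 | s12 |     |   | 10 | 13
12    |     |     |     | r4  |   |    |
13    |     |     |     | r3  |   |    |

This grammar is LR(1), since its parsing table contains no multiply defined
entries.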


LALR table construction:

I4 and I11 have the same core. Combine them as
I411 or I4:
L->*.R, =/$
R->.L, =/$
L->.*R, =/$
L->.id, =/$
I5 and I12 have the same core. Combine them as
I512 or I5:
L->id., =/$
I7 and I13 have the same core. Combine them as
I713 or I7:
L->*R., =/$
I8 and I10 have the same core. Combine them as
I810 or I8:
R->L., =/$

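      | action               | goto
State | =   | *   | id  | $   | S | L | R
0     |     | s4  | s5  |     | 1 | 2 | 3
1     |     |     |     | acc |   |   |
2     | s6  |     |     | r5  |   |   |
3     |     |     |     | r2  |   |   |
4     |     | s4  | s5  |     |   | 8 | 7
5     | r4  |     |     | r4  |   |   |
6     |     | s4  | s5  |     |   | 8 | 9
7     | r3  |     |     | r3  |   |   |
8     | r5  |     |     | r5  |   |   |
9     |     |     |     | r1  |   |   |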
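
The LALR table can be exercised with a standard shift-reduce driver. This is a minimal sketch: the dictionary encoding of the action and goto parts and the parse function are illustrative assumptions, and the productions are numbered 1 to 5 as in the given grammar.

```python
# production number -> (head, length of body)
PRODS = {1: ("S", 3), 2: ("S", 1), 3: ("L", 2), 4: ("L", 1), 5: ("R", 1)}
ACTION = {
    0: {"*": ("s", 4), "id": ("s", 5)},
    1: {"$": ("acc",)},
    2: {"=": ("s", 6), "$": ("r", 5)},
    3: {"$": ("r", 2)},
    4: {"*": ("s", 4), "id": ("s", 5)},
    5: {"=": ("r", 4), "$": ("r", 4)},
    6: {"*": ("s", 4), "id": ("s", 5)},
    7: {"=": ("r", 3), "$": ("r", 3)},
    8: {"=": ("r", 5), "$": ("r", 5)},
    9: {"$": ("r", 1)},
}
GOTO = {0: {"S": 1, "L": 2, "R": 3}, 4: {"L": 8, "R": 7}, 6: {"L": 8, "R": 9}}


def parse(tokens):
    """Return the sequence of production numbers used in reductions."""
    tokens = list(tokens) + ["$"]
    stack = [0]                            # stack of states
    pos = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]} in state {stack[-1]}")
        if act[0] == "s":                  # shift: push the state, advance input
            stack.append(act[1])
            pos += 1
        elif act[0] == "r":                # reduce: pop the body, push the goto
            head, length = PRODS[act[1]]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])
            reductions.append(act[1])
        else:                              # accept
            return reductions


trace = parse(["*", "id", "=", "id"])
```

For the input *id=id the reductions occur in the order 4, 5, 3, 4, 5, 1, which is a rightmost derivation in reverse.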

ii. What are the reasons for using LR parser technique? (4)

· LR parsers can be constructed to recognize virtually all programming
language constructs for which CFGs can be written
· The LR parsing method is the most general non-backtracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other shift-reduce
methods
· The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be parsed with predictive parsers
· An LR parser can detect a syntactic error as soon as it is possible to do so on a
left-to-right scan of the input


13. a. i. Explain about the different type of three address statements. (8)
Three-address code is one of the intermediate representations. It is a sequence of statements of the
form x:= y op z, where x, y, and z are names, constants or compiler-generated
temporaries and op is an operator, which can be an arithmetic or a logical operator. E.g.
x+y*z is translated as t1:=y*z and t2:=x+t1.
The reason for the term three-address code is that each statement usually contains
three addresses, two for the operands and one for the result. (2)

Common three address statements: (2)
· x:=y op z (assignment statements)
· x:= op y (assignment statements)
· x:=y (copy statements)
· goto L (unconditional jump)
· Conditional jumps like if x relop y goto L
· param x, call p,n and return y for procedure calls
· Indexed assignments x:=y[i] and x[i]:= y
· Address and pointer assignments x:=&y, x:=*y and *x:=y

Implementation: (4)
· Quadruples
A record with four fields: op, arg1, arg2 and result
· Triples
A record with three fields: op, arg1 and arg2. This avoids entering temporary
names into the symbol table. A temporary value is referred to by the position of
the statement that computes it.
· Indirect triples
A list of pointers to triples is kept, rather than a list of the triples themselves

For a := b * -c + b * -c
Quadruples
    | Op     | arg1 | arg2 | result
(0) | uminus | c    |      | t1
(1) | *      | b    | t1   | t2
(2) | uminus | c    |      | t3
(3) | *      | b    | t3   | t4
(4) | +      | t2   | t4   | t5
(5) | :=     | t5   |      | a

Triples
    | Op     | arg1 | arg2
(0) | uminus | c    |
(1) | *      | b    | (0)
(2) | uminus | c    |
(3) | *      | b    | (2)
(4) | +      | (1)  | (3)
(5) | assign | a    | (4)


Indirect Triples
Statement | Triple
(0)       | (14)
(1)       | (15)
(2)       | (16)
(3)       | (17)
(4)       | (18)
(5)       | (19)

     | Op     | arg1 | arg2
(14) | uminus | c    |
(15) | *      | b    | (14)
(16) | uminus | c    |
(17) | *      | b    | (16)
(18) | +      | (15) | (17)
(19) | assign | a    | (18)
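
The quadruple and triple forms above can be generated from an expression tree by a post-order walk. This is a sketch only: the tuple encoding of the tree and the function names are assumptions. Names are strings, and in the triples a reference to an earlier statement is an integer index.

```python
def gen_quads(node, quads):
    """Append quadruples for node; return the name that holds its value."""
    if isinstance(node, str):
        return node
    op, *kids = node
    args = [gen_quads(k, quads) for k in kids]
    temp = f"t{len(quads) + 1}"            # one new temporary per quadruple
    quads.append((op, args[0], args[1] if len(args) > 1 else None, temp))
    return temp


def gen_triples(node, triples):
    """Append triples for node; return a name or the index of its triple."""
    if isinstance(node, str):
        return node
    op, *kids = node
    args = [gen_triples(k, triples) for k in kids]
    triples.append((op, args[0], args[1] if len(args) > 1 else None))
    return len(triples) - 1                # position stands in for a temporary


# a := b * -c + b * -c
rhs = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))

quads = []
quads.append((":=", gen_quads(rhs, quads), None, "a"))

triples = []
triples.append(("assign", "a", gen_triples(rhs, triples)))
```

Indirect triples would add only a separate list of indices into triples, so that statements can be reordered without renumbering the references.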

ii. What are the methods of translating Boolean expression? (8)

Boolean expressions are:
· Used to compute logical values. (2)
· Used as conditional expressions in statements that alter the flow of control.
· Built from the operators and, or and not.
· Composed of Boolean variables / relational expressions.

Methods of translating Boolean expressions: (2)
· Encode true and false numerically and evaluate like an arithmetic expression
· By flow of control, i.e. represent the value of a Boolean expression by a position
reached in the program
The semantics of the programming language determine whether all parts of a Boolean
expression must be evaluated. If not, the compiler can optimize the evaluation by computing only
enough of the expression to determine its value.

Syntax directed definitions to produce 3AC for Booleans: (4)
E --> E1 or E2
{ E1.true=E.true; E1.false=newlabel(); E2.true=E.true; E2.false=E.false;
E.code=E1.code || gen(E1.false ':') || E2.code }
E --> E1 and E2
{ E1.true=newlabel(); E1.false=E.false; E2.true=E.true; E2.false=E.false;
E.code=E1.code || gen(E1.true ':') || E2.code }
E --> not E1 { E1.true=E.false; E1.false=E.true; E.code=E1.code }
E --> (E1) { E1.true=E.true; E1.false=E.false; E.code=E1.code }
E --> id1 relop id2
{ E.code=gen('if' id1.place relop.op id2.place 'goto' E.true) || gen('goto' E.false) }
E --> true
{ E.code=gen('goto' E.true) }
E --> false
{ E.code=gen('goto' E.false) }
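
The rules above can be read as a recursive function in which the inherited attributes true and false are passed down as arguments. This is a minimal sketch, assuming a hypothetical tuple encoding of Boolean expressions and a label counter passed in explicitly so that the output is reproducible.

```python
import itertools


def gen_bool(e, true, false, labels):
    """Return jumping code for e; control reaches label true or label false."""
    kind = e[0]
    if kind == "or":
        mid = f"L{next(labels)}"           # E1.false = newlabel()
        return (gen_bool(e[1], true, mid, labels) + [f"{mid}:"]
                + gen_bool(e[2], true, false, labels))
    if kind == "and":
        mid = f"L{next(labels)}"           # E1.true = newlabel()
        return (gen_bool(e[1], mid, false, labels) + [f"{mid}:"]
                + gen_bool(e[2], true, false, labels))
    if kind == "not":
        return gen_bool(e[1], false, true, labels)   # swap the two targets
    if kind == "relop":
        _, op, x, y = e
        return [f"if {x} {op} {y} goto {true}", f"goto {false}"]
    if kind == "true":
        return [f"goto {true}"]
    if kind == "false":
        return [f"goto {false}"]
    raise ValueError(f"unknown node kind: {kind}")


# a < b or c < d and e < f
expr = ("or", ("relop", "<", "a", "b"),
        ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f")))
code = gen_bool(expr, "Ltrue", "Lfalse", itertools.count(1))
```

Note that the generated code contains redundant jumps (for example a goto to the label on the very next line); removing them is left to a later optimization pass.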

(OR)

b. i. Write short notes on back-patching. (8)
Backpatching is the activity of filling in unspecified label information, using
appropriate semantic actions, during the code generation process. (2)


In the semantic actions the functions used are (2)
mklist(i) – creates a new list containing only i, an index into the array of quadruples.
merge(p1,p2) – merges the two lists pointed to by p1 and p2.
backpatch(p,j) – inserts the target label j into each of the statements on the list pointed to by p.
Example: (4)
Source:
if a or b then
if c then
x= y+1
Translation:
if a go to L1
if b go to L1
go to L3
L1: if c goto L2
goto L3
L2: x= y+1
L3:
After Backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x=y+1
106:
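
The example can be replayed with a small sketch of the three functions. The Quads class, the "_" placeholder for an unfilled target and the base address 100 are assumptions made for illustration; the calls are written out by hand in the order a one-pass translator would make them.

```python
class Quads:
    """Array of statements whose jump targets may be filled in later."""

    def __init__(self, base=100):
        self.base = base
        self.code = []

    def nextquad(self):
        return self.base + len(self.code)

    def emit(self, text):
        self.code.append(text)
        return self.nextquad() - 1         # address of the statement just emitted

    def backpatch(self, plist, target):
        for addr in plist:
            i = addr - self.base
            self.code[i] = self.code[i].replace("_", str(target))


def mklist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


q = Quads()
# if a or b then ...
cond_true = merge(mklist(q.emit("if a goto _")), mklist(q.emit("if b goto _")))
cond_false = mklist(q.emit("goto _"))
q.backpatch(cond_true, q.nextquad())       # the body of the outer if starts here
# ... if c then ...
c_true = mklist(q.emit("if c goto _"))
c_false = mklist(q.emit("goto _"))
q.backpatch(c_true, q.nextquad())          # the assignment starts here
q.emit("x = y + 1")
q.backpatch(merge(cond_false, c_false), q.nextquad())   # exit of both ifs
```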

ii. Explain procedure calls with an example. (8)
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. (2)
Consider the following grammar for a simple procedure call statement:
S-> call id (Elist)
Elist -> Elist, E
Elist ->E
Calling sequences: (2)
The translation for a call includes a calling sequence, a sequence of actions taken
on entry to and exit from each procedure.
Example: (4)
Syntax directed translation:
S-> call id(Elist)
{for each item p on queue do
Emit(‘param’ p);
Emit(‘call’ id.place)}
Elist -> Elist, E
{append E.place to the end of the queue}
Elist - > E
{initialize queue to contain only E.place}
E.g. for the call p1(a, b), the generated code is:
param a
param b
call p1

14. a. i. Construct the DAG for the following basic block: (6)
d:=b*c
e:=a+b
b:=b*c
a:=e-d


ii. Explain in detail about primary structure-preserving transformations on basic blocks. (10)
Structure preserving transformations:
Common subexpression elimination is implemented by constructing a dag for a basic block. A common subexpression
can be detected by noticing, as a new node m is about to be added, whether there is an
existing node n with the same children, in the same order, and with the same operator. If
so, n computes the same value as m and may be used in its place.
E.g. DAG for the basic block
d:=b*c
e:=a+b
b:=b*c
a:=e-d
[Figure: DAG with leaves a0, b0 and c0; a * node over b0 and c0 labelled d, b; a + node over a0 and b0 labelled e; and a - node over the + and * nodes labelled a.]
For dead-code elimination, delete from a dag any root (node with no ancestors)
that has no live variables attached. Repeated application of this will remove all nodes from the dag
that correspond to dead code.
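
The node-matching step can be sketched as follows. This is an illustration under stated assumptions: statements are encoded as hypothetical (dst, op, x, y) tuples, and a dictionary keyed by (op, left, right) plays the role of the search for an existing node n with the same operator and children.

```python
def build_dag(block):
    """Return (nodes, labels) for a block of statements dst := x op y."""
    nodes = []      # nodes[i] = (op, left_id, right_id); leaves are ("leaf", name, None)
    index = {}      # node tuple -> node id, used to find an existing node
    current = {}    # name -> id of the node holding its current value
    labels = {}     # node id -> names currently attached to that node

    def node_id(key):
        if key not in index:               # create the node only if it is new
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name not in current:            # first use: leaf for the initial value
            current[name] = node_id(("leaf", name, None))
        return current[name]

    for dst, op, x, y in block:
        n = node_id((op, operand(x), operand(y)))
        for attached in labels.values():   # dst no longer names its old node
            if dst in attached:
                attached.remove(dst)
        labels.setdefault(n, []).append(dst)
        current[dst] = n
    return nodes, labels


block = [
    ("d", "*", "b", "c"),
    ("e", "+", "a", "b"),
    ("b", "*", "b", "c"),
    ("a", "-", "e", "d"),
]
nodes, labels = build_dag(block)
```

The four statements yield only three interior nodes, because the third statement reuses the node already built for b*c.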

(OR)

b. i. Describe in detail about a simple code generator with the appropriate algorithm. (8)
It generates target code for a sequence of three address statements. (2)
Assumptions:
· For each operator in a three address statement, there is a corresponding target
language operator.
· Computed results can be left in registers as long as possible.
E.g. a=b+c: (2)
· Add Rj,Ri where Ri has b and Rj has c and result in Ri. Cost=1;
· Add c, Ri where Ri has b and result in Ri. Cost=2;
· Mov c, Rj; Add Rj, Ri; Cost=3;


Register descriptor: Keeps track of what is currently in each register.
Address descriptor: Keeps track of the location where the current value of the name
can be found at run time.
Code generation algorithm: For x= y op z (2)
· Invoke the function getreg to determine the location L where the result of y
op z should be stored (register or memory location)
· Check the address descriptor for y to determine y’, the current location of y. If the value of y is not already in L, generate the instruction MOV y’, L
· Generate the instruction op z’, L where z’ is the current location of z, and update the descriptors to record that x is in L
· If the current values of y and/or z have no next uses and are in registers, alter the register descriptor to show that those registers no longer hold y and/or z
Getreg: (2)
· If y is in a register that holds the values of no other names and y is not live after the statement,
return the register of y for L
· Failing that, return an empty register if there is one
· Failing that, if x has a next use in the block, find an occupied register and empty it by storing its value in memory
· If x is not used in the block, or no suitable register is found, select the memory
location of x as L

ii. Explain in detail about run-time storage management. (8)
Information needed during an execution of a procedure is kept in a block of
storage called an activation record; storage for names local to the procedure also appears
in the activation record. Two standard storage-allocation strategies are
· Static allocation (4)
The position of an activation record in memory is fixed at compile time.
The activation record for a procedure has fields to hold parameters, results,
machine status information, local data, temporaries and the like.
A call statement is implemented by a sequence of two target-machine
instructions. A MOV instruction saves the return address and a GOTO transfers
control to the target code for the called procedure.
· Stack allocation (4)
Here, a new activation record is pushed onto the stack for each execution of a
procedure. The record is popped when the activation ends.
Static allocation becomes stack allocation by using relative addresses for
storage in activation records. The position of the record for an activation of a
procedure is not known until run time. In stack allocation, this position is usually
stored in a register (indexed addressing mode).
Relative addresses in an activation record can be taken as offsets from any
known position in the activation record.

15. a. i. Explain in detail about principal sources of optimization. (10)
Code optimization is needed to make the code run faster or take less space or both.
Function preserving transformations:
· Common sub expression elimination
· Copy propagation
· Dead-code elimination
· Constant folding


Common sub expression elimination: (2)
An expression E is called a common subexpression if E was previously computed and the
values of variables in E have not changed since the previous computation.
Copy propagation: (2)
Assignments of the form f:=g are called copy statements, or copies for short. The
idea here is to use g for f wherever possible after the copy statement.
Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently;
otherwise it is dead at that point. Dead code, i.e. statements that compute values that are never used, can be removed.
Constant folding:
Deducing at compile time that the value of an expression is a constant
and using the constant instead is called constant folding.
Loop optimization: (4)
· Code motion: moving code outside the loop.
Takes an expression that yields the same result independent of the number of
times a loop is executed (a loop-invariant computation) and places the expression before
the loop.
· Induction variable elimination
· Reduction in strength: replacing an expensive operation by a cheaper one.

ii. Describe in detail about optimization of basic blocks with example. (6)
Code improving transformations:
· Structure-preserving transformations
o Common sub expression elimination
o Dead-code elimination
· Algebraic transformations like reduction in strength.
Structure preserving transformations: (3)
Common subexpression elimination is implemented by constructing a dag for a basic block. A common
subexpression can be detected by noticing, as a new node m is about to be added,
whether there is an existing node n with the same children, in the same order, and
with the same operator. If so, n computes the same value as m and may be used in its
place.
E.g. DAG for the basic block
d:=b*c
e:=a+b
b:=b*c
a:=e-d
[Figure: DAG in which the single * node over b0 and c0 is labelled with both d and b.]


For dead-code elimination, delete from a dag any root (node with no ancestors)
that has no live variables attached. Repeated application of this will remove all nodes from the
dag that correspond to dead code.
Use of algebraic identities: (3)
e.g. x+0 = 0+x = x
x-0 = x
x*1 = 1*x = x
x/1 = x
Reduction in strength:
Replace an expensive operator by a cheaper one.
x ** 2 = x * x
Constant folding:
Evaluate constant expressions at compile time and replace them by their values.
The commutative and associative laws can also be used.
E.g. a=b+c
e=c+d+b
Intermediate code: a=b+c
t=c+d
e=t+b
If t is not needed outside the block, change this to
a=b+c
e=a+d
using both the associativity and commutativity of +.
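
The identities, the strength reduction and constant folding listed above can be sketched as one local rewrite function. This is an illustration only: the simplify function and its operand encoding (numbers for constants, strings for names) are assumptions, and only the rules stated in this answer are implemented.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "**": operator.pow}


def is_const(v):
    return isinstance(v, (int, float))


def simplify(op, x, y):
    """Return a constant, a name, or a (possibly cheaper) (op, x, y) tuple."""
    if is_const(x) and is_const(y) and not (op == "/" and y == 0):
        return OPS[op](x, y)               # constant folding
    if op == "+" and x == 0:
        return y                           # 0 + x = x
    if op in ("+", "-") and y == 0:
        return x                           # x + 0 = x, x - 0 = x
    if op == "*" and x == 1:
        return y                           # 1 * x = x
    if op in ("*", "/") and y == 1:
        return x                           # x * 1 = x, x / 1 = x
    if op == "**" and y == 2:
        return ("*", x, x)                 # reduction in strength
    return (op, x, y)                      # nothing applies
```

For example, simplify("**", "x", 2) gives ("*", "x", "x"), while an expression such as ("-", "x", "y") is returned unchanged.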

(OR)

b. i. Describe in detail about storage organization. (10)
Subdivision of run time memory: (4)
Run time storage: The block of memory obtained by the compiler from the OS to execute the
compiled program. It is subdivided into
· Generated target code
· Data objects
· Stack to keep track of the activations
· Heap to store all other information
[Figure: layout of run-time memory, from top to bottom: Code, Static data, Stack, Heap.]
Activation record: (Frame) (4)
It is used to store the information required by a single procedure call. Its fields are:
Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries


Temporaries are used to hold values that arise in the evaluation of expressions.
Local data is the data that is local to the execution of the procedure. Saved machine status
represents the status of the machine just before the procedure is called. The control link (dynamic
link) points to the activation record of the calling procedure. The access link refers to
non-local data in other activation records. Actual parameters are the values passed
to the called procedure. The returned value field is used by the called procedure to return a
value to the calling procedure.

Compile time layout of local data: (2)


The amount of storage needed for a name is determined by its type. The field for
the local data is laid out as the declarations in a procedure are examined at compile time.
The storage layout for data objects is strongly influenced by the addressing constraints on
the target machine.

ii. Explain in detail various methods of passing parameters. (6)
• Call by value
– A formal parameter is treated just like a local name. Its storage is in the
activation record of the called procedure
– The caller evaluates the actual parameters and places their r-values in the storage
for the formals
• Call by reference
– If an actual parameter is a name or an expression having an l-value, then that
l-value itself is passed
– However, if it is an expression (e.g. a+b or 2) that has no l-value, then the expression is
evaluated in a new location and the address of that location is passed
• Copy-restore: hybrid between call-by-value and call-by-reference (copy in, copy out)
– The actual parameters are evaluated, their r-values are passed, and the l-values of the actuals
are determined
– When the called procedure is done, the r-values of the formals are copied back to
the l-values of the actuals
• Call by name
– Inline expansion (the procedure is treated like a macro)
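
The difference between the first three methods shows up when two actual parameters are aliases. The sketch below is a simulation, not how a compiler implements the calls: memory is modelled as a hypothetical dict from names to values, and the procedure body (x = x + 1; y = y + x) is written against load and store callbacks so that the same body runs under each convention. Call by name is omitted.

```python
def body(load, store):
    """Procedure p(x, y): x = x + 1; y = y + x."""
    store("x", load("x") + 1)
    store("y", load("y") + load("x"))


def call_by_value(mem, ax, ay):
    frame = {"x": mem[ax], "y": mem[ay]}   # r-values copied into the formals
    body(frame.__getitem__, frame.__setitem__)


def call_by_reference(mem, ax, ay):
    ref = {"x": ax, "y": ay}               # formals hold the l-values of the actuals
    body(lambda f: mem[ref[f]],
         lambda f, v: mem.__setitem__(ref[f], v))


def call_by_copy_restore(mem, ax, ay):
    frame = {"x": mem[ax], "y": mem[ay]}   # copy in
    body(frame.__getitem__, frame.__setitem__)
    mem[ax] = frame["x"]                   # copy out, x first and then y
    mem[ay] = frame["y"]


results = {}
for name, call in [("value", call_by_value),
                   ("reference", call_by_reference),
                   ("copy-restore", call_by_copy_restore)]:
    mem = {"a": 1}
    call(mem, "a", "a")                    # p(a, a): both formals alias a
    results[name] = mem["a"]
```

With a = 1 and the call p(a, a), the caller sees 1 under call by value, 4 under call by reference and 3 under copy-restore. The copy-restore result depends on the order in which the formals are copied back, which this sketch fixes as x before y.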
