
MAY/JUNE-'07/CS1352-Answer Key

7. What are the applications of DAG?


· Determining the common sub-expressions.
· Determining which identifiers have their values used in the block.
· Determining which statements of the block compute values that could be used outside the block.

8. Give the primary structure preserving transformations on Basic Blocks.


• Common sub expression elimination
• Dead-code elimination
• Renaming of temporary variables
• Interchange of two independent adjacent statements

9. What do you mean by code motion?


It decreases the amount of code in a loop. Code motion takes an expression that yields
the same result independent of the number of times the loop is executed (a loop-invariant
computation) and places it before the loop.

10. Draw the diagram of the general activation record and give the purpose of any
two fields.
Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries
Temporaries are used to hold values that arise in the evaluation of expressions.
The returned value field is used by the called procedure to return a value to the calling
procedure.
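As an illustration, the record above could be written down as a C struct; a minimal sketch (field types and sizes are assumptions, not part of the original answer):

    /* One possible activation-record layout, sketched as a C struct.
     * Field sizes and types are illustrative assumptions only. */
    struct activation_record {
        int   returned_value;      /* value handed back to the caller            */
        int   actual_params[4];    /* arguments passed by the caller             */
        void *control_link;        /* optional: points to the caller's record    */
        void *access_link;         /* optional: for non-local name access        */
        int   saved_registers[8];  /* saved machine status (registers, PC)       */
        int   local_data[16];      /* the procedure's local variables            */
        int   temporaries[8];      /* intermediate values of expressions         */
    };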
PART – B

11. a. i. Write about the phases of compiler and by assuming an input and show the
output of various phases. (10)
The process of compilation is complex, so it is customary, from both the logical and the
implementation points of view, to partition the compilation process into several phases. A
phase is a logically cohesive operation that takes as input one representation of the source
program and produces as output another representation. (2)

Source program is a stream of characters: E.g. pos = init + rate * 60 (6)


– Lexical analysis: groups characters into non-separable units, called tokens, and
generates the token stream: id1 = id2 + id3 * const
• The information about the identifiers must be stored somewhere (symbol
table).
– Syntax analysis: checks whether the token stream meets the grammatical
specification of the language and generates the syntax tree.


– Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record
and init and rate are integers then the assignment does not make sense).
Syntax analysis builds:      := ( id1 , + ( id2 , * ( id3 , 60 ) ) )
Semantic analysis inserts a conversion:  := ( id1 , + ( id2 , * ( id3 , inttoreal ( 60 ) ) ) )

– Intermediate code generation: intermediate code is a representation that is both close to
the final machine code and easy to manipulate (for optimization). One example is the three-
address code:
dst = op1 op op2
• The three-address code for the assignment statement:
temp1 = inttoreal(60);
temp2 = id3 * temp1;
temp3 = id2 + temp2;
id1 = temp3

– Code optimization: produces improved code that is semantically equivalent to the original.


temp1 = id3 * 60.0
id1 = id2 + temp1

– Code generation: generates assembly


MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

• Symbol Table Creation / Maintenance
  • Contains Info (storage, type, scope, args) on Each “Meaningful” Token, typically
Identifiers
  • Data Structure Created / Initialized During Lexical Analysis
  • Utilized / Updated During Later Analysis & Synthesis
• Error Handling
  • Detection of Different Errors Which Correspond to All Phases
  • Each phase should know how to deal with an error, so that compilation
can proceed and further errors can be detected


Source Program
    ↓
(1) Lexical Analyzer
(2) Syntax Analyzer
(3) Semantic Analyzer
(4) Intermediate Code Generator
(5) Code Optimizer
(6) Code Generator
    ↓
Target Program
(The Symbol-table Manager and the Error Handler interact with all six phases.)
(2)

ii. Explain briefly about compiler construction tools. (6)


• Parser Generators : Produce Syntax Analyzers
• Scanner Generators : Produce Lexical Analyzers
• Syntax-directed Translation Engines : Generate Intermediate Code
• Automatic Code Generators : Generate Actual Code
• Data-Flow Engines : Support Optimization

(OR)

b. i. Construct the NFA for (a|b)*a(a|b) using Thompson’s construction algorithm. (10)

The algorithm is syntax directed in that it uses the syntactic structure of the
regular expression to guide the construction process. First, parse the regular expression r
into its constituent subexpressions. Then construct NFA’s for the basic symbols of r and,
following the structure of r, combine them using the rules for union, concatenation and
closure.
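As an illustration of how the construction proceeds for (a|b)*a(a|b), a minimal C sketch is given below (the state/edge representation, the helper names and the printed output are assumptions for illustration, not part of the original answer):

    /* Sketch of Thompson's construction. Each fragment has a single start
     * state and a single accepting state, as the algorithm requires. */
    #include <stdio.h>

    #define MAXSTATES 64
    #define EPS '\0'                    /* label used for epsilon edges */

    typedef struct { int from, to; char sym; } Edge;
    typedef struct { int start, accept; } Frag;

    static Edge edges[4 * MAXSTATES];
    static int nedges = 0, nstates = 0;

    static int new_state(void) { return nstates++; }
    static void add_edge(int from, int to, char sym) {
        edges[nedges].from = from; edges[nedges].to = to; edges[nedges].sym = sym;
        nedges++;
    }

    /* NFA for a single symbol a */
    static Frag sym(char a) {
        Frag f = { new_state(), new_state() };
        add_edge(f.start, f.accept, a);
        return f;
    }
    /* NFA for s|t: new start/accept states with epsilon edges around both parts */
    static Frag alt(Frag s, Frag t) {
        Frag f = { new_state(), new_state() };
        add_edge(f.start, s.start, EPS);  add_edge(f.start, t.start, EPS);
        add_edge(s.accept, f.accept, EPS); add_edge(t.accept, f.accept, EPS);
        return f;
    }
    /* NFA for st: connect s's accepting state to t's start by an epsilon edge */
    static Frag cat(Frag s, Frag t) {
        add_edge(s.accept, t.start, EPS);
        Frag f = { s.start, t.accept };
        return f;
    }
    /* NFA for s*: epsilon edges allow zero or more repetitions of s */
    static Frag star(Frag s) {
        Frag f = { new_state(), new_state() };
        add_edge(f.start, s.start, EPS);  add_edge(f.start, f.accept, EPS);
        add_edge(s.accept, s.start, EPS); add_edge(s.accept, f.accept, EPS);
        return f;
    }

    int main(void) {
        /* (a|b)*a(a|b) built bottom-up from its subexpressions */
        Frag r = cat(cat(star(alt(sym('a'), sym('b'))), sym('a')),
                     alt(sym('a'), sym('b')));
        printf("start=%d accept=%d\n", r.start, r.accept);
        for (int i = 0; i < nedges; i++)
            printf("%d --%c--> %d\n", edges[i].from,
                   edges[i].sym ? edges[i].sym : 'e', edges[i].to);
        return 0;
    }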


ii. Explain about Input buffering technique. (6)


Determining the next lexeme requires reading the input beyond the end of the
lexeme.

Buffer Pairs: (2)


It addresses efficiency concerns and supports look-ahead on the input.
It is a specialized buffering technique used to reduce the overhead required to process
an input character. The buffer is divided into two N-character halves, and two pointers
are maintained. It is used when the lexical analyzer needs to look ahead several characters
beyond the lexeme for a pattern before a match is announced. One pointer marks the
beginning of the current lexeme; the other, the forward pointer, scans ahead until a match
is found. The string of characters between the two pointers forms the lexeme.


Increment procedure for forward pointer: (2)


if forward at end of first half then begin
    reload second half;
    forward += 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else
    forward += 1
Sentinels: (2)
A sentinel is a special character that cannot be part of the source program, e.g. eof. It is
used to reduce the two tests made on each advance of the forward pointer (end of a buffer
half, and which character was read) to one test in most cases.
Increment procedure for forward pointer using sentinels:
forward += 1
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward += 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else
        terminate lexical analysis
end
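A minimal C sketch of the buffer-pair scheme with sentinels (the buffer size, the helper names and the initialization convention are assumptions for illustration):

    /* Two N-character buffer halves, each followed by an eof sentinel slot. */
    #include <stdio.h>

    #define N 4096                       /* size of each half (assumption)     */
    #define SENTINEL EOF

    static int buffer[2 * N + 2];        /* halves plus one sentinel slot each */
    static int *lexeme_beginning;        /* start of the current lexeme        */
    static int *forward;                 /* scans ahead of lexeme_beginning    */

    /* Reload one half from the input and plant its sentinel (assumed helper). */
    static void fill_half(int *half, FILE *in) {
        int i, c;
        for (i = 0; i < N && (c = fgetc(in)) != EOF; i++)
            half[i] = c;
        half[i] = SENTINEL;              /* sentinel marks end of valid input   */
    }

    static void init_buffers(FILE *in) {
        fill_half(buffer, in);
        lexeme_beginning = forward = buffer;  /* *forward is the current char  */
    }

    /* Advance forward one character, crossing halves when a sentinel is hit.
     * Returns the new current character, or EOF at the real end of input. */
    static int advance(FILE *in) {
        forward++;
        if (*forward == SENTINEL) {
            if (forward == buffer + N) {                 /* end of first half  */
                fill_half(buffer + N + 1, in);
                forward = buffer + N + 1;
            } else if (forward == buffer + 2 * N + 1) {  /* end of second half */
                fill_half(buffer, in);
                forward = buffer;
            } else {
                return EOF;              /* eof inside a half: end of input     */
            }
            if (*forward == SENTINEL) return EOF;        /* reload was empty    */
        }
        return *forward;
    }

In the common case only the single comparison against the sentinel is made, which is exactly the saving the sentinel technique is meant to provide.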

12. a. i. Construct predictive parsing table for the grammar (10)


S->(L) | a
L->L,S | S
After the elimination of left recursion: (2)
S -> (L) | a
L -> SL’
L’ -> ,SL’ | ε
Calculation of First: (2)
First(S) = { (, a }
First(L) = { (, a }
First(L’) = { ‘,’ , ε }
Calculation of Follow: (2)
Follow(S) = { $, ‘,’ , ) }
Follow(L) = { ) }
Follow(L’) = { ) }
Predictive parsing table: (4)

Non-terminal    a          (          )          ,            $
S               S->a       S->(L)
L               L->SL’     L->SL’
L’                                    L’->ε      L’->,SL’
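As an illustration of how these productions drive a predictive parser, a minimal recursive-descent sketch in C (the test input, function names and error handling are assumptions for illustration, not part of the original answer):

    /* Each nonterminal becomes a procedure that chooses a production by
     * looking at the current input symbol, as the table above prescribes. */
    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;            /* '\0' plays the role of $ */

    static void error(const char *where) {
        fprintf(stderr, "syntax error in %s at '%c'\n", where, *input);
        exit(1);
    }
    static void match(char t) { if (*input == t) input++; else error("match"); }

    static void S(void); static void L(void); static void Lp(void);

    static void S(void) {                /* S -> ( L ) | a          */
        if (*input == '(') { match('('); L(); match(')'); }
        else if (*input == 'a') match('a');
        else error("S");
    }
    static void L(void)  { S(); Lp(); }  /* L -> S L'               */
    static void Lp(void) {               /* L' -> , S L' | epsilon  */
        if (*input == ',') { match(','); S(); Lp(); }
        /* epsilon: on ')' or end of input, simply return */
    }

    int main(void) {
        input = "(a,(a,a))";             /* sample sentence (assumption) */
        S();
        if (*input == '\0') puts("accepted"); else error("end");
        return 0;
    }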


ii. What are the different strategies that a parser can employ to recover from syntax
errors? (6)
• Panic mode recovery
On discovering an error, the parser discards input symbols one at a time until
one of a designated set of synchronizing tokens is found.
• Phrase level recovery
On discovering an error, the parser may perform local correction on the
remaining input; e.g. replace a prefix of the remaining input by some string that allows
the parser to continue.
• Error productions
Augment the grammar for the language with productions that generate the common
erroneous constructs. When such a production is used by the parser, appropriate error
diagnostics can be generated to indicate the erroneous construct recognized in the input.
• Global correction
It makes minimal changes in the incorrect input string to obtain a globally least-cost
correction.
(OR)

b. i. Construct the CLR parsing table from (10)


S->AA, A->Aa | b
Augmented grammar:
S’ -> S
S -> AA
A -> Aa
A -> b

I0: S’ -> .S, $
    S -> .AA, $
    A -> .Aa, b
    A -> .b, b
I1: goto(I0, S)
    S’ -> S., $
I2: goto(I0, A)
    S -> A.A, $
    A -> A.a, b
    A -> .Aa, b
    A -> .b, b
I3: goto(I0, b)
    A -> b., b
I4: goto(I2, A)
    S -> AA., $
    A -> A.a, b
I5: goto(I4, a)
    A -> Aa., b
goto(I2, a) = I5
goto(I2, b) = I3
Parsing table:

State      Action                      Goto
           a         b         $       S      A
0                    s3                1      2
1                              acc
2          s5        s3                       4
3          r3        r3        r3
4          s5
5          r2        r2


ii. Write Operator-precedence parsing algorithm. (6)


set ip to point to the first symbol of w$;
repeat
    if $ is on top of the stack and ip points to $ then
        return
    else begin
        let a be the topmost terminal symbol on the stack and let b be the symbol
        pointed to by ip;
        if a <· b or a =· b then begin
            push b onto the stack;
            advance ip to the next input symbol
        end
        else if a ·> b then
            repeat
                pop the stack
            until the topmost terminal on the stack is related by <· to the terminal
            most recently popped
        else
            error()
    end
end

13. a. i. Write about implementation of three addressing statements. (8)


It is one of the intermediate representations. It is a sequence of statements of the
form x:= y op z, where x, y, and z are names, constants or compiler-generated
temporaries and op is an operator which can be arithmetic or a logical operator. E.g.
x+y*z is translated as t1=y*z and t2=x+t1. (4)
Reason for the term three-address code is that each statement usually contains
three addresses, two for the operands and one for the result. (2)
Implementation:
• Quadruples
A record with four fields: op, arg1, arg2 and result.
• Triples
A record with three fields: op, arg1 and arg2, which avoids entering temporary names
into the symbol table. Here, a temporary value is referred to by the position of the
statement that computes it.
• Indirect triples
A listing of pointers to triples, rather than a listing of the triples themselves.
For a := b * -c + b * -c
Quadruples
Op arg1 arg2 result
(0) uminus c t1
(1) * b t1 t2
(2) uminus c t3
(3) * b t3 t4
(4) + t2 t4 t5
(5) := t5 a


Triples
Op arg1 arg2
(0) uminus c
(1) * b (0)
(2) uminus c
(3) * b (2)
(4) + (1) (3)
(5) assign a (4)
Indirect Triples

Triples:                              Statement list:
       Op        arg1     arg2
(14)   uminus    c                    (0)  (14)
(15)   *         b        (14)        (1)  (15)
(16)   uminus    c                    (2)  (16)
(17)   *         b        (16)        (3)  (17)
(18)   +         (15)     (17)        (4)  (18)
(19)   assign    a        (18)        (5)  (19)
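A minimal C sketch of how the three representations above might be declared (field names and array sizes are assumptions for illustration):

    /* Illustrative declarations for quadruples, triples and indirect triples. */
    #define MAXSTMT 100

    typedef struct {                 /* quadruple: op, arg1, arg2, result       */
        const char *op, *arg1, *arg2, *result;
    } Quad;

    typedef struct {                 /* triple: its own index stands for result */
        const char *op;
        const char *arg1, *arg2;     /* an arg may name another triple, e.g. "(0)" */
    } Triple;

    Quad   quads[MAXSTMT];
    Triple triples[MAXSTMT];
    int    stmt_list[MAXSTMT];       /* indirect triples: statement i -> triple index */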

ii. Give the syntax-directed definition for flow of control statements. (8)
Flow of control statements:
S-> if E then S1 | if E then S1 else S2 | while E do S1
If-statement: (4)
Semantic rules for if E then S1:
E.true:= newlabel;
E.false:=S.next;
S1.next:=S.next;
S.code:=E.code || gen(E.true “:”) || S1.code
Semantic rules for if E then S1 else S2:
E.true:= newlabel;
E.false:=newlabel;
S1.next:=S.next;
S2.next:=S.next;
S.code:=E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.next) ||
gen(E.false “:”) || S2.code
Example:
For the Boolean expression a and b and c:
If a is false, then we need not evaluate the rest of the expression. So, we insert
labels E.true and E.false in the appropriate places.
if a goto E.true
goto E.false
E.true: if b goto E1.true
goto E.false
E1.true: if c goto E2.true
goto E.false
E2.true : exp =1
E.false: exp =0


Semantic rules for while E do S1: (4)


S.begin:=newlabel
E.true:= newlabel;
E.false:=S.next;
S1.next:=S.begin;
S.code:=gen(S.begin’:’) || E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.begin)
Example:
while a<b do
if c<d then
x=y+z
else
x=y-z
3AC generated is
L1: if a<b goto L2
goto Lnext
L2: if c<d goto L3
goto L4
L3: t1:=y+z
x:=t1
goto L1
L4: t2:=y-z
x:=t2
goto L1
Lnext:
(OR)

b. i. How back patching can be used to generate code for Boolean expressions and
flow of control statements. (10)
Back patching is the activity of filling in unspecified label information, using
appropriate semantic actions, during the code generation process. The functions used in
the semantic actions are makelist(i), merge(p1, p2) and backpatch(p, i). (2)

Boolean expressions: (4)


Consider the following grammar:
E → E1 or M E2
E → E1 and M E2
E → not E1
E → (E1)
E → id1 relop id2
E → false
E → true
M → ε
Here, the synthesized attributes truelist and falselist of nonterminal E are used to
generate jumping code for Boolean expressions.


The corresponding semantic rules are given by:


E → E1 or M E2 {Backpatch( E1.falselist,M.quad); E.truelist = merge (E2.truelist,
E1.truelist); E.falselist=E2.falselist; }
E → E1 and M E2 {Backpatch(E1.truelist,M.quad); E.falselist = merge (E1.falselist,
E2.falselist); E.truelist=E2.truelist}
E → not E1 { E.truelist = E1.falselist; E.falselist = E1.truelist; }
E → (E1) { E.truelist = E1.truelist;E.falselist=E1.falselist}
E → id1 relop id2 {E.truelist = makelist(nextquad); E.falselist = makelist(nextquad +1);
emit( “if” id1.place relop.op id2.place “goto ____”); emit( “goto ____”);}
E → false { E.falselist = makelist(nextquad); emit( “goto ____”) }
E → true { E.truelist = makelist(nextquad); emit( “goto ____”) }
M → ε { M.quad = nextquad }
Example :
Consider the string:
a<b or c<d and e<f
(Assuming that the grammar is left associative)
The corresponding intermediate code is:
100: if ( a<b ) goto ---
101: goto ----
102: L1: if ( c<d ) goto ---
103: goto ---
104: L2: if ( e<f ) goto ---
105: goto ---
The code after backpatching becomes:

100: if ( a<b ) goto ____


101: goto 102
102: if ( c<d ) goto 104
103: goto ____
104 if ( e<f ) goto ____
105: goto ____
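A minimal C sketch of the three helper functions used above (the list representation, the jump_target array and all names are assumptions for illustration):

    /* Backpatching helpers over an array of quadruples. jump_target[i] is the
     * (initially unfilled) jump target of quad i. */
    #include <stdlib.h>

    #define MAXQUAD 1000

    typedef struct listnode { int quad; struct listnode *next; } ListNode;

    static int jump_target[MAXQUAD];     /* unfilled targets start as -1 (assumption) */

    /* makelist(i): a new list containing only quad index i */
    static ListNode *makelist(int i) {
        ListNode *n = malloc(sizeof *n);
        n->quad = i; n->next = NULL;
        return n;
    }

    /* merge(p1, p2): concatenation of the two lists */
    static ListNode *merge(ListNode *p1, ListNode *p2) {
        if (!p1) return p2;
        ListNode *t = p1;
        while (t->next) t = t->next;
        t->next = p2;
        return p1;
    }

    /* backpatch(p, i): make every quad on list p jump to quad i */
    static void backpatch(ListNode *p, int i) {
        for (; p; p = p->next)
            jump_target[p->quad] = i;
    }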

Flow-of-control statements: (4)


Consider the following grammar:
S-> if E then S | if E then S else S | while E do S | begin L end | A
L->L;S | S
Semantic rules:
S-> if E then M1 S1 N else M2 S2 {backpatch(E.truelist, M1.quad); backpatch
(E.falselist, M2.quad); S.nextlist = merge(S1.nextlist, merge(N.nextlist, S2.nextlist))}
N->ε {N.nextlist:=makelist(nextquad); emit(‘goto –‘);}
M->ε {M.quad=nextquad}
S->if E then M S1 {backpatch(E.truelist, M.quad); S.nextlist:= merge(E.falselist,
S1.nextlist)}
S->while M1 E do M2 S1 {backpatch(S1.nextlist, M1.quad); backpatch(E.truelist,
M2.quad); S.nextlist:=E.falselist; emit(‘goto’ M1.quad)}


S->begin L end {S.nextlist:=L.nextlist}


S->A {S.nextlist:=nil}
L->L1;M S {backpatch(L1.nextlist, M.quad); L.nextlist:= S.nextlist}
L->S { L.nextlist:= S.nextlist}
Here, fill in the jumps out of statements when their targets are found. Not only do
Boolean expressions need two lists of jumps that occur when the expression is true and
when it is false, but statements also need list of jumps (given by attribute nextlist) to the
code that follows them in the execution sequence.

ii. Write short notes on procedure calls. (6)


Procedures are an important and frequently used programming construct, so it is imperative
for a compiler to generate good code for procedure calls and returns. (2)
Consider the following grammar for a simple procedure call statement:
S-> call id (Elist)
Elist -> Elist, E
Elist ->E
Calling sequences: (2)
The translation for a call includes a calling sequence, a sequence of actions taken
on entry to and exit from each procedure.
Example: (2)
Syntax directed translation:
S-> call id(Elist)
{for each item p on queue do
Emit(‘param’ p);
Emit(‘call’ id.place)}
Elist -> Elist, E
{append E.place to the end of the queue}
Elist - > E
{initialize queue to contain only E.place}
E.g. Call p1(int a, int b)
param a
param b
call p1

14. a. i. Write in detail about the issues in the design of a code generator. (10)
• Input to the code generator
The intermediate representation of the source program: linear
representations such as postfix notation, three-address representations such as
quadruples, virtual machine representations such as stack machine code and
graphical representations such as syntax trees and dags.
• Target programs
The output, such as absolute machine language, relocatable machine
language or assembly language.
• Memory management
Mapping of names in the source program to addresses of data objects in run-
time memory is done cooperatively by the front end and the code generator.


• Instruction selection
The nature of the instruction set of the target machine determines the difficulty of
instruction selection.
• Register allocation
Instructions involving registers are shorter and faster. The use of registers
is divided into two sub-problems:
o During register allocation, we select the set of variables that will reside in
registers at a point in the program
o During a subsequent register assignment phase, we pick the specific
register that a variable will reside in
• Choice of evaluation order
The order in which computations are performed affects the efficiency of the target
code.
• Approaches to code generation

ii. What are steps needed to compute the next use information? (6)
If the name in a register is no longer needed, then the register can be assigned to
some other name. This idea of keeping a name in storage only if it will be used
subsequently can be applied in a number of contexts.
Computing next uses: (2)
The use of a name in a three-address statement is defined as follows: Suppose a
three-address statement i assigns a value to x. If statement j has x as an operand and
control can flow from statement i to j along a path that has no intervening assignments to
x, then we say statement j uses the value of x computed at i.
Example:
i: x := y op z
j: u := x op w // statement j uses the value of x computed at statement i
Algorithm to determine next use: (2)
The algorithm to determine next uses makes a backward pass over each basic
block, recording for each name x whether x has a next use in the block and, if not,
whether it is live on exit from the block (using data-flow analysis). Suppose we reach
three-address statement i: x := y op z in our backward scan. Then do the following:
• Attach to statement i the information currently found in the symbol table
regarding the next use and the liveness of x, y, and z.
• In the symbol table, set x to “not live” and “no next use”.
• In the symbol table, set y and z to “live” and the next uses of y and z to i.
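A minimal C sketch of this backward pass, for binary statements x := y op z (the Quad/SymInfo representation and all names are assumptions for illustration):

    /* Backward pass over a basic block recording next-use/liveness information. */
    #include <stdbool.h>

    #define NONE (-1)

    typedef struct { int x, y, z; } Quad;          /* i: x := y op z (symbol-table indices) */
    typedef struct { bool live; int next_use; } SymInfo;
    typedef struct { SymInfo x, y, z; } UseInfo;   /* information attached to a statement   */

    void next_use_pass(Quad q[], int n, SymInfo sym[], UseInfo out[]) {
        /* On entry, sym[] holds liveness on exit from the block (from data-flow analysis). */
        for (int i = n - 1; i >= 0; i--) {
            /* 1. attach the information currently in the symbol table to statement i */
            out[i].x = sym[q[i].x];
            out[i].y = sym[q[i].y];
            out[i].z = sym[q[i].z];
            /* 2. x is defined here: not live and no next use before statement i */
            sym[q[i].x].live = false;  sym[q[i].x].next_use = NONE;
            /* 3. y and z are used here: live, and their next use is statement i */
            sym[q[i].y].live = true;   sym[q[i].y].next_use = i;
            sym[q[i].z].live = true;   sym[q[i].z].next_use = i;
        }
    }

Note that step 2 is done before step 3, so a statement such as x := x op z still records x as live with next use i.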

(OR)

b. i. Discuss briefly about DAG representation of basic blocks. (10)


A DAG for a basic block is a directed acyclic graph in which (2)
• leaves are labeled by unique ids, either variable names or constants
• interior nodes are labeled by operator symbols
• nodes are also given a sequence of ids for labels to store the computed values.
It is useful for implementing transformations on basic blocks and shows how values
computed by a statement are used in subsequent statements.


e.g. t1 := 4*i (2)
     t2 := a[t1]
The DAG is:

         [ ]  t2
        /   \
       a     *  t1
            / \
           4   i

Algorithm for the construction of DAG: (4)


Input: A basic block
Output: DAG for that basic block, having
• A label for each node: leaves are labeled by identifiers, interior nodes by an operator
symbol.
• For each node, a list of identifiers that hold the computed value.
The three cases of statements are: 1) x = y op z   2) x = op y   3) x = y
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case 1),
if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For case 1), create a node labeled op with left child node(y) and right child node(z),
after checking for a common sub expression.
For case 2), check for a node labeled op with the single child node(y); if there is none, create it.
For case 3), let n be node(y).
Step 3: Delete x from the list of attached identifiers for node(x). Append x to the list of attached
identifiers for the node n found in step 2 and set node(x) to n.
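A minimal C sketch of the common-subexpression check in step 2 for case 1 (the node representation and names are assumptions for illustration):

    /* DAG node creation with a common-subexpression check (case 1: x = y op z). */
    #include <string.h>

    #define MAXNODES 256

    typedef struct {
        const char *op;          /* operator, or identifier/constant label for a leaf */
        int left, right;         /* child node indices, -1 for leaves                 */
        const char *ids[8];      /* attached identifiers holding this value           */
        int nids;
    } Node;

    static Node dag[MAXNODES];
    static int  nnodes = 0;

    /* Return an existing node <op, l, r> if present (common subexpression),
     * otherwise create a new one and return its index. */
    static int find_or_make(const char *op, int l, int r) {
        for (int i = 0; i < nnodes; i++)
            if (dag[i].left == l && dag[i].right == r && strcmp(dag[i].op, op) == 0)
                return i;                      /* reuse: common subexpression */
        dag[nnodes].op = op; dag[nnodes].left = l; dag[nnodes].right = r;
        dag[nnodes].nids = 0;
        return nnodes++;
    }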

Applications of DAG: (2)


· Determining the common sub-expressions.
· Determining which identifiers have their values used in the block.
· Determining which statements compute values that could be used outside the block.
· Simplifying the list of quadruples by eliminating the common sub-expressions and not
performing assignments of the form x := y unless they are necessary.

ii. Explain the characteristics of peephole optimization (6)


Peephole optimization is a simple and effective technique for locally improving
target code. This technique improves the performance of the target program
by examining short sequences of target instructions and replacing them with a
shorter or faster sequence whenever possible.
The peephole is a small, moving window on the target program.
• Local in nature
• Pattern driven
• Limited by the size of the window
Characteristics of peephole optimization:
· Redundant instruction elimination
· Flow of control optimization
· Algebraic simplification
· Use of machine idioms


• Constant Folding
x := 32
x := x + 32 becomes x := 64
• Unreachable Code
An unlabeled instruction immediately following an unconditional jump is removed.
goto L2
x := x + 1   ← unneeded
• Flow of control optimizations
Unnecessary jumps are eliminated.
goto L1

L1: goto L2 becomes goto L2
• Algebraic Simplification
x := x + 0   ← unneeded
• Dead code elimination
x := 32   ← x not used after this statement
y := x + y becomes y := y + 32
• Reduction in strength
Replace expensive operations by equivalent cheaper ones
x := x * 2 becomes x := x + x
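As an illustration, a minimal C sketch of one peephole rule (multiplication by 2 rewritten as an addition) over a simple quad list; the Quad representation and names are assumptions, not part of the original answer:

    /* One peephole rule: x := y * 2  becomes  x := y + y. */
    #include <string.h>

    typedef struct { char op[8]; char arg1[16]; char arg2[16]; char result[16]; } Quad;

    void reduce_strength(Quad code[], int n) {
        for (int i = 0; i < n; i++) {
            if (strcmp(code[i].op, "*") == 0 && strcmp(code[i].arg2, "2") == 0) {
                strcpy(code[i].op, "+");              /* replace * by +            */
                strcpy(code[i].arg2, code[i].arg1);   /* x := y * 2  ->  x := y+y  */
            }
        }
    }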

15. a. i. Describe the principal sources of optimization. (8)


Code optimization is needed to make the code run faster or take less space or both.
Function preserving transformations:
• Common sub expression elimination
• Copy propagation
• Dead-code elimination
• Constant folding
Common sub expression elimination: (2)
E is called as a common sub expression if E was previously computed and the
values of variables in E have not changed since the previous computation.
Copy propagation: (2)
Assignments of the form f:=g are called copy statements, or copies for short. The
idea is to use g for f wherever possible after the copy statement.
Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently;
otherwise it is dead, and statements that compute values which are never used can be removed.
Constant folding:
Deducing at compile time that the value of an expression is a constant, and using the
constant instead, is called constant folding.
Loop optimization: (2)
• Code motion: moving code outside the loop.
Takes an expression that yields the same result independent of the number of
times a loop is executed (a loop-invariant computation) and places the expression before
the loop.
• Induction variable elimination
• Reduction in strength: replacing an expensive operation by a cheaper one.


ii. Write about Data flow analysis of structural programs. (8)


Flow graphs for control-flow constructs such as do while statements have a useful
property; there is a single beginning point at which control enters and a single end point
that control leaves from when execution of the statement is over.
Some structured control constructs: statement sequences (S1; S2), if-then, if-then-else
and do-while.
Define a portion of a flow graph called a region to be a set of nodes N that includes a
header, which dominates all other nodes in the region. All edges between nodes in N are
in the region, except for some that enter the header. The portion of a flow graph
corresponding to a statement S is a region that obeys the further restriction that control
can flow to just one outside block when it leaves the region.
gen[S] is the set of definitions “generated by S”.
kill[S] be the set of definitions that never reach the end of S, even if they reach the
beginning.
Both are synthesized attributes; they are computed bottom-up, from the smallest
statements to the largest.
Data-flow equations for reaching definitions:
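In the usual structured-program formulation (with Da denoting the set of all definitions of variable a), the equations are:

For S → d: a := b + c
    gen[S] = {d}
    kill[S] = Da − {d}
For S → S1 ; S2
    gen[S] = gen[S2] ∪ (gen[S1] − kill[S2])
    kill[S] = kill[S2] ∪ (kill[S1] − gen[S2])
    in[S1] = in[S],  in[S2] = out[S1],  out[S] = out[S2]
For S → if E then S1 else S2
    gen[S] = gen[S1] ∪ gen[S2]
    kill[S] = kill[S1] ∩ kill[S2]
    in[S1] = in[S2] = in[S],  out[S] = out[S1] ∪ out[S2]
For S → do S1 while E
    gen[S] = gen[S1]
    kill[S] = kill[S1]
    in[S1] = in[S] ∪ gen[S1],  out[S] = out[S1]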


(OR)

b. i. What are the different storage allocation strategies? Explain. (10)


Strategies: (2)
• Static allocation lays out storage for all data objects during compile time
• Stack allocation manages the run-time storage as a stack
• Heap allocation allocates and deallocates storages as needed at runtime from heap
area

Static allocation: (2)


• Names are bound to storage at compile time.
• No need for a run-time support package.
• When a procedure is activated, its names are bound to the same storage locations.
• The compiler must decide where activation records should go.
Limitations:
• Sizes must be known at compile time
• Recursive procedures are restricted
• Data structures cannot be created dynamically

Stack allocation: (3)


• Activation records are pushed and popped as activations begin and end.
• Locals are bound to fresh storage in each activation and deleted when the
activation ends.
• Call sequence and return sequence
• Caller and callee
• Dangling references

Heap allocation: (3)


Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends
2. A called activation outlives the caller.
• Allocate pieces of memory for activation records, which can be deallocated in any
order
• Maintain a linked list of free blocks
• Fill a request for size ‘s’ with a block of size s’, where s’ is the smallest size
greater than or equal to s
• Use a heap manager, which takes care of defragmentation and garbage collection.
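A minimal C sketch of filling a request of size s from a linked free list with the smallest block of size s’ ≥ s, as described above (the structure and names are assumptions for illustration):

    /* Best-fit allocation from a linked list of free blocks. */
    #include <stddef.h>

    typedef struct FreeBlock {
        size_t size;
        struct FreeBlock *next;
    } FreeBlock;

    static FreeBlock *free_list = NULL;      /* maintained by the heap manager */

    /* Return the smallest free block with size >= s, unlinking it from the
     * free list; NULL if no block is large enough. */
    FreeBlock *allocate(size_t s) {
        FreeBlock **best = NULL, **p;
        for (p = &free_list; *p; p = &(*p)->next)
            if ((*p)->size >= s && (!best || (*p)->size < (*best)->size))
                best = p;
        if (!best) return NULL;              /* heap manager would request more memory */
        FreeBlock *b = *best;
        *best = b->next;                     /* unlink the chosen block */
        return b;
    }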

ii. Write short notes on parameter passing. (6)


• Call by value
– A formal parameter is treated just like a local name. Its storage is in the
activation record of the called procedure.
– The caller evaluates the actual parameters and places their r-values in the storage
for the formals.


• Call by reference
– If an actual parameter is a name or an expression having an l-value, then that
l-value itself is passed.
– However, if it has no l-value (e.g. a+b or 2), then the expression is
evaluated in a new location and the address of that location is passed.
• Copy-Restore: a hybrid between call-by-value and call-by-reference (copy in, copy out)
– The actual parameters are evaluated, their r-values are passed, and the l-values of
the actuals are determined
– When the called procedure is done, the r-values of the formals are copied back to
the l-values of the actuals
• Call by name
– Inline expansion(procedures are treated like a macro)
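A small C illustration of the difference between call by value and call by reference (the latter simulated in C with a pointer); the function names are assumptions for illustration:

    /* Call by value: the callee gets a copy. Call by reference: the callee gets
     * the l-value (simulated here with a pointer), so the caller's variable changes. */
    #include <stdio.h>

    void inc_by_value(int x)      { x = x + 1; }   /* modifies only the local copy     */
    void inc_by_reference(int *x) { *x = *x + 1; } /* modifies the caller's variable   */

    int main(void) {
        int a = 1;
        inc_by_value(a);      printf("%d\n", a);   /* prints 1 */
        inc_by_reference(&a); printf("%d\n", a);   /* prints 2 */
        return 0;
    }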
