
Prof. Rekhanjali Sahoo

Intermediate Code Generation


• The task of a compiler is to process the source program and translate it into a target program. In this process, a compiler may generate one or more intermediate representations.
• The intermediate code generator produces an intermediate representation, and from this representation the compiler generates the target code.

Logical Structure of Compiler


The logical structure of the compiler has two ends, i.e., the front end and the back end. The front end generates the intermediate representation of the source program, and this intermediate representation helps the back end generate the target program.

CSE Department
Prof. Rekhanjali Sahoo

The front end of the compiler includes:

• Parser
• Static checker
• Intermediate code generator

Earlier we discussed the parser and the types of parsing, i.e., top-down parsing and bottom-up parsing. The parser parses the input string and generates the parse tree or the syntax tree. The interior nodes of the syntax tree represent the operations, and the leaves represent the operands.
As parsing proceeds, semantic information (attribute values) is attached to these nodes. The resulting tree is called an annotated parse tree.
A syntax-directed definition (SDD) is a context-free grammar (CFG) together with semantic rules:
SDD = CFG + Semantic Rules

Static Checker
Static checking verifies at compile time that the program obeys the rules of the source language, so that it can be compiled successfully. It identifies programming errors early, which lets the programmer correct them before the program executes. Static checking performs two types of checking:

• Syntactic Checking
This kind of checking identifies the syntactic errors present in the program.
• Type Checking
It checks the operations present in the program and ensures that they respect the type system of the source language. If the operand types do not match what an operator expects, the compiler performs a type conversion where the language allows one.


o Coercion
In coercion, the compiler implicitly converts the type of an operand to the type the operator requires. For example, consider the expression 2 * 3.14. Here 2 is an integer and 3.14 is a floating-point number. The coercion specified by the language converts the integer 2 to the floating-point number 2.0. Now that both operands are floating-point, the compiler performs a floating-point multiplication, which yields a floating-point result.

o Overloading
We have studied the concept of overloading in Java. For example, the operator '+' applied to integers performs the addition of two integers, and applied to strings it performs the concatenation of the two strings. Thus, the meaning of the operator changes according to the types of its operands.
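
The two type-checking cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name check_binary, the type names "int", "float" and "string", and the conversion name "inttofloat" are hypothetical and do not come from any particular compiler.

```python
# Hypothetical mini type checker; all names and rules are illustrative assumptions.

def check_binary(op, left_type, right_type):
    """Return (result_type, left_conversion, right_conversion) for `left op right`."""
    numeric = {"int", "float"}
    # Overloading: '+' applied to two strings means concatenation.
    if op == "+" and left_type == right_type == "string":
        return "string", None, None
    if op in {"+", "-", "*", "/"} and left_type in numeric and right_type in numeric:
        if left_type == right_type:
            return left_type, None, None
        # Coercion: the int operand is widened to float.
        left_conv = "inttofloat" if left_type == "int" else None
        right_conv = "inttofloat" if right_type == "int" else None
        return "float", left_conv, right_conv
    raise TypeError(f"operator {op!r} is not defined for {left_type} and {right_type}")

coerced = check_binary("*", "int", "float")        # the 2 * 3.14 case
added = check_binary("+", "int", "int")            # integer addition
joined = check_binary("+", "string", "string")     # string concatenation
print(coerced, added, joined)
```

For 2 * 3.14 the sketch reports a float result and asks for the left operand to be converted, which mirrors the coercion described above.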

The intermediate representation can take various forms. (Very important)


Three common forms of intermediate representation:
i) Syntax Tree
ii) Postfix notation
iii) Three Address Code (TAC)

A syntax tree depicts the natural hierarchical structure of a source program, as in the following figure for A=B*C:

      =
     / \
    A   *
       / \
      B   C

A syntax tree for the assignment statement a=b*-c+b*-c is built in the same way; the statement is read as

a=b*(-c)+ b*(-c)

Postfix notation is a linearized representation of a syntax tree. It is a list of the nodes of the tree in which a node (operator) appears immediately after its children (operands).

A=B*C

→ A=BC*
→ ABC*= (postfix expression)

The postfix notation for a=b*-c+b*-c is derived step by step as follows. The unary minus is written after its operand:

a=b*-c+b*-c

→ a=b*c- + b*c-
→ a=bc-* + bc-*
→ a=bc-*bc-*+
→ abc-*bc-*+= (postfix expression)
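
Reading the postfix form off a syntax tree is a post-order traversal. The following is a minimal sketch, assuming a tree encoded as nested Python tuples (operator, child, ...) with operand names as plain strings; this encoding and the name uminus for unary minus are choices made for this example only.

```python
def postfix(node):
    """Post-order walk: list the children first, then the operator."""
    if isinstance(node, str):          # leaf: an operand name
        return [node]
    op, *children = node               # interior node: (operator, child, ...)
    out = []
    for child in children:
        out += postfix(child)
    return out + [op]

# Syntax tree for a = b * -c + b * -c
neg_c = ("uminus", "c")
tree = ("=", "a", ("+", ("*", "b", neg_c), ("*", "b", neg_c)))

result = " ".join(postfix(tree))
print(result)
```

Writing uminus instead of "-" keeps the unary operator distinguishable from binary subtraction in the flat output.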

What is Three Address Code (TAC)?

o Three-address code is an intermediate code. It is used by optimizing compilers.
o Each three-address code instruction has at most three operands (addresses). Typically it combines an assignment with one binary operator.

Three-address code is a sequence of statements of the general form

X = Y op Z (for example, a = b * c; with a unary operator, a = -d)

where X, Y, and Z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data.

Ex: A = B - C, X = -Y

In TAC, the given expression is broken down into several separate instructions. These instructions can easily be translated into assembly language.

Thus a source language expression like E=x+y*z might be translated into the sequence

t1 = y * z
t2 = x + t1
E = t2

where t1 and t2 are compiler-generated temporary names.

Ex:

a := b * (-c) + b * (-c)
Three-address code for the above arithmetic expression:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

Ex: a = ((b * c) - (d / e)) + f
t1 = e
t2 = d / t1
t3 = c
t4 = b * t3
t5 = t4 - t2
t6 = t5 + f
a = t6

All of the above are in TAC representation. (The copies t1 = e and t3 = c are valid but not required; d / e and b * c could be computed directly.)

The reason for the term “three-address code” is that each statement usually contains three addresses: two for the operands and one for the result.
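
The translation of an expression into three-address code can be sketched as a recursive walk that creates one temporary per operator. This is a minimal illustration, not a full translator: the tuple encoding of the tree and the function name gen_tac are assumptions made for this example.

```python
from itertools import count

def gen_tac(tree):
    """Return a list of three-address statements for ("=", target, expression)."""
    code, temps = [], count(1)

    def walk(node):
        if isinstance(node, str):            # an operand is already an address
            return node
        op, *args = node
        addrs = [walk(a) for a in args]      # translate operands left to right
        temp = f"t{next(temps)}"             # fresh compiler-generated temporary
        if len(addrs) == 1:                  # unary operator
            code.append(f"{temp} = {op} {addrs[0]}")
        else:                                # binary operator
            code.append(f"{temp} = {addrs[0]} {op} {addrs[1]}")
        return temp

    _, target, expr = tree
    code.append(f"{target} = {walk(expr)}")
    return code

tac = gen_tac(("=", "E", ("+", "x", ("*", "y", "z"))))
print("\n".join(tac))
```

For E = x + y * z the sketch produces the same three statements as the example in the text.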


Implementation of Three Address Code (TAC) (Very important)


There are 3 representations of three address code, namely
1. Quadruple
2. Triples
3. Indirect Triples

1. Quadruple – It is a structure which consists of 4 fields, namely op, arg1, arg2 and result. op denotes the operator, arg1 and arg2 denote the two operands, and result is used to store the result of the expression.

Example – Consider the expression a = b * -c + b * -c. The three address code is:

t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5

The corresponding quadruples are:

#     op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   =        t5            a

Advantage –
• Easy to rearrange code for global optimization.
• One can quickly access the value of temporary variables using the symbol table.
Disadvantage –
• Contains a lot of temporaries.
• Creating temporary variables increases time and space complexity.

2. Triples – This representation does not use an extra temporary variable to hold the result of each operation. Instead, when a reference to another triple's value is needed, a pointer to that triple is used. So it consists of only three fields, namely op, arg1 and arg2.

Example – Consider the same expression a = b * -c + b * -c. The triples are:

#     op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   =        a      (4)

Disadvantage –
• Temporaries are implicit, which makes the code difficult to rearrange.
• It is difficult to optimize, because optimization involves moving intermediate code. When a triple is moved, every other triple referring to it must also be updated.
On the other hand, with the help of a pointer one can directly access the symbol table entry.

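3. Indirect Triples – This representation keeps a separate list of pointers to the triples, in the order in which they are to be executed, instead of listing the triples themselves. It is similar in utility to the quadruple representation but requires less space. Temporaries are implicit, and the code is easier to rearrange because only the list of pointers has to be reordered.
Example – Consider the same expression a = b * -c + b * -c. The triples are the same as in the triple representation, and a separate statement list holds one pointer per triple in execution order.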
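
The three representations can be derived mechanically from the same three-address code. The sketch below assumes each statement is stored as a Python tuple (result, op, arg1, arg2), and it represents a reference to another triple by its integer index; both choices are assumptions made for illustration.

```python
# Three-address code for a = b * -c + b * -c as (result, op, arg1, arg2).
tac = [
    ("t1", "uminus", "c", None),
    ("t2", "*", "b", "t1"),
    ("t3", "uminus", "c", None),
    ("t4", "*", "b", "t3"),
    ("t5", "+", "t2", "t4"),
    ("a", "=", "t5", None),
]

# Quadruples: four fields (op, arg1, arg2, result); temporaries are explicit.
quads = [(op, a1, a2, res) for res, op, a1, a2 in tac]

# Triples: a temporary is replaced by the index of the triple that computes it.
triples, computed_by = [], {}

def ref(operand):
    """Map a temporary to the index of its defining triple; keep other names."""
    return computed_by.get(operand, operand)

for res, op, a1, a2 in tac:
    if op == "=":
        triples.append(("=", res, ref(a1)))      # assignment: target, then value
    else:
        triples.append((op, ref(a1), ref(a2)))
        computed_by[res] = len(triples) - 1

# Indirect triples: a separate statement list of pointers into the triple table.
# Reordering code only permutes this list; the triples themselves stay in place.
statements = list(range(len(triples)))

for i, t in enumerate(triples):
    print(i, t)
```

The printed triples match the table form used in the lecture: positions 1 and 3 refer to the uminus results at positions 0 and 2, and the final assignment refers to position 4.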

--------------------------------------------------X----------------------------------------------
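Question – Write quadruples, triples and indirect triples for the following expression: (x + y) * (y + z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4

Quadruples for (x + y) * (y + z) + (x + y + z):

#     op   arg1   arg2   result
(1)   +    x      y      t1
(2)   +    y      z      t2
(3)   *    t1     t2     t3
(4)   +    t1     z      t4
(5)   +    t3     t4     t5

Triples for (x + y) * (y + z) + (x + y + z):

#     op   arg1   arg2
(1)   +    x      y
(2)   +    y      z
(3)   *    (1)    (2)
(4)   +    (1)    z
(5)   +    (3)    (4)

Indirect triples use the same triples, together with a statement list whose entries point to triples (1) through (5) in execution order.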

Advantages of Intermediate Code Generation:

Easier to implement: Intermediate code generation can simplify the code generation process by
reducing the complexity of the input code, making it easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.
Platform independence: Intermediate code is platform-independent, meaning that it can be
translated into machine code or bytecode for any platform.
Code reuse: Intermediate code can be reused in the future to generate code for other platforms or
languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as it
is closer to the original source code.

Disadvantages of Intermediate Code Generation:

Increased compilation time: Intermediate code generation can significantly increase the
compilation time, making it less suitable for real-time or time-critical applications.
Additional memory usage: Intermediate code generation requires additional memory to store the
intermediate representation, which can be a concern for memory-limited systems.
Increased complexity: Intermediate code generation can increase the complexity of the compiler
design, making it harder to implement and maintain.

----------------------------------------------------------------------X-------------------------------------------------------------------


Backpatching
Backpatching is the process of filling in information that was left unspecified, namely the target labels of jumps.

While producing TAC for Boolean expressions and control-flow statements, the address of the label in a goto statement is often not yet known when the jump is generated. Assigning the positions of these labels in a single pass is difficult, so one approach uses two passes: the addresses are left unspecified in the first pass and filled in during the second. Backpatching instead keeps lists of the incomplete jumps and fills in each target as soon as it becomes known.
Ex:
The Boolean expression x < 100 || x > 200 && x != y evaluates to either true or false.
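
A small sketch of backpatching for this expression follows. It assumes instructions are numbered from 100, that each relational test emits a conditional jump followed by an unconditional jump, and that && binds tighter than ||; the helper names emit, backpatch and gen are hypothetical. Each call returns a truelist and a falselist of jumps whose targets are still unknown.

```python
BASE = 100
code = []                      # each entry: [instruction text, target or None]

def emit(text):
    """Append a jump with an unknown target and return its instruction number."""
    code.append([text, None])
    return BASE + len(code) - 1

def backpatch(jumps, target):
    """Fill in `target` for every incomplete jump in the list."""
    for number in jumps:
        code[number - BASE][1] = target

def gen(node):
    """Generate jumping code; return (truelist, falselist) of unfilled jumps."""
    if node[0] == "rel":
        _, a, op, b = node
        return [emit(f"if {a} {op} {b} goto")], [emit("goto")]
    kind, left, right = node
    left_true, left_false = gen(left)
    marker = BASE + len(code)              # first instruction of the right operand
    right_true, right_false = gen(right)
    if kind == "or":                       # left false: go on to test the right side
        backpatch(left_false, marker)
        return left_true + right_true, right_false
    backpatch(left_true, marker)           # "and": left true: test the right side
    return right_true, left_false + right_false

expr = ("or", ("rel", "x", "<", "100"),
              ("and", ("rel", "x", ">", "200"), ("rel", "x", "!=", "y")))
truelist, falselist = gen(expr)

for i, (text, target) in enumerate(code):
    print(f"{BASE + i}: {text} {target if target is not None else '_'}")
```

The jumps that remain in truelist and falselist are filled in later, once the enclosing statement knows where control should go when the whole expression is true or false.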
-------------------------------------------------------------------------------


Unit-4 CODE GENERATION


Issues in the design of a code generator

The following issues arise during the code generation phase:


Input to code generator – The input to the code generator is the intermediate code generated by the front end, along with information in the symbol table that determines the run-time addresses of the data objects denoted by the names in the intermediate representation. Intermediate code may be represented as quadruples, triples, indirect triples, postfix notation, syntax trees, DAGs, etc. The code generation phase proceeds on the assumption that its input is free from syntactic and semantic errors, that the necessary type checking has taken place, and that type conversions have been inserted wherever necessary.

Target program – The target program is the output of the code generator. The output may be absolute machine language, relocatable machine language, or assembly language.
• Absolute machine language as output has the advantage that it can be placed in a fixed memory location and executed immediately. For example, WATFIV is a compiler that produces absolute machine code as output.
• Relocatable machine language as output allows subprograms and subroutines to be compiled separately. Relocatable object modules can be linked together and loaded by a linking loader, at the added expense of linking and loading.
• Assembly language as output makes code generation easier. We can generate symbolic instructions and use the macro facilities of the assembler in generating code. The price is an additional assembly step after code generation.

• Memory Management – Mapping the names in the source program to the addresses of data objects is done cooperatively by the front end and the code generator. A name in a three-address statement refers to the symbol table entry for that name. From the symbol table entry, a relative address can then be determined for the name.

• Instruction selection – Selecting the best instructions improves the efficiency of the program. The instruction set of the target machine should be complete and uniform. If we do not care about the efficiency of the target program, instruction selection is straightforward: each three-address statement is translated independently into a fixed code sequence. For example, the three-address statements

P = Q+R
S = P+T

would be translated into the following code sequence:

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S

Here the fourth statement (MOV P, R0) is redundant, because it reloads the value of P that the previous statement has just stored from R0. This leads to an inefficient code sequence.
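
The statement-by-statement translation and the removal of the redundant load can be sketched as below. This is a toy illustration under several assumptions: every statement has the form x = y + z, a single register R0 is used, MOV serves for both loads and stores, and no instruction in the sequence is a jump target. The function names are hypothetical.

```python
def naive_codegen(statements):
    """Translate each `x = y + z` independently, always through register R0."""
    out = []
    for x, y, z in statements:
        out += [f"MOV {y}, R0", f"ADD {z}, R0", f"MOV R0, {x}"]
    return out

def drop_redundant_loads(code):
    """Remove `MOV a, R0` when it directly follows the store `MOV R0, a`."""
    out = []
    for ins in code:
        if out and out[-1].startswith("MOV R0, "):
            stored = out[-1][len("MOV R0, "):]
            if ins == f"MOV {stored}, R0":
                continue                    # R0 already holds this value
        out.append(ins)
    return out

code = naive_codegen([("P", "Q", "R"), ("S", "P", "T")])
optimized = drop_redundant_loads(code)
print("\n".join(optimized))
```

The second pass is a very small peephole optimization: it looks only at adjacent instruction pairs and relies on the assumption that control cannot enter between them.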

• Register allocation issues – Computations on values held in registers are faster than computations on values in memory, so efficient utilization of registers is important.

Basic blocks and flow graphs


Basic Blocks-

A basic block is a sequence of statements that always execute one after the other, in order.
Example Of Basic Block-

Three Address Code for the expression a = b + c + d is-

t1 = b + c
t2 = t1 + d
a = t2

Here,
• All the statements execute in a sequence one after the other.
• Thus, they form a basic block.

Example Of Not A Basic Block-

Three Address Code for the expression If A<B then 1 else 0 is-

(1) if A < B goto (4)
(2) T1 = 0
(3) goto (5)
(4) T1 = 1
(5)

Here,
• The statements do not execute in a sequence one after the other.
• Thus, they do not form a basic block.
The characteristics of basic blocks are-
• They contain no jump statements, conditional or unconditional, except possibly as the last statement.
• There is no possibility of branching or halting in the middle.
• All the statements execute in the same order in which they appear.
• They do not lose the flow control of the program.

Partitioning Intermediate Code Into Basic Blocks-


• Our first job is to partition a sequence of three-address instructions into basic blocks.
• We begin a new basic block with the first instruction and keep adding instructions until we meet either a jump, a conditional jump, or a label on the following instruction.
• In the absence of jumps and labels, control proceeds sequentially from one instruction to the next.
Any given code can be partitioned into basic blocks using the following rules-

Rule-01: Determining Leaders-

The following statements of the code are called leaders-

• The first statement of the code.
• A statement that is the target of a conditional or unconditional goto statement. (Statement L is a leader if there is a conditional or unconditional goto statement such as: if....goto L or goto L)
• A statement that appears immediately after a conditional or unconditional goto statement.

Rule-02: Determining Basic Blocks-

• All the statements from a leader (including the leader) up to, but not including, the next leader form one basic block.
• The first statement of the code is called the first leader.
• The block containing the first leader is called the initial block.

Problem-01:

Compute the basic blocks for the given three address statements-

(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)

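Solution-
We have-
• PROD = 0 is a leader, since the first statement of the code is a leader.
• T1 = 4 x I is a leader, since the target of the conditional goto statement is a leader.
• No statement follows the goto in statement (11), so there are no further leaders.

Now, the given code can be partitioned into two basic blocks as-

B1: statements (1) to (4)
B2: statements (5) to (11)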
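
The leader rules can be applied mechanically. The sketch below assumes the statements are numbered consecutively and that jump targets are written as GOTO (n), as in Problem-01; the function names are hypothetical.

```python
import re

code = {
    1: "PROD = 0",
    2: "I = 1",
    3: "T2 = addr(A) - 4",
    4: "T4 = addr(B) - 4",
    5: "T1 = 4 x I",
    6: "T3 = T2[T1]",
    7: "T5 = T4[T1]",
    8: "T6 = T3 x T5",
    9: "PROD = PROD + T6",
    10: "I = I + 1",
    11: "IF I <= 20 GOTO (5)",
}

def find_leaders(code):
    numbers = sorted(code)
    leaders = {numbers[0]}                       # first statement of the code
    for n in numbers:
        match = re.search(r"GOTO \((\d+)\)", code[n])
        if match:
            leaders.add(int(match.group(1)))     # target of a goto
            if n + 1 in code:
                leaders.add(n + 1)               # statement right after a goto
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to the statement before the next leader."""
    leaders = find_leaders(code)
    ends = leaders[1:] + [max(code) + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]

leaders = find_leaders(code)
blocks = basic_blocks(code)
print(leaders, blocks)
```

For the statements of Problem-01 the sketch finds the leaders (1) and (5) and therefore two blocks.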


Problem-02:

Draw a flow graph for the three address statements given in problem-01.

Solution-

• Firstly, we compute the basic blocks (already done above).
• Secondly, we assign the flow control information.

The required flow graph has two nodes, B1 and B2, with an edge from B1 to B2 and an edge from B2 back to itself:

o Block B1 is the initial node. Block B2 immediately follows B1, so there is an edge from B1 to B2.
o The target of the jump from the last statement of B2 is the first statement of B2, so there is an edge from B2 to B2.
o B2 is a successor of B1, and B1 is a predecessor of B2.
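
The edges of the flow graph follow from the last statement of each block. The sketch below hard-codes the two blocks of Problem-01 as ranges of statement numbers; the dictionary layout and the function names are assumptions made for this example.

```python
import re

# Blocks of Problem-01: first and last statement number, and the last statement.
blocks = {"B1": (1, 4), "B2": (5, 11)}
last_statement = {"B1": "T4 = addr(B) - 4", "B2": "IF I <= 20 GOTO (5)"}

def block_of(statement_number):
    """Return the name of the block that contains the given statement."""
    for name, (first, last) in blocks.items():
        if first <= statement_number <= last:
            return name
    return None

def flow_edges():
    names = list(blocks)
    edges = []
    for i, name in enumerate(names):
        last = last_statement[name]
        match = re.search(r"GOTO \((\d+)\)", last)
        if match:                                   # edge to the jump target
            edges.append((name, block_of(int(match.group(1)))))
        unconditional = bool(match) and not last.startswith("IF")
        if not unconditional and i + 1 < len(names):
            edges.append((name, names[i + 1]))      # fall-through edge
    return edges

edges = flow_edges()
print(edges)
```

A block that ends in a conditional jump gets both a jump edge and, if another block follows, a fall-through edge; a block that ends in an unconditional jump gets only the jump edge.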

DAG representation for basic blocks


A DAG for a basic block is a directed acyclic graph with the following labels on its nodes:

1. The leaves of the graph are labelled by unique identifiers, which can be variable names or constants.
2. Interior nodes of the graph are labelled by an operator symbol.
3. Nodes are also given a sequence of identifiers as labels; these identifiers hold the computed value.

o A DAG is a type of data structure. It is used to implement transformations on basic blocks.
o A DAG provides a good way to determine common sub-expressions.
o It gives a pictorial representation of how the value computed by a statement is used in subsequent statements.

Method:
Each three-address statement has one of the following forms:
1. Case (i) x := y OP z
2. Case (ii) x := OP y
3. Case (iii) x := y

Step 1:
If node(y) is undefined, create node(y).
For case (i), if node(z) is undefined, create node(z).

Step 2:
For case (i), check whether there is a node(OP) whose left child is node(y) and right child is node(z). If not, create such a node. Let n be this node.

For case (ii), check whether there is a node(OP) with the single child node(y). If not, create such a node. Let n be this node.

For case (iii), node n will be node(y).

Step 3:
Delete x from the list of identifiers attached to node(x). Append x to the list of attached identifiers for the node n found in step 2. Finally, set node(x) to n.

Example:
Consider the following three address statement:

1. S1 := 4 * i
2. S2 := a[S1]
3. S3 := 4 * i
4. S4 := b[S3]
5. S5 := S2 * S4
6. S6 := prod + S5
7. prod := S6
8. S7 := i + 1
9. i := S7
10. if i <= 20 goto (1)
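
The construction method can be tried on the nine assignments of this example (the conditional jump in statement 10 is left out). The sketch below is a simplified illustration: statements are encoded as tuples (x, op, y, z), array indexing is treated as an ordinary binary operator named "[]", and assignments to array elements are not handled. The function name build_dag and the node layout are assumptions made here.

```python
def build_dag(statements):
    nodes = []       # each node: {"op": label, "kids": child indices, "ids": names}
    current = {}     # identifier -> index of the node that holds its current value
    index = {}       # (op, kids) -> index of an existing interior node

    def node_for(name):
        """Step 1: create a leaf if the operand has no node yet."""
        if name not in current:
            nodes.append({"op": name, "kids": (), "ids": []})
            current[name] = len(nodes) - 1
        return current[name]

    for x, op, y, z in statements:
        kids = tuple(node_for(a) for a in (y, z) if a is not None)
        if op is None:                       # case (iii): x := y
            n = kids[0]
        else:                                # cases (i) and (ii): reuse or create
            key = (op, kids)
            if key not in index:
                nodes.append({"op": op, "kids": kids, "ids": []})
                index[key] = len(nodes) - 1
            n = index[key]
        old = current.get(x)                 # step 3: move x to node n
        if old is not None and x in nodes[old]["ids"]:
            nodes[old]["ids"].remove(x)
        nodes[n]["ids"].append(x)
        current[x] = n
    return nodes, current

block = [
    ("S1", "*", "4", "i"), ("S2", "[]", "a", "S1"),
    ("S3", "*", "4", "i"), ("S4", "[]", "b", "S3"),
    ("S5", "*", "S2", "S4"), ("S6", "+", "prod", "S5"),
    ("prod", None, "S6", None), ("S7", "+", "i", "1"),
    ("i", None, "S7", None),
]
nodes, current = build_dag(block)
for number, node in enumerate(nodes):
    print(number, node["op"], node["kids"], node["ids"])
```

Because 4 * i is looked up before a new node is created, S1 and S3 end up attached to the same node, which is how the DAG exposes the common sub-expression.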

Stages in DAG Construction:

[Figures: the DAG after each of statements 1 to 10 above]

Rearranged basic block:

Now t1 occurs immediately before t4.
t2 := c + d
t3 := e - t2
t1 := a + b
t4 := t1 - t3
