CD Paper Solution 2022-23


Printed Pages:02 Sub Code:KCS-502

Paper Id: 231842 Roll No.

B. TECH.
(SEM V) THEORY EXAMINATION 2022-23
COMPILER DESIGN

Time: 3 Hours Total Marks: 100


Note: Attempt all Sections. If any required data is missing, choose suitably.

SECTION A

1. Attempt all questions in brief. 2 x 10 = 20


(a) How will you group the phases of compiler?
Ans. A compiler is software that translates a high-level language into machine-
understandable form. A compiler is typically made up of six phases. The input code in a
high-level language, also known as source code, passes through each phase one by one;
each phase processes the code, finally producing machine-understandable code (object
code) as output.
These six phases may be grouped into passes, giving, for example, a one-pass or
two-pass compiler. The phases are as follows:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

(b) Mention the role of semantic analysis.


Ans: Semantic Analysis is the third phase of the compiler. Semantic analysis makes sure
that the declarations and statements of the program are semantically correct. It is a
collection of procedures called by the parser as and when required by the grammar. Both
the syntax tree from the previous phase and the symbol table are used to check the
consistency of the given code. Type checking is an important part of semantic analysis,
where the compiler makes sure that each operator has matching operands.
Semantic Analyzer:
It uses the syntax tree and the symbol table to check whether the given program is
semantically consistent with the language definition. It gathers type information and
stores it in either the syntax tree or the symbol table. This type information is
subsequently used by the compiler during intermediate-code generation.

(c) What are the various parts in LEX program?


o Ans: Lex is a program that generates a lexical analyzer. It is used with the YACC
parser generator.
o The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
o Lex implements the lexical analyzer as a C program: the generated code reads the
input stream and produces the sequence of tokens as output.
The function of Lex is as follows:
o First, the programmer writes a Lex specification lex.l in the Lex language. The Lex
compiler then processes lex.l and produces a C program lex.yy.c.
o Next, the C compiler compiles lex.yy.c and produces an object program a.out.
o a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.

Lex file format

A Lex program is separated into three sections by %% delimiters. The format of a Lex
source file is as follows:

1. { definitions }
2. %%
3. { rules }
4. %%
5. { user subroutines }
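A minimal sketch of a Lex specification following this three-section layout (the pattern and actions are illustrative, not part of the question):

```lex
%{
/* definitions section: C code copied verbatim into lex.yy.c */
#include <stdio.h>
int num_count = 0;
%}
digit   [0-9]
%%
{digit}+   { num_count++; }
[ \t\n]    ;
.          ;
%%
/* user subroutines section */
int yywrap(void) { return 1; }
int main(void) { yylex(); printf("%d numbers\n", num_count); return 0; }
```

Running the Lex compiler on this file produces lex.yy.c, which, once compiled with a C compiler, counts the numbers in its input.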

(d) Differentiate Parse tree and Syntax tree with an example.


Ans: Parse Tree
A parse tree is a hierarchical structure that shows the derivation of the grammar used to
yield the input string. In parsing, the string is derived starting from the start symbol, so
the root of the parse tree is that start symbol. It is a graphical description of symbols,
which can be terminals or non-terminals. A parse tree follows the precedence of
operators: the deepest sub-tree is traversed first, so the operator in a parent node has
lower precedence than the operator in its sub-tree.
A Parse Tree for a CFG G = (V, Σ, P, S) is a tree satisfying the following conditions −

• The root has the label S, where S is the start symbol.
• Each vertex of the parse tree has a label, which can be a variable (V), a
terminal (Σ), or ε.
• If A → C1 C2 … Cn is a production, then C1, C2, …, Cn are children of the
node labeled A.
• Leaf nodes are terminals (Σ), and interior nodes are variables (V).
• The label of an internal vertex is always a variable.
• If a vertex A has k children with labels A1, A2, …, Ak, then A → A1 A2 … Ak
is a production in the context-free grammar G.
Syntax Tree
A syntax tree is a tree that displays the syntactic structure of a program while omitting
the irrelevant detail present in a parse tree. Thus, the syntax tree is nothing more than a
condensed form of the parse tree. The operator and keyword nodes of the parse tree are
moved up to their parents, and a chain of single productions is replaced by a single link.
For example, consider the parse tree of the string id + id * id.
The syntax tree for the expression is as follows −

Example − Construct:
• Parse Tree
• Syntax Tree
• Annotated parse tree
for the input string 1 * 2 + 3, using any grammar you know.
Solution

(e) Give the properties of intermediate representation.


Ans: Intermediate Representation(IR), as the name suggests, is any representation of a
program between the source and target languages. The intermediate form of the program
that is being compiled is the central data structure in a compiler. A compiler may have a
single IR or a series of IRs. The decisions that are made during the design of IR affect the
efficiency and speed of the compiler.

Properties of IRs:

The priorities of different properties across all compilers are not uniform.
The below five are the properties of IRs:
1. Ease of generation
2. Ease of manipulation
3. Freedom of expression
4. Size of the procedure
5. Level of abstraction

(f) Differentiate between LR and LL parsers.


Ans:
LL Parser vs. LR Parser

• The first L of LL stands for left-to-right scanning and the second L for leftmost
derivation. The L of LR stands for left-to-right scanning and the R for rightmost
derivation (in reverse).
• An LL parser follows the leftmost derivation; an LR parser follows the reverse of
the rightmost derivation.
• An LL parser constructs the parse tree top-down; an LR parser constructs it
bottom-up.
• In an LL parser, non-terminals are expanded; in an LR parser, handles are
reduced (compressed) to non-terminals.
• LL parsing starts with the start symbol (S); LR parsing ends with the start
symbol (S).
• LL parsing ends when the stack becomes empty; LR parsing starts with an
empty stack.
• LL parsing corresponds to a pre-order traversal of the parse tree; LR parsing
corresponds to a post-order traversal.
• In LL parsing, a terminal is read before being pushed onto the stack; in LR
parsing, a terminal is read after being popped out of the stack.
• An LL parser may use backtracking or dynamic programming; an LR parser
uses dynamic programming.
• LL parsers are easier to write; LR parsers are harder to write.
• Examples: LL(0), LL(1) versus LR(0), SLR(1), LALR(1), CLR(1).

(g) What is phrase level error recovery?

Ans: Phrase Level Recovery:

In this strategy, on discovering an error, the parser performs local correction on the
remaining input: it may replace a prefix of the remaining input with some string that
allows the parser to continue its job. Typical local corrections are replacing a comma
with a semicolon, deleting an extraneous semicolon, or inserting a missing semicolon.
The choice of local correction is made by the compiler designer.
Examples:

int a,b
// AFTER RECOVERY:
int a,b; //Semicolon is added by the compiler

(h) Discuss the capabilities of CFG.


Ans: Capabilities of CFG

There are the various capabilities of CFG:

o Context free grammar is useful to describe most of the programming languages.


o If the grammar is properly designed, then an efficient parser can be constructed
automatically.
o Using associativity and precedence information, suitable grammars for
expressions can be constructed.
o Context free grammar is capable of describing nested structures such as balanced
parentheses, matching begin-end pairs, corresponding if-then-else's, and so on.

(i). Define loop jamming.


Ans: Loop Jamming:
Loop jamming (loop fusion) combines two or more loops into a single loop. It reduces
the loop overhead (the repeated test and increment instructions) incurred by executing
many separate loops.
Example:
Initial Code:

for(int i=0; i<5; i++)


a = i + 5;
for(int i=0; i<5; i++)
b = i + 10;

Optimized code:
for(int i=0; i<5; i++)
{
a = i + 5;
b = i + 10;
}
(j). What is induction variable?
Ans: Loops are well-known targets for optimization since they execute repeatedly and
significant execution time is spent in loop bodies. The class of loop optimizations
considered here centers on special variables called induction variables (IVs). An
induction variable is any variable whose value can be represented as a function of: loop
invariants; the number of loop iterations that have executed; and other induction
variables.
SECTION B

2. Attempt any three of the following: 10 x 3 = 30


(a) Write SDD to produce three-address code for Boolean
expressions and obtain the three-address code for the statement
given below:

while a < b do
  if c < d then
    x = y * z
  else
    x = y + z
Ans: Boolean Expression
The translation of conditional statements such as if-else statements and while-do
statements is associated with Boolean expression's translation. The main use of the
Boolean expression is the following:

• Boolean expressions are used as conditional expressions in statements that alter the
flow of control.
• A Boolean expression can compute logical values, true or false.

Boolean expression is composed of Boolean operators like &&, ||, !, etc. applied to the
elements that are Boolean or relational expressions. E1 rel E2 is the form of relational
expressions.
Let us consider the following grammars:
B => B1 || B2
B => B1 && B2
B => !B1
B => (B)
B => E1 rel E2
B => true
B => false
If we compute that B1 is true in the first expression, then the entire expression will be true.
We don’t need to compute B2. In the second expression, if B1 is false, then the entire
expression is false.
The comparison operators <, <=, =, !=, >, and >= are represented by rel.op.
We also assume that || and && are left-associative; || has the lowest precedence, then
&&, then !.
PRODUCTION and SEMANTIC RULES

B => B1 || B2
    B1.true = B.true; B1.false = newlabel();
    B2.true = B.true; B2.false = B.false;
    B.code = B1.code || label(B1.false) || B2.code

B => B1 && B2
    B1.true = newlabel(); B1.false = B.false;
    B2.true = B.true; B2.false = B.false;
    B.code = B1.code || label(B1.true) || B2.code

B => !B1
    B1.true = B.false; B1.false = B.true;
    B.code = B1.code

B => E1 rel E2
    B.code = E1.code || E2.code
             || gen('if' E1.addr rel.op E2.addr 'goto' B.true)
             || gen('goto' B.false)

B => true
    B.code = gen('goto' B.true)

B => false
    B.code = gen('goto' B.false)


The below example can generate the three address code using the above translation
scheme:
if ( x < 100 || x > 200 && x ! = y ) x = 0;
if x < 100 goto L2
goto L3
L3: if x > 200 goto L4
goto L1
L4: if x != y goto L2
goto L1
L2: x = 0
L1:

Numerical: Three-address code for the given while statement (label names chosen for
illustration):

L1: if a < b goto L2
    goto Lnext
L2: if c < d goto L3
    goto L4
L3: t1 = y * z
    x = t1
    goto L1
L4: t2 = y + z
    x = t2
    goto L1
Lnext:

(b) Discuss the stack allocation and heap allocation strategies of the
runtimeenvironment with an example.
Ans: Stack Allocation: The allocation happens on contiguous blocks of memory. We call
it a stack memory allocation because the allocation happens in the function call stack. The
size of memory to be allocated is known to the compiler and whenever a function is called,
its variables get memory allocated on the stack. And whenever the function call is over, the
memory for the variables is de-allocated. This all happens using some predefined routines
in the compiler. A programmer does not have to worry about memory allocation and de-
allocation of stack variables. This kind of memory allocation is also known as Temporary
memory allocation because as soon as the method finishes its execution all the data
belonging to that method flushes out from the stack automatically. This means any value
stored in the stack memory scheme is accessible as long as the method hasn’t completed its
execution and is currently in a running state.
Key Points:
• It’s a temporary memory allocation scheme where the data members are
accessible only if the method( ) that contained them is currently running.
• It allocates or de-allocates the memory automatically as soon as the
corresponding method completes its execution.
• If the stack memory is filled completely, we receive the corresponding error
java.lang.StackOverflowError from the JVM.
• Stack memory allocation is considered safer as compared to heap memory
allocation because the data stored can only be accessed by the owner thread.
• Memory allocation and de-allocation are faster as compared to Heap-memory
allocation.
• Stack memory has less storage space as compared to Heap-memory.

• C++

int main()
{
// All these variables get memory
// allocated on stack
int a;
int b[10];
int n = 20;
int c[n];
}

Heap Allocation: The memory is allocated during the execution of instructions written by
programmers. Note that the name heap has nothing to do with the heap data structure. It is
called a heap because it is a pile of memory space available to programmers to allocate and
de-allocate. Every time we create an object, it is created in heap space, and the referencing
information for these objects is stored in stack memory. Heap memory allocation isn't as
safe as stack memory allocation because the data stored in this space is accessible or
visible to all threads. If a programmer does not handle this memory well,
a memory leak can happen in the program.
The Heap-memory allocation is further divided into three categories:- These three
categories help us to prioritize the data(Objects) to be stored in the Heap-memory or in
the Garbage collection.
• Young Generation – It’s the portion of the memory where all the new
data(objects) are made to allocate the space and whenever this memory is
completely filled then the rest of the data is stored in Garbage collection.
• Old or Tenured Generation – This is the part of Heap-memory that contains
the older data objects that are not in frequent use or not in use at all are placed.
• Permanent Generation – This is the portion of Heap-memory that contains the
JVM’s metadata for the runtime classes and application methods.
Key Points:
• We receive the corresponding error java.lang.OutOfMemoryError from the JVM
if heap space is entirely full.
• This memory allocation scheme is different from stack-space allocation: no
automatic de-allocation feature is provided here. We need to use a garbage
collector to remove the old unused objects in order to use the memory
efficiently.
• The processing time (access time) of this memory is quite slow as compared
to stack memory.
• Heap memory is also not as thread-safe as stack memory because data stored
in heap memory is visible to all threads.
• The size of the Heap-memory is quite larger as compared to the Stack-memory.
• Heap memory is accessible or exists as long as the whole application(or java
program) runs.
• CPP

int main()
{
    // This memory for 10 integers
    // is allocated on the heap.
    int *ptr = new int[10];

    // De-allocation is the programmer's job for heap memory.
    delete[] ptr;
}

Intermixed example of both kinds of memory allocation Heap and Stack in java:
class Emp {
int id;
String emp_name;

public Emp(int id, String emp_name) {


this.id = id;
this.emp_name = emp_name;
}
}

public class Emp_detail {


private static Emp Emp_detail(int id, String emp_name) {
return new Emp(id, emp_name);
}

public static void main(String[] args) {


int id = 21;
String name = "Maddy";
Emp person_ = null;
person_ = Emp_detail(id, name);
}
}

Following are the conclusions we can draw after analyzing the above example:
• As we start execution of the above program, all the run-time classes are stored
in the Heap-memory space.
• Then we find the main() method in the next line which is stored in the stack
along with all its primitive(or local) and the reference variable Emp of type
Emp_detail will also be stored in the Stack and will point out to the
corresponding object stored in Heap memory.
• Then the next line will call to the parameterized constructor Emp(int, String)
from main( ) and it’ll also allocate to the top of the same stack memory block.
This will store:
• The object reference of the invoked object of the stack memory.
• The primitive value(primitive data type) int id in the stack memory.
• The reference variable of the String emp_name argument will point
to the actual string from the string pool into the heap memory.
• Then the main method will again call to the Emp_detail() static method, for
which allocation will be made in stack memory block on top of the previous
memory block.
• So, for the newly created object Emp of type Emp_detail and all instance
variables will be stored in heap memory.
Pictorial representation as shown in Figure.1 below:
Fig.1

Key Differences Between Stack and Heap Allocations

1. In a stack, the allocation and de-allocation are automatically done by the


compiler whereas, in heap, it needs to be done by the programmer manually.
2. Handling the Heap frame is costlier than handling the stack frame.
3. Memory shortage problem is more likely to happen in stack whereas the main
issue in heap memory is fragmentation.
4. Stack frame access is easier than the heap frame as the stack has a small region
of memory and is cache-friendly but in the case of heap frames which are
dispersed throughout the memory so it causes more cache misses.
5. A stack is not flexible: the memory size allotted cannot be changed, whereas a
heap is flexible, and the allotted memory can be altered.
6. Access time for the heap is greater than access time for the stack.

(c) What do you mean by attributed grammars? Discuss the


translationscheme for converting an infix expression to its
equivalent postfix form.
Ans: Parser uses a CFG(Context-free-Grammar) to validate the input string and produce
output for the next phase of the compiler. Output could be either a parse tree or an
abstract syntax tree. Now to interleave semantic analysis with the syntax analysis phase
of the compiler, we use Syntax Directed Translation.

Conceptually, with both syntax-directed definition and translation schemes, we parse the
input token stream, build the parse tree, and then traverse the tree as needed to evaluate
the semantic rules at the parse tree nodes. Evaluation of the semantic rules may generate
code, save information in a symbol table, issue error messages, or perform any other
activities. The translation of the token stream is the result obtained by evaluating the
semantic rules.
Definition
Syntax Directed Translation has augmented rules to the grammar that facilitate semantic
analysis. SDT involves passing information bottom-up and/or top-down to the parse tree
in form of attributes attached to the nodes. Syntax-directed translation rules use 1) lexical
values of nodes, 2) constants & 3) attributes associated with the non-terminals in their
definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the nodes of the tree by visiting them in some
order. In many cases, translation can be done during parsing without building an explicit
tree.
Example

E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT rules to
this grammar, in order to pass some information up the parse tree and check for semantic
errors, if any. In this example, we will focus on the evaluation of the given expression, as
we don’t have any semantic assertions to check in this very basic example.

E -> E+T { E.val = E.val + T.val } PR#1


E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5
To understand the translation rules further, take the first SDT rule, augmented to the
production [ E -> E+T ]. The translation rule in question has val as an attribute for both
of the non-terminals E and T. Names on the right-hand side of the translation rule refer
to attribute values of nodes on the right-hand side of the production, and the name on
the left-hand side refers to the production's head.
Generalizing, SDT are augmented rules to a CFG that associate 1) set of attributes to
every node of the grammar and 2) a set of translation rules to every production rule using
attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens – S = 2+3*4. Parse tree
corresponding to S would be
To evaluate the translation rules, we can employ a single depth-first traversal of the
parse tree. This is possible because, for a grammar with only synthesized attributes, the
SDT rules impose no specific evaluation order beyond requiring that children's attributes
be computed before their parents'. Otherwise, we would have to work out a suitable plan
to traverse the parse tree and evaluate all the attributes in one or more traversals. For
clarity, we evaluate bottom-up, left to right, when computing the translation rules of
our example.

(d) Construct the NFA and DFA for the following regular expression.

(0+1)*(00+11)(0+1)*
Ans:
(e) Explain the lexical analysis and syntax analysis phases of the
compiler with a suitable example. Explain the reporting errors in
these two phases as well.
Ans: 1. Lexical Analyzer –
It is also called a scanner. It takes the output of the preprocessor (which
performs file inclusion and macro expansion) as the input which is in a pure
high-level language. It reads the characters from the source program and
groups them into lexemes (sequence of characters that “go together”). Each
lexeme corresponds to a token. Tokens are defined by regular expressions
which are understood by the lexical analyzer. It also removes lexical errors
(e.g., erroneous characters), comments, and white space.
2. Syntax Analyzer – It is sometimes called a parser. It constructs the parse tree.
It takes all the tokens one by one and uses Context-Free Grammar to construct
the parse tree.
Why Grammar?
The rules of programming can be entirely represented in a few productions.
Using these productions we can represent what the program actually is. The
input has to be checked whether it is in the desired format or not.
The parse tree is also called the derivation tree. Parse trees are generally
constructed to check for ambiguity in the given grammar. There are certain
rules associated with the derivation tree.
• Any identifier is an expression
• Any number can be called an expression
• Performing any operations in the given expression will always
result in an expression. For example, the sum of two expressions is
also an expression.
• The parse tree can be compressed to form a syntax tree.

Types of Lexical Error:

Types of lexical error that can occur in a lexical analyzer are as follows:
1. Exceeding length of identifier or numeric constants.
Example:

#include <iostream>
using namespace std;

int main() {

int a=2147483647 +1;


return 0;
}

This is a lexical error since signed integer lies between −2,147,483,648 and 2,147,483,647
2. Appearance of illegal characters
Example:

#include <iostream>
using namespace std;

int main() {

printf("Geeksforgeeks");$
return 0;
}

This is a lexical error since an illegal character $ appears at the end of the statement.
3. Unmatched string
Example:

#include <iostream>
using namespace std;

int main() {
/* comment
cout<<"GFG!";
return 0;
}

This is a lexical error since the ending of comment “*/” is not present but the beginning is
present.
4. Spelling Error

#include <iostream>
using namespace std;

int main() {

int 3num= 1234; /* spelling error as identifier


cannot start with a number*/
return 0;
}

5. Replacing a character with an incorrect character.

#include <iostream>
using namespace std;

int main() {

int x = 12$34; /*lexical error as '$' doesn't


belong within 0-9 range*/
return 0;
}

Other lexical errors include


6. Removal of the character that should be present.

#include <iostram> /* missing 'e' character in
iostream, hence lexical error */
using namespace std;

int main() {

cout<<"GFG!";
return 0;
}

7. Transposition of two characters.

#include <iostream>
using namespace std;

int mian()
{
/* the spelling of main here would be treated as a lexical
error and won't be considered the intended identifier;
transposition of characters 'i' and 'a' */
cout << "GFG!";
return 0;
}

Syntax Error

This type of error appears during the syntax analysis phase. Syntax errors are detected
during compilation, when the structure of the program is checked against the grammar.

Some syntax error can be:

o Error in structure
o Missing operators
o Unbalanced parenthesis

When an invalid calculation is entered into a calculator, a syntax error can also occur.
This can be caused by entering several decimal points in one number or by opening
brackets without closing them.

For example 1: Using "=" when "==" is needed.

1. 16 if (number=200)
2. 17 cout << "number is equal to 200";
3. 18 else
4. 19 cout << "number is not equal to 200"

The following warning message will be displayed by many compilers:

Syntax Warning: assignment operator used in if expression line 16 of program firstprog.cpp

In this code, if expression used the equal sign which is actually an assignment operator not
the relational operator which tests for equality.

Due to the assignment operator, number is set to 200 and the expression number=200 is
always true, because the expression's value is 200. For this example the correct code
would be:

1. 16 if (number==200)

Example 2: Missing semicolon:

1. int a = 5 // semicolon is missing

Compiler message:

1. ab.java:20: ';' expected


2. int a = 5
Example 3: Errors in expressions:

1. x = (3 + 5; // missing closing parenthesis ')'


2. y = 3 + * 5; // missing argument between '+' and '*'

SECTION C
3. Attempt any one part of the following: 10 x 1 = 10
(a) Construct the CLR parse table for the following Grammar:
A → BB
B → cB
B → d
Ans:
(b) Construct the SLR parsing table for the following Grammar.
S→0S0
S→1S1
S→ 10
Ans:
4. Attempt any one part of the following: 10 x 1 = 10
(a) What is back patching. Generate three address code for the
following Boolean expression using back patching:
a < b or c > d and e < f

Ans: One-pass code generation using backpatching:

In a single pass, backpatching may be used to generate code for Boolean expressions as
well as for flow-of-control statements. The synthesized attributes truelist and falselist of
non-terminal B are used to manage labels in jumping code for Boolean statements.
B.truelist is the list of jump or conditional-jump instructions into which we must insert
the label to which control goes if B is true. B.falselist is the list of instructions that
eventually get the label to which control goes when B is false. As code is generated for
B, jumps to the true and false exits are left with unfilled label fields. These early jumps
are placed on the lists B.truelist and B.falselist, respectively.
A statement S, for example, has a synthesized attribute S.nextlist, which indicates a list of
jumps to the instruction immediately after the code for S. It can generate instructions into
an instruction array, with labels serving as indexes. We utilize three functions to modify
the list of jumps:
• Makelist (i): Create a new list including only i, an index into the array of
instructions and the makelist also returns a pointer to the newly generated list.
• Merge(p1,p2): Concatenates the lists pointed to by p1, and p2 and returns a
pointer to the concatenated list.
• Backpatch (p, i): Inserts i as the target label for each of the instructions on
the record pointed to by p.

Backpatching for Boolean Expressions:

Using a translation technique, it can create code for Boolean expressions during bottom-
up parsing. In grammar, a non-terminal marker M creates a semantic action that picks up
the index of the next instruction to be created at the proper time.
For Example, Backpatching using boolean expressions production rules table:
Step 1: Generate the production table for the expression A < B OR C < D AND P < Q
(table figure omitted).

Step 2: Generate the three-address code for the given example (figure omitted).

Step 3: Construct the parse tree for the expression (figure omitted).

(b) What is top down parsing? What are the problems in top down
parsing?Explain each with suitable example.
Ans: Top down parsing
o Top down parsing is also known as recursive parsing or predictive parsing.
o Top down parsing is used to construct a parse tree for an input string.
o In top down parsing, the parsing starts from the start symbol and transforms it into
the input symbols.

Problems in top down parsing include: (1) left recursion, which makes a
recursive-descent parser loop forever and must be eliminated from the grammar;
(2) backtracking, where choosing a wrong alternative forces the parser to re-scan the
input; and (3) the need for left factoring, since productions with common prefixes (or
ambiguous grammars) prevent the parser from choosing a production deterministically.

Parse Tree representation of input string "acdb" is as follows:


5. Attempt any one part of the following: 10 x 1 = 10
(a) What is an activation record? Draw diagram of general activation
record and explain the purpose of different fields of an activation
record.
Ans: Activation Record
o Control stack is a run time stack which is used to keep track of the live procedure
activations i.e. it is used to find out the procedures whose execution have not been
completed.
o When it is called (activation begins) then the procedure name will push on to the
stack and when it returns (activation ends) then it will popped.
o Activation record is used to manage the information needed by a single execution of
a procedure.
o An activation record is pushed into the stack when a procedure is called and it is
popped when the control returns to the caller function.

The diagram below shows the contents of activation records:


Return Value: It is used by the called procedure to return a value to the calling
procedure.

Actual Parameters: They are used by the calling procedure to supply parameters to the
called procedure.

Control Link: It points to the activation record of the caller.

Access Link: It is used to refer to non-local data held in other activation records.

Saved Machine Status: It holds the information about status of machine before the
procedure is called.

Local Data: It holds the data that is local to the execution of the procedure.

Temporaries: It stores the value that arises in the evaluation of an expression.

(b) How do we represent the scope information? Explain scope by


number and scope by location.
Ans: Representing Scope Information

In the source program, every name possesses a region of validity, called the scope of that
name.

The rules in a block-structured language are as follows:

1. If a name is declared within block B, then it will be valid only within B.


2. If B1 block is nested within B2 then the name that is valid for block B2 is also valid
for B1 unless the name's identifier is re-declared in B1.

o These scope rules need a more complicated organization of symbol table than a list
of associations between names and attributes.
o Tables are organized into stack and each table contains the list of names and their
associated attributes.
o Whenever a new block is entered then a new table is entered onto the stack. The new
table holds the name that is declared as local to this block.
o When the declaration is compiled then the table is searched for a name.
o If the name is not found in the table then the new name is inserted.
o When a reference to a name is translated, each table is searched, starting from the
top table on the stack.

For example:
1. int x;
2. void f(int m) {
3. float x, y;
4. {
5. int i, j;
6. int u, v;
7. }
8. }
9. int g (int n)
10. {
11. bool t;
12. }
Fig: Symbol table organization that complies with static scope information rules

6. Attempt any one part of the following: 10 x 1 = 10


(a) Define Symbol table? Explain about the data structures used
forsymbol table.
Ans: Definition
The symbol table is defined as the set of Name and Value pairs.
Symbol Table is an important data structure created and maintained by the compiler in
order to keep track of semantics of variables i.e. it stores information about the scope and
binding information about names, information about instances of various entities such as
variable and function names, classes, objects, etc.
• It is built in the lexical and syntax analysis phases.
• The information is collected by the analysis phases of the compiler and is used
by the synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new table entries in the table, for
example like entries about tokens.
2. Syntax Analysis: Adds information regarding attribute type,
scope, dimension, line of reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to
check for semantics i.e. to verify that expressions and assignments
are semantically correct(type checking) and update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know
how much and what type of run-time storage is allocated; the table
also helps in adding temporary-variable information.
5. Code Optimization: Uses information present in the symbol table
for machine-dependent optimization.
6. Target Code generation: Generates code by using address
information of identifier present in the table.
Symbol Table entries – Each entry in the symbol table is associated with attributes that
support the compiler in different phases.
Use of Symbol Table-
The symbol tables are typically used in compilers. Basically, a compiler is a program that
scans the application program (for instance, your C program) and produces machine
code.
During this scan the compiler stores the identifiers of that application program in the
symbol table. These identifiers are stored in the form of name, value, address, and type.
Here the name represents the name of identifier, value represents the value stored in an
identifier, the address represents memory location of that identifier and type represents
the data type of identifier.
Thus compiler can keep track of all the identifiers with all the necessary information.
Items stored in Symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
Information used by the compiler from Symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• For a structure or record, a pointer to the structure table.
• For parameters, whether parameter passing by value or by reference
• Number and type of arguments passed to function
• Base Address
Operations on Symbol Table –
The following basic operations can be performed on a symbol table:
1. Insertion of an item into the symbol table.
2. Deletion of an item from the symbol table.
3. Searching for a desired item in the symbol table.
Implementation of Symbol table –
Following are commonly used data structures for implementing symbol table:-
1. List –
In this scheme a single array (or, equivalently, several arrays) stores names and their
associated information. New names are added to the list in the order in which they are
encountered. A pointer "available" marks the end of the array, pointing to where the
next symbol-table entry will go. The search for a name proceeds backwards from the
end of the array to the beginning, so the most recently declared occurrence is found
first; when the name is located, the associated information is found in the words
following it.

id1 info1 id2 info2 …….. id_n info_n

• In this method, an array is used to store names and associated information.


• A pointer “available” is maintained at end of all stored records and new
names are added in the order as they arrive
• To search for a name we scan the list backwards from the "available"
pointer to the beginning; if the name is not found we report the error
"use of undeclared name"
• While inserting a new name we must ensure that it is not already present
otherwise an error occurs i.e. “Multiple defined names”
• Insertion at the end is O(1), though the duplicate check itself costs O(n); lookup is O(n) on average, which is slow for large tables
• The advantage is that it takes a minimum amount of space.
2. Linked List –
• This implementation uses a linked list: a link field is added to
each record.
• Names are searched in the order given by the link fields.
• A pointer "First" points to the first record of the
symbol table.
• Insertion at the front is O(1), but lookup is slow for large tables – O(n) on
average
3. Hash Table –
• In the hashing scheme two tables are maintained – a hash table and a
symbol table; this is the most commonly used method of
implementing symbol tables.
• A hash table is an array with an index range of 0 to table size – 1.
Its entries are pointers to entries in the symbol
table.
• To search for a name we apply a hash function that maps it to an
integer between 0 and table size – 1.
• Insertion and lookup can be made very fast – O(1) on average.
• The advantage is that quick search is possible; the disadvantage is
that hashing is complicated to implement.
4. Binary Search Tree –
• Another approach to implementing a symbol table is to use a
binary search tree: two link fields, left and right child, are
added to each record.
• Names are inserted so that the tree always satisfies
the binary-search-tree property.
• Insertion and lookup are O(log₂ n) on average.
Advantages of Symbol Table
1. Improved efficiency: symbol tables give quick and simple access to crucial
data such as variable and function names, data types, and memory locations.
2. Better code structure: symbol tables can be used to organize and simplify
code, making it simpler to comprehend, navigate, and correct.
3. Faster code execution: by offering quick access to information like memory
addresses, symbol tables can be utilized to optimize code execution by
lowering the number of memory accesses required during execution.
4. Improved portability: symbol tables offer a standardized method of storing
and retrieving data, which can make it simpler to migrate code between
different systems or programming languages.
5. Improved code reuse: by offering a standardized method of storing and
accessing information, symbol tables can be utilized to increase the reuse of
code across multiple projects.
6. Easier debugging: symbol tables facilitate easy access to and examination of
a program's state during execution, making it simpler to identify and
correct mistakes.
Disadvantages of Symbol Table
1. Increased memory consumption: Systems with low memory resources may
suffer from symbol tables’ high memory requirements.
2. Increased processing time: The creation and processing of symbol tables can
take a long time, which can be problematic in systems with constrained
processing power.
3. Complexity: Developers who are not familiar with compiler design may find
symbol tables difficult to construct and maintain.
4. Limited scalability: Symbol tables may not be appropriate for large-scale
projects or applications that require the management of enormous amounts
of data.
5. Upkeep: Maintaining and updating symbol tables on a regular basis can be
time- and resource-consuming.
6. Limited functionality: It’s possible that symbol tables don’t offer all the
features a developer needs, and therefore more tools or libraries will be
needed to round out their capabilities.
Applications of Symbol Table
1. Resolution of variable and function names: Symbol tables are used to
identify the data types and memory locations of variables and functions as
well as to resolve their names.
2. Resolution of scope issues: To resolve naming conflicts and ascertain the
range of variables and functions, symbol tables are utilized.
3. Optimized code execution: Symbol tables, which offer quick access to
information such as memory locations, are used to optimize code execution.
4. Code generation: By giving details like memory locations and data kinds,
symbol tables are utilized to create machine code from source code.
5. Error checking and code debugging: By supplying details about the status
of a program during execution, symbol tables are used to check for faults and
debug code.
6. Code organization and documentation: By supplying details about a
program’s structure, symbol tables can be used to organize code and make it
simpler to understand.

(b) Explain the following:


(i) Copy Propagation
Ans: In compiler theory, copy propagation is the process of replacing the occurrences of
targets of direct assignments with their values. A direct assignment is an instruction of
the form x = y, which simply assigns the value of y to x.
From the following code:

y=x
z=3+y

Copy propagation would yield:

z=3+x

(ii) Dead-Code Elimination


Ans: Dead code is a program snippet that is never executed or never reached in a
program. It is a code that can be efficiently removed from the program without affecting
any other part of the program. In case, a value is obtained and never used in the future, it
is also regarded as dead code. Consider the below dead code:
//Code
int x = a + 23; //the variable x is never used
//in the program, so it is dead code
z = a + y;
printf("%d,%d", z, y);
//After Optimization
z = a + y;
printf("%d,%d", z, y);
Another example of dead code is assigning a value to a variable and then changing that
value just before using it; the earlier assignment is dead code. Such dead code needs to
be deleted in order to achieve optimization.
(iii) Code Motion
Ans: Many times, in a loop, statements that remain unchanged for every iteration are
included in the loop. Such statements are loop invariants and only result in the program
spending more time inside the loop. Code motion simply moves loop invariant code
outside the loop, reducing the time spent inside the loop. To understand this consider the
example below.
//Before code motion
p=100
for(i=0;i<p;i++)
{
a=b+40; //loop invariant code
if(p/a==0)
printf("%d",p);
}
// After code motion
p=100
a=b+40;
for(i=0;i<p;i++)
{
if(p/a==0)
printf("%d",p);
}
In the example, before optimization the loop-invariant code was evaluated on every
iteration of the loop. Once code motion is applied, the frequency of evaluating the loop-
invariant code decreases; code motion is therefore also called Frequency Reduction. The
following is another example of code motion.
//Before code motion
----;
while((x+y)>n)
{
----;
}
----;

// After code motion


----;
int t=x+y;
while(t>n)
{
----;
}
----;

(iv) Reduction in Strength.


Ans: It suggests replacing a costly operation like multiplication with a cheaper one.
Example:
a*4
after reduction
a<<2
It is an important optimization for programs where array accesses occur within loops and
should be used with integer operands only.

7. Attempt any one part of the following: 10 x 1 = 10


(a) Explain the DAG representation of the basic block with an example.
Ans: DAG representation for basic blocks

A DAG for basic block is a directed acyclic graph with the following labels on nodes:

1. The leaves of the graph are labeled by unique identifiers, which can be variable
names or constants.
2. Interior nodes of the graph are labeled by an operator symbol.
3. Nodes may also be given a sequence of identifiers as labels, to store the computed value.

o DAGs are a type of data structure used to implement transformations on basic
blocks.
o DAG provides a good way to determine the common sub-expression.
o It gives a picture representation of how the value computed by the statement is used
in subsequent statements.

Algorithm for construction of DAG

Input: a basic block.

Output: a DAG with the following information:


o Each node contains a label. For leaves, the label is an identifier.
o Each node contains a list of attached identifiers to hold the computed values.

The three-address statements of the block take one of three forms:

Case (i)   x := y OP z
Case (ii)  x := OP y
Case (iii) x := y
Method:

Step 1:

If y operand is undefined then create node(y).

If z operand is undefined then for case(i) create node(z).

Step 2:

For case(i), create node(OP) whose right child is node(z) and left child is node(y).

For case(ii), check whether there is a node(OP) with one child node(y); if not, create it.

For case(iii), node n will be node(y).

Output:

For node(x) delete x from the list of identifiers. Append x to attached identifiers list for the
node n found in step 2. Finally set node(x) to n.

Example:

Consider the following three address statement:

1. S1 := 4 * i
2. S2 := a[S1]
3. S3 := 4 * i
4. S4 := b[S3]
5. S5 := S2 * S4
6. S6 := prod + S5
7. prod := S6
8. S7 := i + 1
9. i := S7
10. if i <= 20 goto (1)
Stages in DAG Construction:
(b) Write quadruple, triples and indirect triples for following expression :
a = b * – c + b * – c.
Ans: The expression a = b * – c + b * – c applies the unary minus (uminus) to c twice.
Its three-address code is:

t1 := uminus c
t2 := b * t1
t3 := uminus c
t4 := b * t3
t5 := t2 + t4
a  := t5

Quadruples (each statement stores op, arg1, arg2 and result explicitly):

#     op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   =        t5            a

Triples (the result field is dropped; a statement is referred to by its position number):

#     op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   =        a      (4)

Indirect triples (the triples themselves are stored once, here numbered (10)–(15), and a
separate statement list holds pointers to them, so statements can be reordered without
renumbering the triples):

Statement list:
(0) → (10)
(1) → (11)
(2) → (12)
(3) → (13)
(4) → (14)
(5) → (15)

Triples:
(10)  uminus   c
(11)  *        b      (10)
(12)  uminus   c
(13)  *        b      (12)
(14)  +        (11)   (13)
(15)  =        a      (14)
