Q-1 Describe Data structures for symbol table.

 Ans- Data structures for symbol table

During the compilation process, names are added to the symbol table in the order
in which they are encountered in the program, along with the information known
about each name.

If new information about an existing name is discovered, that information is also
added. Thus, in designing a symbol table mechanism, we want a scheme that
allows us to add new entries and find existing entries in the table efficiently.

There are three data structures used to implement symbol table:

1. Linear list

2. Binary tree

3. Hash table

1. Linear list or lists

It is the simplest data structure and the easiest to implement.

We use a single array to store names and their associated information.

New names are added to the list in the order in which they are encountered.

To insert a new name, we must scan down the list to make sure that it is not already
there. If it is not, we add it; otherwise we report an error, i.e. "Multiply declared name".

When the name is located, the associated information can be found in the words
immediately following it.

To retrieve information about a name, we search from the beginning of the array
up to the position marked by the AVAILABLE pointer, which indicates the beginning
of the empty portion of the array.

If we reach the AVAILABLE pointer without finding NAME, we have a fault: the use
of an undefined name.
The array layout is:

    NAME 1   | Attribute 1
    NAME 2   | Attribute 2
       :     |      :
    NAME n   | Attribute n
    AVAILABLE (pointer to the start of the empty portion)

To find data about a NAME, we shall on average search n/2 names, so the cost of
an inquiry is proportional to n.

One advantage of the list organization is that it takes the minimum possible space
in a simple compiler.
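As an added illustration (not part of the original text), a minimal C++ sketch of such a
linear-list symbol table might look like this; the names SymbolEntry, lookup, and
insert are hypothetical:

#include <cstring>
#include <cstdio>

struct SymbolEntry { const char* name; int attribute; };

const int MAX = 100;
SymbolEntry table[MAX];
int available = 0;   // index of the first empty slot (the AVAILABLE pointer)

// Linear search from the start of the array up to AVAILABLE.
int lookup(const char* name) {
    for (int i = 0; i < available; ++i)
        if (strcmp(table[i].name, name) == 0) return i;
    return -1;       // fault: use of an undefined name
}

// Insert only if the name is not already present.
bool insert(const char* name, int attribute) {
    if (available == MAX) return false;       // table full
    if (lookup(name) != -1) {
        printf("Multiply declared name\n");
        return false;
    }
    table[available++] = { name, attribute };
    return true;
}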

2. Trees

It is a more efficient approach to symbol table organization. Here we add two link
fields, LEFT and RIGHT, to each record.

The following algorithm is used to look for NAME in a binary search tree, where p
is initially a pointer to the root.

1. while p ≠ null do
2.     if NAME = NAME(p) then /* NAME found; take action on success */
3.     else if NAME < NAME(p) then p := LEFT(p) /* visit left child */
4.     else p := RIGHT(p) /* NAME(p) < NAME; visit right child */

If the loop terminates with p = null, NAME is not in the tree.
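A hedged C++ rendering of this lookup (the Node structure and its field names are
assumptions mirroring NAME, LEFT, and RIGHT above):

#include <string>

struct Node {
    std::string name;    // NAME(p)
    Node* left;          // LEFT(p)
    Node* right;         // RIGHT(p)
};

// Returns the node holding `name`, or nullptr for an undefined name.
Node* find(Node* p, const std::string& name) {
    while (p != nullptr) {
        if (name == p->name) return p;      // NAME found
        p = (name < p->name) ? p->left      // visit left child
                             : p->right;    // visit right child
    }
    return nullptr;
}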


3. Hash table

The hashing technique is well suited to searching, and hence it is widely used in
compilers.

In the basic hashing scheme, two tables are used: a hash table and a storage table.

The hash table consists of k words numbered 0, 1, 2, …, k−1. These words are
pointers into the storage table, to the heads of k separate linked lists (some lists
may be empty). Each record in the symbol table appears on exactly one of these
lists.

To determine whether NAME is in the symbol table, we apply a hash function h to
NAME such that h(NAME) is an integer between 0 and k−1.
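A minimal sketch of this chained scheme, assuming a simple multiplicative string
hash (all names here are illustrative):

#include <string>

const int K = 211;                  // number of hash-table words (buckets)

struct Record {
    std::string name;
    Record* next;                   // next record on the same linked list
};

Record* hashTable[K] = { nullptr }; // heads of the k linked lists

// h(NAME): an integer between 0 and k-1.
unsigned h(const std::string& name) {
    unsigned v = 0;
    for (char c : name) v = v * 31 + static_cast<unsigned char>(c);
    return v % K;
}

// Search only the single list selected by h(NAME).
Record* lookup(const std::string& name) {
    for (Record* r = hashTable[h(name)]; r != nullptr; r = r->next)
        if (r->name == name) return r;
    return nullptr;                 // NAME is not in the symbol table
}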

Q-2 Explain different types of intermediate code representations.


During the translation of source code into object code for the target machine, a
compiler can produce a middle-level language code, which is referred to as
intermediate code or intermediate text. There are three types of intermediate
code.
Postfix Notation

In postfix notation, the operator comes after an operand, i.e., the operator
follows an operand.

Example

 Postfix notation for the expression (a+b) * (c+d) is ab+cd+*

 Postfix notation for the expression (a*b) - (c+d) is ab*cd+-
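To see why postfix is convenient, here is a hedged sketch of a stack-based
evaluator for single-digit operands (an added illustration, not from the original
text):

#include <stack>
#include <string>
#include <cctype>

// Evaluates a postfix string such as "23+4*", i.e. (2+3)*4 = 20.
int evalPostfix(const std::string& expr) {
    std::stack<int> s;
    for (char c : expr) {
        if (isdigit(static_cast<unsigned char>(c))) {
            s.push(c - '0');           // operand: push its value
            continue;
        }
        int b = s.top(); s.pop();      // right operand
        int a = s.top(); s.pop();      // left operand
        switch (c) {                   // operator follows its operands
            case '+': s.push(a + b); break;
            case '-': s.push(a - b); break;
            case '*': s.push(a * b); break;
            case '/': s.push(a / b); break;
        }
    }
    return s.top();
}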

Syntax Tree

A tree in which each leaf node represents an operand and each interior node an
operator. The syntax tree is a condensed form of the parse tree.

Example − Draw Syntax Tree for the string a + b ∗ c − d.
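The original figure is not reproduced here; assuming the usual precedence (∗
before + and −) and left-to-right associativity, the string parses as (a + b ∗ c) − d
and its syntax tree is:

        −
       / \
      +   d
     / \
    a   *
       / \
      b   c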


Three-Address Code

The three-address code is a sequence of statements of the form A := B op C,
where A, B, C are either programmer-defined names, constants, or compiler-
generated temporary names; op stands for an operator, which can be a fixed-
or floating-point arithmetic operator or a logical (Boolean-valued) operator.
The reason for the name "three-address code" is that each statement generally
includes three addresses: two for the operands and one for the result.

There are three types of three-address code statements, which are as follows:

Quadruples representation − Three-address statements can be defined by
records with fields for the operator and the operands. It is possible to use a
record structure with four fields: the first holds the operator 'op', the next two
hold operands 1 and 2 respectively, and the last one holds the result. This
representation of three-address code is called the quadruple representation.
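For concreteness (an added illustration, using the same statement as the triple
example below), the quadruples for x := (a + b) * -c / d might be:

    Location | Operator | Operand 1 | Operand 2 | Result
    (0)      | +        | a         | b         | t1
    (1)      | -        | c         |           | t2
    (2)      | *        | t1        | t2        | t3
    (3)      | /        | t3        | d         | t4
    (4)      | :=       | t4        |           | x

Note how explicit temporaries t1…t4 occupy the result field; triples avoid these
by referring to statement positions instead.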

Triples representation − The contents of the operand 1, operand 2, and result
fields are generally pointers to symbol-table records for the names represented
by these fields. Therefore, temporary names must be entered into the symbol
table as they are generated.
This can be avoided by letting the position of a statement denote its temporary
value. If this is done, a record structure with three fields is enough to define a
three-address statement: the first holds the operator and the next two hold the
values of operand 1 and operand 2 respectively. Such a representation is known
as the triple representation.

Indirect Triples Representation − The indirect triple representation uses an
extra array that lists pointers to the triples in the desired execution order.

The triple representation for the statement x := (a + b) * -c / d is as follows −

    Location | Operator | Operand 1 | Operand 2
    (0)      | +        | a         | b
    (1)      | -        | c         |
    (2)      | *        | (0)       | (1)
    (3)      | /        | (2)       | d
    (4)      | :=       | x         | (3)

Q-3 Explain lexical, syntax and semantic phase errors and their
recovery in details.
Lexical phase errors
These errors are detected during the lexical analysis phase. Typical
lexical errors are:
1. Exceeding the length limit of an identifier or numeric constant
2. The appearance of illegal characters
3. Unmatched string
Error recovery for lexical errors:
Panic Mode Recovery
 In this method, successive characters from the input are removed one
at a time until a designated set of synchronizing tokens is found.
Synchronizing tokens are delimiters such as ; or } (see the sketch after
this list).
 The advantage is that it is easy to implement and guarantees not to
go into an infinite loop.
 The disadvantage is that a considerable amount of input is skipped
without checking it for additional errors.
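A minimal sketch of this skipping step, with ; and } as the synchronizing set
(the function name and interface are illustrative assumptions):

#include <string>

// Remove characters one at a time until a synchronizing token is found.
// Returns the index just past the synchronizing character, so the scan
// always advances and cannot loop forever.
size_t panicModeSkip(const std::string& input, size_t pos) {
    while (pos < input.size() && input[pos] != ';' && input[pos] != '}')
        ++pos;                           // discard the offending character
    return pos < input.size() ? pos + 1 : pos;
}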
Syntactic phase errors:
These errors are detected during the syntax analysis phase. Typical syntax
errors are:
 Errors in structure
 Missing operator
 Misspelled keywords
 Unbalanced parenthesis
 Example : swich(ch)
 {
 .......
 .......
 }

The keyword switch is incorrectly written as swich. Hence, an "Unidentified
keyword/identifier" error occurs.
Error recovery for syntactic phase error:
1. Panic Mode Recovery
 In this method, successive characters from the input are removed one at a
time until a designated set of synchronizing tokens is found. Synchronizing
tokens are delimiters such as ; or }
 The advantage is that it’s easy to implement and guarantees not to go into
an infinite loop
 The disadvantage is that a considerable amount of input is skipped without
checking it for additional errors
2. Statement Mode recovery
 In this method, when a parser encounters an error, it performs the necessary
correction on the remaining input so that the rest of the input statement
allows the parser to parse ahead.
 The correction can be deletion of extra semicolons, replacing the comma
with semicolons, or inserting a missing semicolon.
 While performing correction, utmost care should be taken for not going in
an infinite loop.
 A disadvantage is that it is difficult to handle situations where the
actual error occurred before the point of detection.
3. Error production
 If a user has knowledge of common errors that can be encountered then,
these errors can be incorporated by augmenting the grammar with error
productions that generate erroneous constructs.
 If this is used then, during parsing appropriate error messages can be
generated and parsing can be continued.
4. Global Correction
 The parser examines the whole program and tries to find the closest
error-free match for it.
 The closest match is the program that requires the fewest insertions,
deletions, and changes of tokens to recover from the erroneous input.
 Due to high time and space complexity, this method is not implemented
practically.
Semantic errors
These errors are detected during the semantic analysis phase. Typical semantic errors
are
 Incompatible type of operands
 Undeclared variables
 Not matching of actual arguments with a formal one
Example : int a[10], b;
.......
.......
a = b;
It generates a semantic error because the types of a and b are incompatible
(array vs. scalar).
Error recovery for Semantic errors
 If an "Undeclared Identifier" error is encountered, then to recover from
it, a symbol-table entry is made for the corresponding identifier.
 If the data types of two operands are incompatible, automatic type
conversion is done by the compiler (see the example below).
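For example (an added illustration), mixing int and float operands triggers an
implicit conversion:

int i = 7;
float f = 2.5f;
float r = i + f;   // i is implicitly converted to float; r == 9.5f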

Q-4 Explain in detail the code optimization.


Optimizing code in compiler design is important because it directly
affects the performance of the compiled code. A well-optimized code
runs faster and consumes fewer resources, leading to improved
overall system performance and reduced energy consumption.
Additionally, optimization can reduce the size of the generated code,
which is important for embedded systems with limited memory. The
optimization process can also help identify and eliminate bottlenecks
in the code, leading to more efficient algorithms and improved
software design.
Types of Code Optimization
The code optimization process can be broadly classified into two
types :
 Machine Independent Optimization
 Machine Dependent Optimization
1. Machine Independent Optimization
This step of code optimization aims to optimize the intermediate code
to produce a better target code. The portion of the intermediate code
transformed here involves no CPU registers or absolute memory
addresses.
2. Machine Dependent Optimization
After the target code has been created and converted to fit the target
machine architecture, machine-dependent optimization is performed.
It may use absolute memory references rather than relative memory
accesses and requires CPU registers. Machine-dependent optimizers
make a concerted attempt to maximize the memory hierarchy's
benefits.
Code Optimization Techniques
A program has many statements, loops, branches, etc., so code
optimization must be performed on all of them. Optimization is applied
differently in each of the following cases.
1. Loop Optimization
Most programs spend the majority of their execution time inside loops,
so it is vital to optimize loops to save CPU cycles and memory. The
following strategies can be used to improve loops (a before/after sketch
follows Figure 1 below).
 Loop-invariant code: This is a piece of code that sits in the loop
and computes the same value on every iteration. Such code may be
moved out of the loop and calculated just once rather than on each
iteration.
 Induction analysis: A variable whose value changes by a fixed,
loop-invariant amount on every iteration is termed an induction
variable; induction analysis identifies such variables so their
updates can be simplified.
 Strength reduction: Some expressions cost more CPU cycles,
time, and memory than others. These expressions should be
replaced with less expensive expressions without changing the
expression's output. For example, the multiplication (x * 2) uses
more CPU cycles than the shift (x << 1) but produces the same
output.
Figure 1 - Loop Code flow chart
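A before/after sketch of loop-invariant code motion and strength reduction
(the function and variable names are illustrative assumptions):

// Before: limit*2 is loop-invariant and recomputed every iteration,
// and the multiplication i*8 can be strength-reduced.
void before(int* a, int n, int limit) {
    for (int i = 0; i < n; ++i)
        a[i] = i * 8 + limit * 2;
}

// After: the invariant is hoisted out of the loop, and i*8 is replaced
// by a running addition on the induction variable i8.
void after(int* a, int n, int limit) {
    int bound = limit * 2;      // computed once, outside the loop
    int i8 = 0;                 // tracks i * 8
    for (int i = 0; i < n; ++i) {
        a[i] = i8 + bound;
        i8 += 8;                // an addition is cheaper than a multiply
    }
}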
2. Partially dead code
Some code statements compute values that are used only under
particular conditions, i.e., the values are used sometimes and not at
other times. Such code is referred to as partially dead code.
The control-flow diagram below shows a program section in which the
variable 'a' is assigned the result of the expression 'x * y'. Assume that
the value of 'a' is never used within the loop. Immediately after control
leaves the loop, 'a' is assigned the value of the variable 'z', which will be
used later in the program. We may infer that because the earlier
assignment to 'a' is never used anywhere, it is suitable for deletion (see
the sketch below).
Figure 2 - Dead Code flow chart
Similarly, the conditional statement in the image above is always false,
meaning that the code written in the "true" case will never be run and
so may be eliminated.
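A small sketch matching this description, assuming x, y, z, and cond are in
scope (the names follow the prose above):

a = x * y;     // partially dead: 'a' is never used inside the loop
while (cond) {
    // ... no reference to 'a' here ...
}
a = z;         // 'a' is overwritten immediately after the loop, so the
               // earlier assignment can be deleted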
3. Unreachable Code Elimination
A control-flow graph should be created first. An unreachable code
block has no incoming edge. The unreachable branches can be deleted
after constant propagation and constant folding.
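For instance (an added illustration), after constant propagation and folding
the compiler can prove both of the following fragments unreachable:

int compute(int x) {
    const bool debug = false;
    if (debug) {          // folding shows this block has no incoming
        x = x * 2;        // edge in the control-flow graph, so it can
    }                     // be deleted
    return x;
    x = x + 1;            // unreachable: follows an unconditional return
}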
4. Function Inlining
The body of the function takes the place of a function call. This saves
a lot of time by eliminating the need to copy all parameters, store the
return address, and so on. Let us explain this with an example below:
int addtwonum(int a, int b){
    return a + b;
}
int subtract(int a, int b){
    return addtwonum(a, -b);
}

Here, we see that by negating one of the numbers, we reuse the addition
function to perform subtraction. Now let us see the snippet below:
int subtract(int a, int b){
    return a + (-b);
}

Here, the work of the function addtwonum has been folded directly into the
subtract function. This is function inlining.
5. Function Cloning
Specialized versions of a function are constructed for different calling
arguments. Overloading a function is an example of this. We can
understand it with the following snippet:
void solve(int a){
….
}
void solve(int a, int b){
….
}
void solve(int a, float b, long c){
….
}
We can see that the function's name is the same (solve), but one of
them will be called according to the different parameters being
passed to it.
6. Partial Redundancy
A redundant expression is computed more than once along an execution
path without any change to its operands. A partially redundant
expression, on the other hand, is computed more than once on some
paths but not on others. By employing a code-motion approach,
loop-invariant code may be rendered fully redundant and then eliminated.
An example of a partially redundant code can be:
if (condition) {
    a = y OP z;
} else {
    ...
}
c = y OP z;

We assume that the values of the operands y and z do not change between
the assignment to variable a and the assignment to variable c. If the
condition is true, y OP z is computed twice; otherwise, it is computed once.
As shown below, code motion may be utilized to remove the redundancy:
if (condition) {
    ...
    tmp = y OP z;
    a = tmp;
    ...
} else {
    ...
    tmp = y OP z;
}
c = tmp;

Q-5 Explain in detail the storage allocation strategies.


Ans :
Storage Allocation Strategies
There are mainly three types of Storage Allocation Strategies:
1. Static Allocation
2. Heap Allocation
3. Stack Allocation
1. Static Allocation
Static allocation lays out or assigns the storage for all data objects at compile
time. In static allocation, names are bound to storage, and the addresses of these
identifiers remain the same throughout execution. The memory is allocated at a
fixed location once, at compile time. C and C++ support static allocation.
For example:
int number = 1;
static int digit = 1;

2. Heap Allocation
Heap allocation is used where the Stack allocation lacks if we want to retain the values
of the local variable after the activation record ends, which we cannot do in stack
allocation, here LIFO scheme does not work for the allocation and de-allocation of
the activation record. Heap is the most flexible storage allocation strategy we can
dynamically allocate and de-allocate local variables whenever the user wants
according to the user needs at run-time. The variables in heap allocation can be
changed according to the user’s requirement. C, C++, Python, and Java all of these
support Heap Allocation.
For example:
int* ans = new int[5];

3. Stack Allocation
Stack is commonly known as Dynamic allocation. Dynamic allocation means the
allocation of memory at run-time. Stack is a data structure that follows the LIFO
principle so whenever there is multiple activation record created it will be pushed or
popped in the stack as activations begin and ends. Local variables are bound to new
storage each time whenever the activation record begins because the storage is
allocated at runtime every time a procedure or function call is made. When the
activation record gets popped out, the local variable values get erased because the
storage allocated for the activation record is removed. C and C++ both have support
for Stack allocation.
For example:
void sum(int a, int b){ int ans = a + b; cout << ans; }
// When we call the sum function above, memory for the
// variable ans is allotted on the stack.
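A short self-contained sketch contrasting the three strategies (an added
illustration; the names are hypothetical):

#include <iostream>

int calls = 0;                 // static allocation: storage fixed at compile time

int* makeValue() {
    ++calls;                   // the same storage is reused on every call
    int local = 42;            // stack allocation: new storage each activation,
                               // erased when the activation record is popped
    return new int(local);    // heap allocation: outlives this activation
}

int main() {
    int* p = makeValue();
    std::cout << *p << '\n';   // prints 42; the stack 'local' is already gone
    delete p;                  // heap storage is freed explicitly
    return 0;
}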
