Implementation of Three Address Code

There are three representations of three-address code: quadruples, triples, and indirect triples.

1. Quadruple – A quadruple is a record structure with four fields: op, arg1, arg2, and result. op denotes the operator, arg1 and arg2 denote the two operands, and result stores the result of the expression.

Advantage – Code is easy to rearrange for global optimization, and the value of a temporary variable can be looked up quickly through the symbol table.

Disadvantage – Contains a lot of temporaries; creating temporary variables increases time and space overhead.

Example – Consider the expression a = b * – c + b * – c. Its three-address code is:

t1 = uminus c

t2 = b * t1

t3 = uminus c

t4 = b * t3

t5 = t2 + t4

a = t5
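
In quadruple form this sequence is stored as the following table (reconstructed from the three-address code above):

op      arg1    arg2    result
uminus  c               t1
*       b       t1      t2
uminus  c               t3
*       b       t3      t4
+       t2      t4      t5
=       t5              a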

2. Triples – This representation does not use extra temporary variables to hold the results of single operations; instead, when a reference to another triple's value is needed, a pointer to that triple is used. A triple therefore consists of only three fields: op, arg1, and arg2.

Disadvantage – Temporaries are implicit, and the code is difficult to rearrange.

It is difficult to optimize, because optimization involves moving intermediate code: when a triple is moved, any other triple referring to it must be updated as well. (On the other hand, the pointer in a triple gives direct access to the corresponding symbol table entry.)

Example – Consider expression a = b * – c + b * – c
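
The operations are the same as above, but each one is identified by its position in the table, so no explicit temporaries appear (table reconstructed from the standard form of this example):

#    op      arg1   arg2
(0)  uminus  c
(1)  *       b      (0)
(2)  uminus  c
(3)  *       b      (2)
(4)  +       (1)    (3)
(5)  =       a      (4)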

3. Indirect Triples – This representation keeps a separate listing of pointers to the triples, which are made and stored once. It is similar in utility to the quadruple representation but requires less space. Temporaries are implicit, and the code is easier to rearrange, since only the pointer listing has to be reordered.

Example – Consider expression a = b * – c + b * – c
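
Here the triples are stored once and a separate statement list holds pointers to them in execution order (layout reconstructed; the statement numbers 35–40 are arbitrary):

statement list        triples
35: (0)               (0)  uminus  c
36: (1)               (1)  *       b     (0)
37: (2)               (2)  uminus  c
38: (3)               (3)  *       b     (2)
39: (4)               (4)  +       (1)   (3)
40: (5)               (5)  =       a     (4)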


Register allocation:

Register allocation and assignment are crucial aspects of the code generation phase in compiler
construction. The goal is to efficiently utilize the limited number of hardware registers while
translating high-level programming language code into machine code.

Top-Down Register Allocation: Considers the program (or procedure) as a whole and assigns registers from the top down, typically giving registers to the most heavily used variables first.

Spilling: When there are more live variables than available registers, some variables must be
"spilled" to memory. The spilled variables are then loaded and stored explicitly before and after each
use.

Register Renaming: Introduces new names for variables to create additional virtual registers. This
can help in reducing the pressure on physical registers.

Copy Propagation: Replaces uses of a variable with the value it holds at a particular point, reducing
the need for multiple registers to store the same value.
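
For instance, at the three-address-code level (an illustrative before/after):

Before:              After:
t1 = b               t1 = b
t2 = t1 + c          t2 = b + c

After propagation, t1 may become dead and can then be removed by dead-code elimination, freeing a register.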

Software Pipelining: Reorganizes loops to make better use of available registers and pipeline stages.

Language processing system:

Components of Language processing system:

Preprocessor: The preprocessor includes all header files and expands macros. (A macro is a piece of code that is given a name; whenever the name is used, it is replaced by the contents of the macro by an interpreter or compiler. The purpose of macros is either to automate frequently used code sequences or to enable more powerful abstraction.) It takes source code as input and produces modified source code as output. The preprocessor is also known as a macro evaluator; this processing is optional, i.e., a language that does not support #include or macros does not require it.

Compiler: The compiler takes the modified code as input and produces the target code as output.

Assembler: The assembler takes the target code as input and produces relocatable machine code as output.

Linker: Linker or link editor is a program that takes a collection of objects (created by assemblers and
compilers) and combines them into an executable program.

Loader: The loader places the linked program in main memory.

Executable code: It is low-level, machine-specific code that the machine can directly understand. Once the linker and loader have done their jobs, the object code is finally converted into executable code. The essential difference between the two is that the linker combines object modules into a single executable image, while the loader brings that image into main memory for execution.

Functions of loader:

Allocation: Allocates space in memory for the object program. A translator cannot allocate space itself, because there may be overlap or a large waste of memory.

Linking: Combines two or more different object programs, resolves the symbolic references between object modules, and provides the information needed to allow references between them. Linking is of two types, as follows.

Static Linking: It copies all the library routines used in the program into an executable image. This
requires more disk space and memory.

Dynamic Linking: It resolves undefined symbols while a program is running. This means that
executable code still has undefined symbols and a list of objects or libraries that will provide
definitions for the same.

Relocation: Modifies the object program so that it can be loaded at an address different from the originally specified location, adjusting all address-dependent locations accordingly.

Loading: Physically places the machine instructions and data in memory for execution.

Lexical analysis:

Lexical analysis is the first phase of the compilation process, and it plays a crucial role in the overall
functioning of a compiler. Its primary task is to analyze the source code and break it down into a
sequence of tokens. Tokens are the smallest meaningful units of a programming language, such as
keywords, identifiers, operators, and literals. Lexical analysis involves scanning the source code,
identifying these tokens, and generating a stream of tokens for the subsequent phases of the
compiler. The role of lexical analysis in a compiler includes:

Tokenization: Breaking down the source code into a stream of tokens (see the example after this list).

Removing Whitespace and Comments: Discarding unnecessary elements like spaces, tabs, and comments, which do not contribute to the meaning of the program.

Identifying Keywords and Symbols: Recognizing reserved words, operators, and other language-specific symbols.

Building Symbol Tables: Keeping track of identifiers and their attributes for further analysis.

Lexical errors refer to mistakes or inconsistencies in the lexical structure of the source code. These errors may include:

Spelling Mistakes: Incorrect spellings of keywords or identifiers.

Illegal Characters: Using characters not allowed by the programming language.

Improper Token Usage: Misuse of operators or other symbols.
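
For example, for the statement int count = count + 1; the scanner might emit the token stream (the token names here are illustrative, not from any particular compiler):

keyword(int)  id(count)  op(=)  id(count)  op(+)  num(1)  punct(;)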

For lexical error recovery, various strategies can be employed:

Panic Mode: When a lexical error is detected, the compiler enters a panic mode, discards tokens
until a synchronization point (such as the end of a statement or block) is reached, and then resumes
analysis. This approach helps in preventing cascading errors caused by a single mistake.

Global Correction: Attempting to correct errors globally rather than locally. This may involve fixing typos or suggesting corrections based on the context to minimize the impact of errors.

Insertion and Deletion: Automatically inserting or deleting characters to correct the error. This approach can be challenging, as it requires understanding the intended meaning of the code.

Token-Level Recovery: Identifying the beginning of a new token and resuming analysis from there,
effectively isolating the error and continuing with the rest of the code.
Q. How is input scanned in lexical analysis? Explain the advantage of the two-buffer input scheme over the one-buffer input scheme for scanning a source program.

In lexical analysis, the process of scanning input involves reading the source code character by
character and identifying the sequence of characters that form tokens. The input is typically scanned
using one of two buffer input schemes: one buffer input scheme or two buffer input scheme.

One Buffer Input Scheme: In a one-buffer input scheme, the source code is read into a single buffer.

The lexical analyzer scans the characters from this buffer to identify tokens.

If the buffer becomes empty, the lexical analyzer refills it by reading more characters from the source file.

Advantages: Simplicity: The one-buffer scheme is straightforward to implement, and the input-processing logic is easy to manage.

Disadvantages: Limited Lookahead: The one-buffer scheme has a limited lookahead capability because it only considers the current buffer. This limitation can make it challenging to handle constructs that require a broader context for accurate tokenization, such as a token that spans a refill boundary.

Two Buffer Input Scheme: In a two-buffer input scheme, the source code is read into two alternating
buffers.

While one buffer is being processed by the lexical analyzer, the other buffer is being filled with
characters from the source file.

This allows for a lookahead capability, as the lexical analyzer can examine the contents of the next
buffer without waiting for a refill from the source file.

Advantages: Improved Lookahead: The two-buffer scheme provides better lookahead capabilities.
The lexical analyzer can anticipate upcoming tokens by examining the contents of the second buffer.
This is particularly useful for handling constructs that span across buffer boundaries. Reduced I/O
Wait Time: Since one buffer is being filled while the other is being processed, there is less idle time
waiting for I/O operations. This can lead to more efficient scanning and tokenization.

Enhanced Error Detection: The two-buffer scheme can contribute to better error detection and
recovery, as the lexical analyzer has a broader context to identify potential errors.

Disadvantages: Increased Complexity: Implementing a two-buffer scheme is more complex than a one-buffer scheme, as it involves managing two buffers and coordinating their use.
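
A minimal Java sketch of the buffer-pair idea (the class and method names are our own; a production scanner would add sentinel characters and lookahead pointers):

import java.io.IOException;
import java.io.Reader;

// Illustrative two-buffer scanner: while characters are consumed from one
// buffer, the other already holds the next chunk of the source, so the
// scanner can look ahead across buffer boundaries without waiting on I/O.
class TwoBufferScanner {
    private static final int SIZE = 4096;
    private final char[][] buf = { new char[SIZE], new char[SIZE] };
    private final int[] len = new int[2];   // number of valid chars per buffer
    private final Reader source;
    private int cur = 0;                    // index of the buffer being scanned
    private int pos = 0;                    // next char position in buf[cur]

    TwoBufferScanner(Reader source) throws IOException {
        this.source = source;
        len[0] = fill(0);                   // prefill both halves up front
        len[1] = fill(1);
    }

    private int fill(int i) throws IOException {
        int n = source.read(buf[i], 0, SIZE);
        return Math.max(n, 0);              // treat end-of-stream (-1) as 0 chars
    }

    // Returns the next character, or -1 at end of input.
    int next() throws IOException {
        if (pos >= len[cur]) {
            int exhausted = cur;
            cur = 1 - cur;                  // switch to the already-filled buffer
            pos = 0;
            len[exhausted] = fill(exhausted); // refill the one we just drained
            if (len[cur] == 0) return -1;   // both buffers empty: end of input
        }
        return buf[cur][pos++];
    }
}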

SDD – Simple desk calculator:

A Syntax-Directed Definition (SDD) is a formal way to specify the translation of a programming language's syntax into the corresponding semantics. It associates semantic actions with the production rules of the grammar. These semantic actions can include computations, assignments, or any other actions that define the behavior of the language constructs. An SDD is typically associated with a context-free grammar and a set of semantic rules; each production in the grammar is augmented with semantic rules that specify the actions to be performed during the parsing process. Now, let's consider a simple desk calculator with the following grammar and syntax-directed definition:

E -> E + T | T
T -> T * F | F
F -> num
Syntax-Directed Definition (SDD) with semantic rules:

For production E -> E + T: E.val = E1.val + T.val

For production E -> T: E.val = T.val

For production T -> T * F: T.val = T1.val * F.val

For production T -> F: T.val = F.val

For production F -> num: F.val = num.val

Now, let's use this SDD to create an annotated parse tree for the expression 3 * 5 + 4:

The annotated parse tree diagram is available in the long cheat sheet.
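
Evaluating bottom-up with the rules above (3 * 5 + 4 parses as (3 * 5) + 4, since * sits lower in this grammar): num(3) gives F.val = 3 and T.val = 3; num(5) gives F.val = 5, so T -> T * F yields T.val = 3 * 5 = 15; E -> T gives E.val = 15; num(4) gives F.val = 4 and T.val = 4; finally E -> E + T yields E.val = 15 + 4 = 19 at the root.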

Activation Record: An activation record, also known as a stack frame or function call frame, is a data
structure used by a compiler to manage information about a specific function or procedure call
during program execution. Activation records are typically stored on the call stack, and each function
invocation has its own activation record. They play a crucial role in supporting function calls,
parameter passing, local variable storage, and maintaining the program's execution context.

Return Address: Purpose: This field stores the address to which control should return after the
function completes its execution. It allows the program to resume its execution at the point
immediately following the function call.

Previous Frame Pointer: Purpose: Points to the activation record of the calling function, establishing
a dynamic link to the previous function's activation record on the call stack.

Static Link: Purpose: In languages that support nested scopes (nested functions or blocks with local
variables), the static link points to the activation record of the lexically enclosing scope. It aids in
accessing non-local variables and supports static (lexical) scoping.

Dynamic Link: Purpose: Similar to the static link, the dynamic link helps in accessing non-local
variables. In languages with dynamic scoping, it points to the activation record of the dynamically
enclosing scope.

Parameters: Purpose: Stores the values of parameters passed to the function. This portion of the
activation record facilitates communication between the calling function and the called function.

Local Variables: Purpose: Provides storage for local variables declared within the function. The
compiler allocates space for these variables to store intermediate results and other data relevant to
the function.

Temporary Data: Purpose: Allocates space for temporary data used during the function's execution.
This can include intermediate results, temporary calculations, and other data that is not part of the
function's formal parameter list or local variables.

Control Link: Purpose: Holds additional control information, such as information needed for exception handling.
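
Putting the fields together, a typical activation-record layout looks like this (one common textbook ordering; the exact order and contents vary with the compiler and calling convention):

+---------------------------+
|  actual parameters        |
|  returned value           |
|  control link             |
|  access (static) link     |
|  saved machine status     |   <- return address, old frame pointer
|  local variables          |
|  temporaries              |
+---------------------------+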
Loop invariant:

A loop invariant is a condition that holds true for every iteration of a loop. It is a statement or property that remains unchanged during the entire execution of the loop. Loop invariants are essential for reasoning about the correctness of loops and are often used in loop analysis and verification. A loop invariant typically has three key properties:

Initialization: The invariant holds true before the loop begins its execution (at the loop's initialization).

Maintenance: If the invariant is true before an iteration of the loop, it remains true after the execution of that iteration.

Termination: When the loop terminates, the loop invariant guarantees that a desired property or condition is satisfied.
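
As the question asks, here is a small Java example with its loop invariant derived (the method and the invariant are our own illustration of the three properties above):

// Loop invariant: at the start of each iteration, sum == a[0] + ... + a[i-1].
static int sum(int[] a) {
    int sum = 0;               // Initialization: i == 0, so sum equals the empty sum, 0
    for (int i = 0; i < a.length; i++) {
        sum += a[i];           // Maintenance: adding a[i] re-establishes the invariant for i + 1
    }
    return sum;                // Termination: i == a.length, so sum == a[0] + ... + a[a.length - 1]
}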

3 address code types:

Three-address code (3AC) is an intermediate representation of code that simplifies the complex
syntax of high-level programming languages into a set of simple and linear instructions. Each
instruction in three-address code typically contains at most three operands or addresses.

A generic three-address code instruction has the following form:

X = Y op Z

Here, X, Y, and Z are operands, and op is an operator. The instruction represents a simple operation
where the result of the operation (Y op Z) is stored in the variable X.

There are several types of three-address code instructions, each corresponding to different
operations and expressions. Some common types include:

Assignment:

X = Y

This instruction assigns the value of Y to the variable X.

Binary Operations:

X = Y op Z

This type of instruction performs binary operations like addition, subtraction, multiplication, division,
etc.

Unary Operations:

X = op Y

This type of instruction represents unary operations, such as negation or logical NOT.

Conditional Jumps:
if X goto L

This instruction is used for conditional jumps. It checks the value of X and jumps to the specified
label L if the condition is true.

Unconditional Jumps:

goto L

This instruction represents an unconditional jump to the specified label L.

Label:

L:

Labels are used to mark specific locations in the code, often serving as targets for jump instructions.

Here's a simple example of three-address code for the expression a = b + c * d:

t1 = c * d

t2 = b + t1

a = t2

In this example, t1, t2, and a are temporary variables used to hold intermediate values during the
evaluation of the expression.

Three-address code is a convenient and compact representation that simplifies the analysis and
optimization of code during the compilation process. It serves as an intermediate step between the
high-level source code and the machine code generated by the compiler.

Code optimization:

Code optimization is a crucial phase in the compilation process, and its primary goal is to improve
the performance of the generated code. Optimization aims to produce more efficient, faster, and
smaller executable code without changing the program's functionality.

Various code optimization techniques are employed by compilers to achieve these goals. Some
common optimization techniques include:

Constant Folding: Evaluate constant expressions at compile-time rather than runtime.

Dead Code Elimination: Remove code that has no effect on the program's output.

Common Subexpression Elimination: Identify and eliminate redundant computations by reusing already computed results.
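
For instance, at the three-address-code level (an illustrative before/after):

Before:              After:
t1 = b * c           t1 = b * c
t2 = b * c           t3 = t1 + t1
t3 = t1 + t2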

Loop Optimization:

Loop Unrolling: Duplicate loop bodies to reduce loop control overhead.

Loop Fusion: Combine multiple adjacent loops over the same range into a single loop for better performance.

Loop-Invariant Code Motion: Move loop-invariant code outside the loop to avoid redundant computations (see the sketch after this list).
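
A small Java sketch of loop-invariant code motion (the method and variable names are our own; a compiler performs this transformation on the intermediate representation rather than on source code):

// Before motion, the loop body computed a[i] * (x * y), re-evaluating
// the invariant product x * y on every iteration.
static int[] scale(int[] a, int x, int y) {
    int[] r = new int[a.length];
    int t = x * y;                       // hoisted: computed once, outside the loop
    for (int i = 0; i < a.length; i++) {
        r[i] = a[i] * t;                 // the loop body now does one multiply, not two
    }
    return r;
}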

Inlining: Replace function calls with the actual body of the function to eliminate the overhead of the
function call.

Register Allocation: Optimize the usage of CPU registers to minimize memory accesses.

Data Flow Analysis: Analyze the flow of data through the program to identify optimization
opportunities.

Instruction Scheduling: Reorder instructions to improve pipeline utilization and reduce stalls.

Parallelization: Identify and exploit opportunities for parallel execution, including loop parallelization
and instruction-level parallelism.

Code Reordering: Change the order of code to improve locality and reduce branch mispredictions.

Peephole:

Code Simplification: The primary purpose of peephole optimization is to simplify and improve the efficiency of the generated code by examining a small sliding window (the "peephole") of instructions. This may involve replacing a sequence of instructions with a more concise version or eliminating unnecessary operations.

Examples of peephole optimizations (see the patterns after this list):

Constant Folding: Replace expressions involving constants with their precomputed values.

Redundant Load/Store Elimination: Eliminate unnecessary load or store operations.

Dead Code Elimination: Remove instructions that have no effect on the program's output.

Strength Reduction: Replace expensive operations with cheaper alternatives, such as replacing multiplication with a sequence of shifts and additions.

Branch Optimization: Simplify conditional branches or eliminate unnecessary jumps.

Instruction Reordering: Reorder instructions for better pipelining or improved cache locality.

While peephole optimization is limited in its scope, it can contribute to overall code quality by addressing inefficiencies at a fine-grained level. It is often complemented by other optimization techniques at higher levels of abstraction during the compilation process.
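
Two illustrative peephole patterns, written in the three-address/jump notation used earlier:

Strength reduction:
t = x * 8        becomes      t = x << 3

Jump over jump:
goto L1                       goto L2
...              becomes      ...
L1: goto L2                   L1: goto L2

The first jump is retargeted straight to L2; if nothing else reaches L1, its instruction can then be deleted as dead code.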

Type expression:

Type expressions are used in intermediate code generation to represent the types of variables,
expressions, and functions. A type expression is a symbolic representation of a type, such as int,
float, or char. It is used by the compiler to determine the size and layout of data structures, and to
perform type checking.

Arrays are specified as array(I, T), where T is a type and I is an integer or a range of integers. For example, the C declaration "int a[100]" identifies the type of a to be array(100, integer).
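
The constructor nests for multidimensional arrays: for example (a standard construction, not from the original notes), the declaration int a[10][20] gives a the type expression array(10, array(20, integer)).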

Error handler:

The tasks of the error-handling process are to detect each error, report it to the user, and then devise and implement a recovery strategy to handle it. Throughout this whole process, the processing time of the program should not slow down noticeably. Functions of the error handler:

Error detection

Error reporting

Error recovery
