
NARAYANA ENGINEERING COLLEGE::NELLORE

Permanently affiliated to JNTUA Ananthapuramu, Approved by AICTE,


Accorded ‘A’ grade by Govt. of AP, Recognized by UGC 2(f) & 12(B),
ISO 9001:2015 certified Institution, Approved with ‘A+’ Grade by NAAC.
Department of Computer Science and Engineering
-------------------------------------------------------------------------------------------------------------------------------

CONCEPTS DEFINITIONS & FORMULAE (CDF)

Course Details
Class: IIIrd B.Tech IInd Semester Branch: CSE Year: 2019-20
Course Title : Compiler Design Course Code: 15A05601 Credits: 3
Program/Dept.: Computer Science & Engineering (CSE) Batch: 2017-21
Regulation: R-15 Faculty: C. Rama Mohan

Unit - I
Introduction: Language processors, The Structure of a Compiler, the science of building a compiler.
Lexical Analysis: The Role of the lexical analyzer, Input buffering, Specification of tokens, Recognition
of tokens, The lexical analyzer generator Lex, Design of a Lexical Analyzer generator.

Introduction
• In order to reduce the complexity of designing and building computers, nearly all of these are made
to execute relatively simple commands (but do so very quickly).
• A program for a computer must be built by combining these very simple commands into a program in
what is called machine language.
• Since this is a tedious and error-prone process, most programming is, instead, done using a high-level programming language.
• This language (HLL) can be very different from the machine language that the computer can
execute, so some means of bridging the gap is required. This is where the compiler comes in.

• A COMPILER translates (or compiles) a program written in a high-level programming language that
is suitable for human programmers into the low-level machine language that is required by computers.
During this process, the compiler will also attempt to spot and report obvious programmer mistakes.

• Using a high-level language for programming has a large impact on how fast programs can
be developed. The main reasons for this are:

1. Compared to machine language, the notation used by programming languages is closer to the
way humans think about problems.
2. The compiler can spot some obvious programming mistakes, and programs written in a high-level
language tend to be shorter than equivalent programs written in machine language.
3. Another advantage of using a high-level language is that the same program can be compiled to
many different machine languages and, hence, be brought to run on many different machines.
4. On the other hand, programs that are written in a high-level language and automatically translated to
machine language may run somewhat slower than programs that are hand-coded in machine language.
5. Hence, some time-critical programs are still written partly in machine language.
6. A good compiler will, however, be able to get very close to the speed of hand-written machine code
when translating well-structured programs.

Language processors
Preprocessor:
• A preprocessor produces input to compilers.
It may perform the following functions:
1. Macro processing: A preprocessor may allow a
user to define macros that are shorthands for
longer constructs.
2. File inclusion: A preprocessor may include
header files into the program text.
3. Rational preprocessors: These preprocessors
augment older languages with more modern
flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors
attempt to add capabilities to the language by
what amount to built-in macros.
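
For illustration, a minimal C fragment showing macro processing and file inclusion in action (the macro name SQUARE is a made-up example):

/* File inclusion: the preprocessor textually inserts the header */
#include <stdio.h>

/* Macro processing: SQUARE is a shorthand for a longer construct */
#define SQUARE(x) ((x) * (x))

int main(void) {
    printf("%d\n", SQUARE(5));  /* expanded to ((5) * (5)) before compilation */
    return 0;
}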

Compiler:
• A compiler is a translator program that
takes a program written in a high-level
language (HLL), the source program, and
translates it into an equivalent program in a
machine-level language (MLL), the target
program.
Assembler:
• An assembler translates assembly language programs into machine code. The input to an assembler
program is called the source program; the output is its machine language translation.
Interpreter:
• Interpreter is also a language translator like a compiler.
• Interpreter directly executes the operations in the source program on inputs provided by user rather than
producing a target program as a translation.
• Interpreter is a common type of language processor.
• It executes the source program statement by statement and therefore it provides better error diagnostics
in comparison with compiler.

Loader and Linker:


• A LINKER collects code separately compiled or assembled in different object files into a file that is
directly executable.
• The linker resolves external memory addresses, where the code in one file may refer to a location in
another file.
• A LOADER is a program that loads executable object files into memory for execution.

The Structure of a Compiler

Phases of a compiler:
• A compiler operates in phases. A phase is a logically interrelated operation that takes the source program
in one representation and produces output in another representation.
• The phases of a compiler are shown below. There are two parts of compilation:
a. Analysis (Machine Independent/Language Dependent)
b. Synthesis (Machine Dependent/Language Independent)

• The compilation process is partitioned into a number of sub-processes called ‘phases’.

Lexical Analysis:

• Lexical Analysis (also called Scanning or Linear Analysis) reads the source program one character at a time,
carving the source program into a sequence of atomic units called tokens. The input of lexical
analysis is the source program and its output is a set (or stream) of tokens.
• Functions of the lexical analyzer are:
a) Removing white space
b) Recognizing constants, identifiers and keywords
c) Removing comments

Syntax Analysis:
• The second stage of translation is called Syntax analysis or Parsing. In this phase expressions,
statements, declarations etc… are identified by using the results of lexical analysis. Syntax analysis is
aided by using techniques based on formal grammar of the programming language.
Semantic Analysis:
• Semantic analysis checks whether the parse tree constructed follows the rules of language
Intermediate Code Generations:
• An intermediate representation of the final machine language code is produced. This phase bridges the
analysis and synthesis phases of translation.
Code Optimization:
• This is an optional phase intended to improve the intermediate code so that the output runs faster
and takes less space.

Code Generation:
• The last phase of translation is code generation. A number of optimizations to reduce the length of
machine language program are carried out during this phase. The output of the code generator is
the machine language program of the specified computer.

Translator:
• It is a program that takes as input a program written in one language (source language) and produces as
output a program in another language (object language). Types of translators are compiler, interpreter,
and assembler.

The science of building a compiler

Modeling in Compiler Design and Implementation


• The study of compilers is mainly a study of how we design the right mathematical models and choose
the right algorithms, while balancing the need for generality and power against simplicity and
efficiency.
The science of code optimization
• The term "optimization" in compiler design refers to the attempts that a compiler makes to produce
code that is more efficient than the obvious code.

• Compiler optimization must meet the following design objectives:


1. The optimization must be correct, that is, preserve the meaning of the compiled program.
2. The optimization must improve the performance of many programs.
3. The compilation time must be kept reasonable, and
4. The engineering effort required must be manageable.

Lexical Analysis
Introduction
• To identify the tokens we need some method of describing the possible tokens that can appear in the
input stream. For this purpose we introduce regular expressions, a notation that can be used to describe
essentially all the tokens of a programming language.
• Secondly, having decided what the tokens are, we need some mechanism to recognize these in the
input stream. This is done by token recognizers, which are designed using transition diagrams and
finite automata.

The Role of the lexical analyzer

• This phase scans the source code as a stream of characters and converts it into meaningful lexemes.

Concept of Input buffering

• The LA scans the characters of the source program one at a time to discover tokens. Because a large
amount of time can be consumed scanning characters, specialized buffering techniques have been
developed to reduce the amount of overhead required to process an input character.
• Buffering techniques: 1. Buffer pairs 2. Sentinels.
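
A minimal sketch in C of the buffer-pair scheme with sentinels (sizes and the helper name refill are illustrative; the sketch assumes the source text contains no NUL bytes, since NUL serves as the sentinel):

#include <stdio.h>

#define HALF 4096                 /* size of each buffer half */
#define SENTINEL '\0'             /* sentinel marking the end of a half */

static char buf[2 * (HALF + 1)];  /* two halves, each followed by a sentinel slot */
static char *forward;             /* scanning pointer */
static FILE *src;

/* Load up to HALF characters into one half and terminate them with the sentinel. */
static void refill(char *half) {
    size_t n = fread(half, 1, HALF, src);
    half[n] = SENTINEL;
}

static void init_buffers(FILE *fp) {
    src = fp;
    refill(buf);                     /* fill the first half */
    buf[2 * HALF + 1] = SENTINEL;    /* sentinel slot of the second half */
    forward = buf;
}

/* Return the next character. The common case costs a single sentinel
 * test per character, which is the point of the scheme. */
static int next_char(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return c;
    if (forward == buf + HALF + 1) {              /* end of first half reached */
        refill(buf + HALF + 1);                   /* reload the second half */
        return next_char();
    } else if (forward == buf + 2 * (HALF + 1)) { /* end of second half reached */
        refill(buf);                              /* reload the first half */
        forward = buf;
        return next_char();
    }
    return EOF;                                   /* sentinel inside a half: true end of input */
}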
Token, Lexeme, Pattern

Token:
• Token is a sequence of characters that can be treated as a single logical entity. Typical tokens are,
1) Identifiers 2) keywords 3) Operators 4) Special symbols 5) Constants
Pattern:
• A set of strings in the input for which the same token is produced as output. This set of strings is
described by a rule called a pattern associated with the token.
Lexeme:
• A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
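
For example, for the C statement count = count + 10; the correspondence is (token names are illustrative):

Lexeme    Token       Pattern
count     id          letter ( letter | digit )*
=         assign_op   =
count     id          letter ( letter | digit )*
+         add_op      +
10        num         digit+
;         semicolon   ;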

Specification of tokens: There are 3 specifications of tokens:


1) Strings (Finite sequence of symbols)
2) Language
3) Regular expression
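
For instance, typical regular definitions for identifiers and unsigned numbers, in the usual textbook notation:

letter → A | B | … | Z | a | b | … | z
digit  → 0 | 1 | … | 9
id     → letter ( letter | digit )*
num    → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?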

Recognition of tokens

• The lexical analyzer will recognize the keywords if, then, else, as well as the lexemes denoted by relop,
id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as
identifiers.
• To recognize the tokens in the input stream, transition diagrams and finite automata are convenient
ways of designing recognizers.
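
A minimal C sketch of such a recognizer for identifiers and numbers, where each loop plays the role of a state in the transition diagram (token names are illustrative):

#include <ctype.h>

enum token { TOK_ID, TOK_NUM, TOK_ERROR };

/* Recognize an identifier (letter (letter|digit)*) or a number (digit+)
 * starting at *p, advancing *p past the matched lexeme. */
enum token next_token(const char **p) {
    const char *s = *p;
    if (isalpha((unsigned char)*s)) {              /* state: saw a letter */
        do { s++; } while (isalnum((unsigned char)*s));
        *p = s;
        return TOK_ID;                             /* accepting state for id */
    }
    if (isdigit((unsigned char)*s)) {              /* state: saw a digit */
        do { s++; } while (isdigit((unsigned char)*s));
        *p = s;
        return TOK_NUM;                            /* accepting state for num */
    }
    return TOK_ERROR;                              /* no diagram matches */
}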

The lexical analyzer generator LEX

• There are two approaches to the design of an LA generator: NFA-based and DFA-based. The Lex
compiler is implemented using the second (DFA-based) approach.
Lex specifications:
• A Lex program (the .l file ) consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures

• The declarations section includes declarations of variables, manifest constants (a manifest constant is
an identifier that is declared to represent a constant, e.g. #define PI 3.14), and regular definitions.
• The translation rules of a Lex program are statements of the form :
p1 {action 1}
p2 {action 2}
p3 {action 3}
……
……
where each p is a regular expression and each action is a program fragment describing what action the
lexical analyzer should take when a pattern p matches a lexeme. In Lex the actions are written in C.
• The third section holds whatever auxiliary procedures are needed by the actions. Alternatively these
procedures can be compiled separately and loaded with the lexical analyzer.
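
A minimal illustrative Lex specification following this three-part layout (the patterns and messages are made-up examples; the actions are written in C, as noted above):

%{
#include <stdio.h>
#define PI 3.14   /* a manifest constant in the declarations section */
%}
letter  [A-Za-z]
digit   [0-9]
%%
"if"                        { printf("KEYWORD: if\n"); }
{letter}({letter}|{digit})* { printf("ID: %s\n", yytext); }
{digit}+                    { printf("NUM: %s\n", yytext); }
[ \t\n]                     ; /* skip white space */
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }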

Definition of Lex: Lex is a program designed to generate scanners, also known as tokenizers, which
recognize lexical patterns in text.
Unit - II
Syntax Analysis: Introduction, Context Free Grammars, Writing a grammar, TOP Down
Parsing, Bottom Up Parsing, Introduction to LR Parsing: Simple LR, More Powerful LR Parsers,
Using ambiguous grammars, Parser Generators

Syntax Analysis
Concept: Syntax Analysis or Parsing is the second phase, i.e. it comes after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax or not.
Definition: Syntax analysis is the process of analyzing a string of symbols, either in natural
language, computer languages or data structures, conforming to the rules of a formal grammar.

Context Free Grammars (CFG)


Concept: In formal language theory, a context-free grammar (CFG) is a certain type of formal
grammar: a set of production rules that describe all possible strings in a given formal language.
Production rules are simple replacements. 

Definition: A CFG is a set of recursive rewriting rules (or productions) used to generate patterns
of strings.

A CFG consists of the following components:

• A set of terminal symbols, which are the characters of the alphabet that appear in the
strings generated by the grammar.
• A set of non-terminal symbols, which are placeholders for patterns of terminal symbols
that can be generated by the non-terminal symbols.
• A set of productions, which are rules for replacing (or rewriting) non-terminal symbols
(on the left side of the production) in a string with other non-terminal or terminal symbols
(on the right side of the production).
• A start symbol, which is a special non-terminal symbol that appears in the initial string
generated by the grammar.
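
For example, a small CFG for arithmetic expressions, with start symbol E, non-terminals {E, T, F} and terminals {+, *, (, ), id}:

E → E + T | T
T → T * F | F
F → ( E ) | id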

Ambiguous Grammar
Concept: While deriving a string from a given grammar, more than one derivation (and hence more than one parse tree) may be possible; such a grammar is ambiguous.

Definition: A grammar that produces more than one parse tree for some sentence is said to be
ambiguous.
(or)
An ambiguous grammar is one that produces more than one leftmost or rightmost derivation for
the same sentence. Ex: E → E+E | E*E | id

Left Recursive Grammar


Definition: A grammar is left recursive if it has a non-terminal A such that there is a derivation
A ⇒ Aα for some string α.
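
The standard transformation removes immediate left recursion: A → Aα | β is rewritten as

A  → β A'
A' → α A' | ε

For example, E → E + T | T becomes

E  → T E'
E' → + T E' | ε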
Left Factoring
Definition: Left factoring is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing. The basic idea is that when it is not clear which of two alternative
productions to use to expand a non-terminal “A”, we may be able to rewrite the “A” productions
to defer the decision until we have seen enough of the input to make the right choice.
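
The standard transformation: for productions A → αβ1 | αβ2 with common prefix α, rewrite as

A  → α A'
A' → β1 | β2

For example, the pair stmt → if expr then stmt else stmt | if expr then stmt can be left-factored into

stmt → if expr then stmt rest
rest → else stmt | ε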

Left Most Derivation (LMD)


Definition: Derivations in which only the leftmost non-terminal in any sentential form is
replaced at each step are termed leftmost derivations

Right Most Derivation (RMD) or canonical derivation


Definition: Derivations in which the rightmost non-terminal is replaced at each step are termed
canonical derivations.
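
For example, with the grammar E → E + E | id, the sentence id + id has the derivations

Leftmost:  E ⇒ E + E ⇒ id + E ⇒ id + id
Rightmost: E ⇒ E + E ⇒ E + id ⇒ id + id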

Parsing
Concept: This is the second phase of the compiler, which comes after the lexical analyzer.

Definition: Parsing is the process of determining whether a string of tokens can be generated by a
grammar.

Parse tree
Concept: A tree structure generated while deriving a string from the grammar.

Definition: A parse tree may be viewed as a graphical representation for a derivation that filters
out the choice regarding replacement order. Each interior node of a parse tree is labeled by some
non-terminal A and that the children of the node are labeled from left to right by symbols in the
right side of the production by which this A was replaced in the derivation. The leaves of the
parse tree are terminal symbols.

Top Down parsing


Concept: Top-down construction of a parse tree starts with the root, labeled with the starting
non-terminal, and repeatedly performs steps to expand the non-terminals.

Definition: top-down parsing is a parsing strategy where one first looks at the highest level of
the parse tree and works down the parse tree by using the rewriting rules of a formal
grammar. LL parsers are a type of parser that uses a top-down parsing strategy.

Recursive Descent Parsing (RDP)


Concept: Recursive Descent Parsing is a top-down method of syntax analysis.

Definition: In RDP we execute a set of recursive procedures to process the input. A procedure is
associated with each non- terminal of a grammar.
Predictive parsing
Concept: A special form of Recursive Descent parsing, in which the look-ahead symbol
unambiguously determines the procedure selected for each non-terminal, where no backtracking
is required.

Definition: Predictive parsing is a top-down parsing method used to parse the given string.
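
A minimal predictive recursive-descent parser in C for the left-factored grammar E → T E', E' → + T E' | ε, T → i (a sketch; 'i' stands for an id token and the input string is hard-coded for illustration):

#include <stdio.h>
#include <stdlib.h>

static const char *lookahead;   /* next unconsumed input symbol */

static void error(void) { printf("syntax error\n"); exit(1); }
static void match(char t) { if (*lookahead == t) lookahead++; else error(); }

static void T(void) { match('i'); }     /* T -> i */

static void Eprime(void) {              /* E' -> + T E' | epsilon */
    if (*lookahead == '+') {            /* the lookahead alone selects the production */
        match('+');
        T();
        Eprime();
    }
    /* otherwise choose E' -> epsilon and consume nothing */
}

static void E(void) { T(); Eprime(); }  /* E -> T E' */

int main(void) {
    lookahead = "i+i";
    E();
    if (*lookahead == '\0') printf("accepted\n"); else error();
    return 0;
}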

Bottom up Parsing
Definition: Parsing method in which construction starts at the leaves and proceeds towards the
root is called as Bottom Up Parsing.

Shift-Reduce parsing
Definition: A general style of bottom-up syntax analysis, which attempts to construct a parse
tree for an input string beginning at the leaves and working up towards the root.
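
For example, with the grammar E → E + T | T, T → id, a shift-reduce parse of the input id + id proceeds as follows:

Stack        Input        Action
$            id + id $    shift
$ id         + id $       reduce by T → id
$ T          + id $       reduce by E → T
$ E          + id $       shift
$ E +        id $         shift
$ E + id     $            reduce by T → id
$ E + T      $            reduce by E → E + T
$ E          $            accept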

Operator grammar
Definition: A grammar is operator grammar if,
1. No production rule involves “ε” on the right side.
2. No production has two adjacent non-terminals on the right side.

Example: E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id

LR (k) parsing
Concept: The “L” is for left-to-right scanning of the input, the “R” for constructing a rightmost
derivation in reverse, and the k for the number of input symbols of look ahead that are used in
making parsing decisions.

Definition: It is a bottom-up parsing method that parses the input string by constructing a canonical
(rightmost) derivation in reverse. The class of grammars that can be parsed using LR methods is a
proper superset of the class of grammars that can be parsed with predictive parsers.

GOTO function
Concept: It is a function used in constructing the LR parse table. The function goto takes a state and
a grammar symbol as arguments and produces a state.

Definition: The goto function of a parsing table constructed from a grammar G is the transition
function of a DFA that recognizes the viable prefixes of G.
Ex: goto(I, X), where I is a set of items and X is a grammar symbol, is defined to be the closure of
the set of all items [A → αX·β] such that [A → α·Xβ] is in I.

LR grammar
Definition: A grammar for which we can construct a parsing table is said to be an LR grammar.

Kernel items
Concept: These are the items whose dots are not at the left end of the production body, together with the initial item.
Definition: The set of items which includes the initial item and all items whose dots are not at the
left end are known as kernel items.

Non kernel items


Concept: These are the items whose dots are at the left end of the production body (other than the initial item).

Definition: The set of items which have their dots at the left end, excluding the initial item, are known as non-kernel items.

Parser Generator
Concept: It is used to automatically generate a parser from a grammar; the generated parser builds parse trees for input program fragments.

Definition: It is a tool which is used to generate a parser.


Example: YACC (Yet Another Compiler Compiler)
UNIT - III
Syntax Directed Translation: Syntax Directed Definitions, Evaluation orders for SDD’s,
Application of SDT, SDT schemes, Implementing L-attribute SDD’s.
Intermediate Code Generation: Variants of syntax trees, three address code, Types and
declarations, Translations of expressions, Type checking, control flow statements, back patching,
switch statements, intermediate code for procedure.

Syntax Directed Translation

• The Principle of Syntax Directed Translation states that the meaning of an input sentence is
related to its syntactic structure, i.e., to its Parse-Tree.
• By Syntax Directed Translations we indicate those formalisms for specifying translations
for programming language constructs guided by context-free grammars.
1. We associate Attributes to the grammar symbols representing the language constructs.
2. Values for attributes are computed by Semantic Rules associated with grammar
productions.
• Evaluation of Semantic Rules may:
1. Generate Code;
2. Insert information into the Symbol Table;
3. Perform Semantic Check;
4. Issue error messages;

• There are two notations for attaching semantic rules:


1. Syntax Directed Definitions. High-level specification hiding many implementation
details (also called Attribute Grammars).
2. Translation Schemes. More implementation oriented: Indicate the order in which
semantic rules are to be evaluated.

• Syntax Directed Definitions are a generalization of context-free grammars in which:


1. Grammar symbols have an associated set of Attributes
2. Productions are associated with Semantic Rules for computing the values of
attributes.

• Such a formalism generates Annotated Parse-Trees where each node of the tree is a record
with a field for each attribute (e.g., X.a indicates the attribute a of the grammar symbol X).
• The value of an attribute of a grammar symbol at a given parse-tree node is defined by a
semantic rule associated with the production used at that node.

• We distinguish between two kinds of attributes:


1. Synthesized Attribute: They are computed from the values of the attributes of the
children nodes.
2. Inherited Attributes: They are computed from the values of the attributes of
both the siblings and the parent nodes
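
For example, a fragment of a syntax directed definition for expression evaluation, in which val is a synthesized attribute computed from the children (a standard textbook sketch):

Production       Semantic rule
E → E1 + T       E.val = E1.val + T.val
E → T            E.val = T.val
T → num          T.val = num.lexval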
S-Attributed Definitions
Definition: An S-Attributed Definition is a Syntax Directed Definition, that uses only
synthesized attributes.

Evaluation Order: Semantic rules in an S-Attributed Definition can be evaluated by a bottom-up,
or post-order, traversal of the parse-tree.

Example: An arithmetic expression grammar with a synthesized val attribute is an S-Attributed
Definition; its annotated parse-tree for an input such as 3*5+4n is evaluated bottom-up.

L-attributed definition

Definition: An SDD is L-attributed if each inherited attribute of Xi in the RHS of A → X1 X2 … Xn
depends only on
1. the attributes of X1, X2, …, Xi-1 (the symbols to the left of Xi in the RHS), and
2. the inherited attributes of A.

Restrictions for translation schemes

1. An inherited attribute of Xi must be computed by an action before Xi.
2. An action must not refer to a synthesized attribute of any symbol to the right of that action.
3. A synthesized attribute for A can only be computed after all attributes it references have been
computed.

S-attributed grammars
Definition: These are a class of attribute grammars characterized by having no inherited
attributes (attributes that must be passed down from parent nodes to child nodes of the abstract
syntax tree), but only synthesized attributes. Attribute evaluation in S-attributed grammars can be
incorporated conveniently in both top-down parsing and bottom-up parsing.
L-attributed grammars
Definition: These are a special type of attribute grammars. They allow the attributes to be
evaluated in one depth-first left-to-right traversal of the abstract syntax tree. As a result, attribute
evaluation in L-attributed grammars can be incorporated conveniently in top-down parsing.
Dependency Graphs
Concept: Dependency graphs are a useful tool for determining an evaluation order for the
attribute instances in a given parse tree.
Definition: A dependency graph depicts the flow of information among the attribute instances in
a particular parse tree; an edge from one attribute instance to another means that the value of the
first is needed to compute the second. Edges express constraints implied by the semantic rules.

Annotated Parse Tree


Concept: An annotated parse tree shows the values of attributes; a dependency graph helps us
determine how those values can be computed.

Definition: It is a parse tree showing the values of the attributes at each node. The process of
computing the attribute values at the nodes is called annotating or decorating the parse tree.
Syntax directed translation scheme
Concept: A syntax directed translation scheme specifies the order in which semantic rules are evaluated.

Definition: A syntax directed translation scheme is a context-free grammar in which semantic rules
(actions) are embedded within the right sides of the productions. The position at which an action is
to be executed is shown by enclosing it between braces within the right side of the production.

Intermediate code
Concept: It is an intermediate form of the source program used internally by a compiler.
Definition: It is a machine-independent code, commonly represented using three address fields.
Generating three-address code
Concept: The three-address code is generated using semantic rules that are similar to those for
constructing syntax trees and for generating postfix notation.

Syntax Tree
Concept: A syntax tree depicts the natural hierarchical structure of a source program.
Definition: It is a condensed form of the parse tree.

Postfix notation
Definition:
A postfix notation is a linearized representation of a syntax tree. It is a list of the nodes of the tree
in which a node appears immediately after its children.

Three address code


Concept: The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.

Definition: It is a form of intermediate code that represents a source program fragment using at
most three address fields per statement.

• Three-address code is a sequence of statements of the general form

x := y op z

where x, y and z are names, constants, or compiler-generated temporaries; op stands
for any operator, such as a fixed- or floating-point arithmetic operator, or a logical
operator on boolean-valued data.
• Three-address code is a linearized representation of a syntax tree or a DAG in which explicit
names correspond to the interior nodes of the graph.

Quadruple
Definition: A quadruple is a record structure with four fields, which we call op, arg1, arg2 and
result.

Abstract or syntax tree


Definition: A tree in which each leaf represents an operand and each interior node an operator is
called as abstract or syntax tree.

Constructing three-address code for the statement

position := initial + rate * 60

(where id1, id2, id3 denote position, initial and rate):

Answer:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Triples
Concept: If the fields arg1 and arg2, the arguments of op, are either pointers to the symbol table
or pointers into the triple structure itself, then the three-field intermediate code format is known
as triples.
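
For the three-address code of the earlier example (temp1 := inttoreal(60), etc.), the two representations look like this; in the triples, (0), (1), (2) are pointers into the triple structure itself:

Quadruples:
      op         arg1    arg2    result
(0)   inttoreal  60              temp1
(1)   *          id3     temp1   temp2
(2)   +          id2     temp2   temp3
(3)   :=         temp3           id1

Triples:
      op         arg1    arg2
(0)   inttoreal  60
(1)   *          id3     (0)
(2)   +          id2     (1)
(3)   assign     id1     (2)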

Types of three address statements


The types of three-address statements are:
a. Assignment statements (x := y op z)
b. Assignment instructions (x := op y)
c. Copy statements
d. Unconditional jumps
e. Conditional jumps
f. Indexed assignments
g. Address and pointer assignments
h. Procedure calls and returns

Boolean Expression
Definition: Expressions which are composed of the Boolean operators (and, or, not) applied
to elements that are Boolean variables or relational expressions are known as Boolean
expressions.

Viable prefixes
Definition: Viable prefixes are the prefixes of right sentential forms that can appear on the
stack of a shift-reduce parser. It is always possible to add terminal symbols to the end of a
viable prefix to obtain a right sentential form.

Short-Circuit or jumping code


Definition: We can also translate a Boolean expression into three-address code without
generating code for any of the Boolean operators and without having the code necessarily
evaluate the entire expression. This style of evaluation is sometimes called “short-circuit” or
“jumping” code.

Calling sequence
Definition: A sequence of actions taken on entry to and exit from each procedure is known as
calling sequence.

Back patching
Definition: Back patching is the activity of filling in unspecified information about labels using
appropriate semantic actions during the code generation process.
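
A small illustration (the instruction numbers are made up): while translating if a < b then S, the jump targets are unknown when the jumps are emitted,

100: if a < b goto ___     (target not yet known)
101: goto ___

and are filled in by backpatching once the label of S's code (say 102) and of the statement following the whole construct (say 106) become known:

100: if a < b goto 102
101: goto 106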
UNIT - IV
Run Time Environment: storage organization, Stack allocation of space, Access to non-local
data on stack , Heap management
Symbol Table: Introduction, symbol table entries, operations on the symbol table, symbol
table organizations, non block structured language, block structured language.

Concept of Run Time Environment: Runtime environment is a state of the target machine,
which may include software libraries, environment variables, etc., to provide services to the
processes running in the system.

Definition of Runtime storage: It holds the generated target code, data objects, and a counterpart
of the control stack to keep track of procedure activations.

Concept of storage organization: The runtime environment manages runtime memory
requirements for the following entities: code, procedures, and variables.

Code: This area is used to place the executable target code, as the size of the generated code is
fixed at compile time

Concept of Standard storage allocation strategies: Storage allocation strategies are Static
allocation, Stack allocation, Heap allocations.

Definition of Static allocation: It lays out storage for all data objects at compile time.

Definition of Stack allocation: It manages the runtime storage as a stack

Definition of Heap allocation: It allocates and deallocates storage as needed at runtime from a
data area.

Definition of An activation tree: It depicts the way control enters and leaves activations. It is
used to efficiently describe the nesting of procedure calls to make the stack allocation feasible.
An activation tree is used to represent the activations of procedures during the execution of the
entire program.

In the tree, a NODE represents the activation of a procedure, the ROOT represents the activation
of the "main" procedure, and the CHILDREN of a node represent the activations of procedures
called by the parent procedure.

Calling Sequence is code that allocates an activation record (AR) on the stack and enters
information into its fields. Return Sequence is code used to restore the state of the machine so the
calling procedure can continue its execution after the call.

Concept of Control stack: It keeps track of live procedure activations. Push the node for
activation onto the control stack as the activation begins and to pop the node when the activation
ends.

Concept of Scope of declaration: The portion of the program to which a declaration applies is
called the scope of that declaration. An occurrence of a name in a procedure is said to be local to
the procedure if it is in the scope of declaration within the procedure; otherwise the occurrence is
said to be nonlocal.

Definition of Activation record: Information needed by a single execution of a procedure is
managed using a contiguous block of storage called an activation record. It is also called a
frame.

Definition of Access link: It refers to nonlocal data held in other activation records.

Definition of Dangling reference: A dangling reference occurs when there is a reference to
storage that has been deallocated.
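
A classic C example: the function below returns the address of storage that is deallocated when its activation record is popped, leaving the caller with a dangling reference.

int *dangle(void) {
    int local = 42;   /* storage for local lives in dangle's activation record */
    return &local;    /* wrong: after the return, the pointer refers to
                         deallocated stack storage */
}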

Concept of Static scope rule: It determines the declaration that applies to a name by examining
the Program text alone.

Concept of Dynamic scope rule: It determines the declaration applicable to a name at run time
by considering the current activations.

Heap Management: Heap is the unused memory space available for allocation dynamically. It
is used for data that lives indefinitely, and is freed only explicitly. The existence of such data is
independent of the procedure that created it.

Memory manager is used to keep account of the free space available in the heap area.
Functions include: Allocation, and Deallocation.

Concept of Symbol table: The information entered into the symbol table includes: the string of
characters denoting the name, attributes of the name, parameters, and the offset for the name.

Definition of Symbol table: A symbol table is a data structure that contains all variables in the
program and temporary storage, and any information needed to reference or allocate storage for
them.

Data Structures for Symbol Table


Lists, Self-organizing lists, Search tree, Hash table.
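
A minimal C sketch of a hash-table symbol table with chaining (the field set and table size are illustrative; strdup is assumed available, as on POSIX systems):

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211          /* a small prime; the value is illustrative */

struct symbol {
    char *name;                 /* string of characters denoting the name */
    int   type;                 /* an attribute of the name */
    int   offset;               /* offset for the name in its storage area */
    struct symbol *next;        /* next entry in this bucket's chain */
};

static struct symbol *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* The usual lookup/insert operation: find a name, inserting it if absent. */
struct symbol *lookup_or_insert(const char *name) {
    unsigned h = hash(name);
    for (struct symbol *p = table[h]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    struct symbol *p = malloc(sizeof *p);
    p->name = strdup(name);
    p->type = 0;
    p->offset = 0;
    p->next = table[h];
    table[h] = p;
    return p;
}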

Concept of Programming languages with block structure: Those languages in which some
identifiers exist only inside some sections of the code and not in others. For instance: Algol,
PL/I, C, Java, Pascal.

Concept of Non block structured language, block structured language: The compiler must carry
out the storage allocation and provide access to variables and data. Allocation can be done in two
ways.

UNIT - V
Code Generation: Issues in the design of a code generator, The Target language, Basic blocks
and flow graphs, optimization of basic blocks, a simple code generator, register allocation and
assignment, optimal code generation for expressions, dynamic programming code generation.
Code Optimization: Introduction, where and how to optimize, principle source of
optimization, function preserving transformations, loop optimizations, global flow analysis,
machine dependent optimization

Code Generation
The Code Generator (CG) is the final phase in the compiler model. The input to the CG is the
intermediate representation (IR) produced by the front end of the compiler, along with the required
symbol table information. The output of a CG is a semantically equivalent target program. The
code generation and code optimization phases are referred to as the back end of the compiler.

The main tasks of the CG are: instruction selection, register allocation and assignment, and
instruction ordering. The output of a CG is object code, which can take the following forms:
absolute machine code, relocatable machine code, or assembly language.

Absolute machine code can be placed in a fixed location in memory and immediately executed.
A relocatable machine-language program allows subprograms to be compiled separately; by
using a linking loader, relocatable object modules can be linked together and loaded for
execution. An assembly-language target program makes the process of code generation somewhat easier.

Issues in the design of code generation


The issues involved in the design of a CG include tasks such as instruction handling, instruction
selection, and register allocation & assignment. The most important requirement of a CG is to
produce correct code.

Measuring the quality of object program


Concept: The quality of an object program is measured by its size or its running time. For
large computations, running time is particularly important; for small computations, size may be
as important or even more so.

Principal sources of optimization


Concept: Code optimization techniques are generally applied after syntax analysis, usually both
before and during code generation.

Definition: The techniques consist of detecting patterns in the program and replacing these
patterns by equivalent and more efficient constructs.

Patterns used for code optimization


Concept: The patterns may be local or global, and the replacement strategy may be machine
dependent or independent.
An important source of optimization is the identification of common subexpressions.
Local optimization
Definition: The optimization performed within a block of code is called a local optimization.

Constant folding
Definition: Deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
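
For example, 2 * 3.14 can be evaluated at compile time:

before:  area = 2 * 3.14 * r;
after:   area = 6.28 * r;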

Inner loops
Definition: The most heavily traveled parts of a program, the inner loops, are an obvious target
for optimization. Typical loop optimizations are the removal of loop invariant computations and
the elimination of induction variables.

Code motion
Definition: Code motion is an important modification that decreases the amount of code in a
loop.
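
The classic example: the evaluation of limit - 2 is loop invariant, so it can be moved before the loop and computed once:

before:  while (i <= limit - 2) { ... }
after:   t = limit - 2;
         while (i <= t) { ... }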

Properties of optimizing compilers


Concept: A transformation must preserve the meaning of programs; it must, on average, speed up
programs by a measurable amount; and it must be worth the engineering effort. A more accurate
term for “Code Optimization” is Code Improvement.

The code optimization techniques consist of detecting patterns in the program and replacing
these patterns by equivalent, more efficient constructs.
Local transformation & Global Transformation
Definition: A transformation of a program is called local if it can be performed by looking only
at the statements in a basic block; otherwise, it is called global.

Examples for function preserving transformations


• Common sub-expression elimination
• Copy propagation
• Dead – code elimination
• Constant folding

Common Sub-expressions
Definition: An occurrence of an expression E is called a common sub-expression, if E was
previously computed, and the values of variables in E have not changed since the previous
computation.
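
For example, the second evaluation of b * c below can reuse the first, because b and c are not changed between the two uses:

before:  x = b * c + 5;
         y = b * c - 7;
after:   t = b * c;
         x = t + 5;
         y = t - 7;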

Dead Code
Definition: A variable is live at a point in a program if its value can be used subsequently;
otherwise, it is dead at that point. A statement that computes values that never get used is
known as dead code or useless code.

Techniques used for loop optimization


i) Code motion
ii) Induction variable elimination
iii) Reduction in strength

Reduction in strength
Definition: Reduction in strength is the one which replaces an expensive operation by a cheaper
one such as a multiplication by an addition.
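
For example, a multiplication by the loop's induction variable can be replaced by a running addition:

before:  for (i = 0; i < n; i++) { addr = base + i * 4; ... }
after:   t = base;
         for (i = 0; i < n; i++) { addr = t; t = t + 4; ... }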

Loop invariant computation


Definition: An expression that yields the same result independent of the number of times the
loop is executed is known as loop invariant computation.

Data flow equations


Definition: A typical equation has the form

Out[S] = gen[S] ∪ (In[S] − kill[S])

and can be read as: “the information at the end of a statement is either generated within
the statement, or enters at the beginning and is not killed as control flows through the
statement”. Such equations are called data flow equations.

Dangling reference
Concept: A dangling reference occurs when there is a reference to storage that has been deallocated.

Definition: It is a logical error to use dangling references, since the value of deallocated storage is
undefined according to the semantics of most languages.
