Compiler construction (Final)


Compiler:

A compiler is a software tool that translates high-level programming language source code into low-level machine code that can be executed directly by the computer's hardware.
 A compiler is a specific kind of translator: one that translates information from one representation to another.
 Applications that convert, for example, a Word file into a PDF are called translators, not compilers.
Issues in compilation:
No algorithm exists for an ideal translation; translation is a complex process. To manage this complexity, the translation is carried out in multiple phases.
Types of compilers:
Compilers can be divided into two main categories:
1. Single-pass compiler – the source code is processed in a single go: the compiler reads the source code, performs the necessary analysis, and generates the target code in one pass.
2. Multi-pass compiler – several intermediate representations are created and the parse tree is processed several times. It breaks the program or code into smaller parts.
Types of multi-pass compiler:
A multi-pass compiler can be further divided into two categories:
1. Two-pass compiler
2. Three-pass compiler
Two-pass compiler:
In this type of compilation the program is translated twice: once by the front end and once by the back end.

Front end:
The algorithms employed in the front end have polynomial time complexity. The front end maps legal source code into an intermediate representation (IR).
Phases of Front end:
Front end consists of the following phases:
 Lexical analysis
 Syntax analysis / parser
 Semantic analysis
 Intermediate code generator

Back End:
The back end of the compiler translates the intermediate representation (IR) into target machine code; many of the problems it must solve (such as optimal register allocation) are NP-complete. It decides which values to keep in registers in order to avoid memory accesses, and it is also responsible for instruction selection to produce fast and compact code.
Intermediate Code Representation Steps:
Processing the intermediate code representation consists of the following steps:
1. Instruction selection
2. Register allocation
3. Instruction scheduling
Phases of Back end:
The back end consists of the following phases:
 Code optimization
 Target code generator
Lexical analysis:
The scanner is the first component of the front end and the parser is the second. The task of the scanner is to take a program (source code) written in some language such as Java or C++ as a stream of characters and break that stream into tokens. This activity is known as lexical analysis. The lexical analyser partitions the input string into substrings called words and classifies them according to their role. The output of lexical analysis is a sequence of tokens, which is then given to the parser as input.
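The scanner's job described above can be sketched with regular expressions. This is a minimal illustration; the token classes, patterns, and keyword list are assumptions chosen for the example, not taken from the notes:

```python
import re

# Token classes, tried in order: longest/most specific patterns first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"if", "else", "while"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Break a stream of characters into (token_class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(code):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":                # drop whitespace
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # 'if' matches the identifier
        tokens.append((kind, lexeme))     # pattern, then is reclassified
    return tokens
```

Note how "if" first matches the identifier pattern and is only then reclassified as a keyword, reflecting the fact that a token starting with i may be either an identifier or a keyword.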
Specifying tokens:
We do not know what kind of token we are going to read after seeing only the first character; for example, a token that starts with i can be either an identifier or a keyword. Regular languages are the most popular formalism for specifying tokens because:
 They are based on a simple and useful theory
 They are easy to understand
 They have efficient implementations
Parser:
The second phase of the front end is also known as syntax analysis. It takes the sequence of tokens from the previous phase, i.e., lexical analysis, as input, recognizes the context-free grammar, and converts the input into an intermediate representation (IR). If there are any errors, it also reports them.
Context Free Grammar (CFG):
The syntax of most programming languages is specified or defined using a context-free grammar. A context-free grammar consists of the following:
 S – start symbol
 N – Non – terminals
 T – Terminals
 P – set of production rules
Terminals: A symbol that cannot be replaced by any other symbol is called a terminal (a constant). Terminals are denoted by lowercase letters (a, b, c).
Non-terminals: A symbol that must be replaced by other symbols is called a non-terminal (a variable). Non-terminals are denoted by capital letters (X, S, Y).
Productions: The grammatical rules are often called Productions.
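The four components listed above can be written down concretely. Below is a sketch for a small arithmetic-expression grammar; the grammar itself and the dictionary encoding are illustrative choices, not from the notes:

```python
# A CFG as data: start symbol, non-terminals, terminals, productions.
grammar = {
    "start": "E",
    "nonterminals": {"E", "T", "F"},
    "terminals": {"+", "*", "(", ")", "id"},
    "productions": {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}

# Sanity check: a symbol on a right-hand side that never appears as a
# production head is a terminal; everything else is a non-terminal.
rhs_symbols = {s for rules in grammar["productions"].values()
                 for rule in rules for s in rule}
assert rhs_symbols - grammar["productions"].keys() == grammar["terminals"]
```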
Parse representation:
A parse can be represented by using:
 Parse tree
 Syntax tree
 Abstract Syntax tree
These representations help in understanding the structure of the source code.
Parse tree:
It is a hierarchical representation of the terminals and non-terminals. It is also known as a derivation tree. A parse tree is created by a parser.
Types of parsing:
There are two types of parsing, each with further subcategories:
 Bottom-up parsing
 Top-down parsing

Top-down parsing:
A top-down parser starts at the root of the parse tree and grows it towards the leaves. At each node, the parser picks a production rule and tries to match it against the input string. In simple terms, it starts from the start symbol of the grammar and works towards deriving the input string.
Types:
Top-down parsing is further divided into two categories:
 Recursive descent parsing
 Predictive Parsing (LL (1))
Recursive descent parsing:
It is a parsing technique where each non-terminal symbol in the grammar is associated with a procedure or function. These procedures recursively call each other, matching the input against the grammar rules for the corresponding non-terminals.
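A minimal sketch of this idea, assuming a tiny expression grammar E -> T { '+' T }, T -> NUMBER | '(' E ')' (the grammar and function names are illustrative; each function also returns the computed value so the effect is visible):

```python
def parse_expr(tokens, pos=0):
    """E -> T { '+' T } ; one function per non-terminal.
    Returns (value, next_position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        rhs, pos = parse_term(tokens, pos + 1)   # consume '+', parse T
        value += rhs
    return value, pos

def parse_term(tokens, pos):
    """T -> NUMBER | '(' E ')' ; recursion handles nesting."""
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)  # mutual recursion
        assert tokens[pos] == ")", "expected ')'"
        return value, pos + 1
    return int(tokens[pos]), pos + 1
```

Each non-terminal (E, T) has become one function, and the mutual recursion mirrors the recursion in the grammar.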
First ():
First(A) contains all the terminals that can appear in the first position of any string derived from A.
Note:
 First (Terminal) = terminal
 First (Epsilon) = Epsilon(e)
Follow ():
Follow(A) contains the set of all terminals that can appear immediately to the right of A in some sentential form.
Note:
The follow set of the start symbol contains $.
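The FIRST computation can be sketched as a fixed-point iteration. The grammar below and the spelling "eps" for epsilon are assumptions made for the example:

```python
def first_of_seq(symbols, first, productions):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in productions:        # terminal (epsilon written "eps")
            result.add(sym)
            return result
        result |= first[sym] - {"eps"}
        if "eps" not in first[sym]:       # sym cannot vanish: stop here
            return result
    result.add("eps")                     # every symbol can derive epsilon
    return result

def first_sets(productions):
    """Iterate until no FIRST set changes (a fixed point)."""
    first = {nt: set() for nt in productions}
    changed = True
    while changed:
        changed = False
        for nt, rules in productions.items():
            for rule in rules:
                new = first_of_seq(rule, first, productions)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# Illustrative grammar: S -> A b ; A -> a | eps
PRODUCTIONS = {"S": [["A", "b"]], "A": [["a"], ["eps"]]}
```

Here First(A) = {a, eps} and, because A can vanish, First(S) also picks up b.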
LL (1):
A top-down parser that uses one token of lookahead is called an LL (1) parser. The first L indicates that the
input is read from left to right. The second L says that it produces a leftmost derivation.
Bottom-up parsing:
Bottom-up parsing starts at the leaf nodes and grows towards the root node of the parse tree. It handles a large class of grammars. It is also known as shift-reduce parsing: the process involves shifting input symbols onto a stack and then reducing them based on the grammar rules, hence the name "shift-reduce" parsing.
Actions:
Bottom-up parsing uses two types of actions:
Shift action – advances the input one place to the right and pushes the terminal onto the stack.
Reduce action – applies a production in reverse, replacing the right-hand side on top of the stack with its non-terminal.
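The two actions can be hand-simulated on a tiny grammar. The grammar E -> E + n | n, the greedy reduction strategy, and the trace format are all assumptions chosen to keep the sketch small; a real parser would consult a parsing table instead:

```python
def shift_reduce(tokens):
    """Trace shift/reduce actions for the grammar  E -> E + n | n."""
    stack, trace, pos = [], [], 0
    while True:
        # Reduce whenever the top of the stack is a handle.
        if stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]; trace.append("reduce E -> E + n")
        elif stack[-1:] == ["n"]:
            stack[-1:] = ["E"]; trace.append("reduce E -> n")
        elif pos < len(tokens):
            stack.append(tokens[pos]); pos += 1; trace.append("shift")
        else:
            break
    return stack, trace
```

On input n + n the parser ends with the single start symbol E on the stack, having shifted three times and reduced twice.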
Handles:
A handle is a substring of symbols on the stack that matches the right-hand side of a production rule.
Handles guide the parser in deciding when and how to reduce the stack. They represent points where the
parser recognizes that a part of the input matches a production rule's right-hand side. Each handle
corresponds to a step in the derivation of the input according to the grammar rules. If the right-hand side of a production has k symbols, it has k+1 placeholder positions. The number of potential handles in the grammar is simply the sum of the lengths of the right-hand sides of all the productions. The number of complete handles is simply the number of productions.
Types:
Bottom-up parsing can be further divided into sub categories:
 Operator precedence parsing
 LR parsing
Operator precedence parsing:
Operator precedence parsing is a kind of shift-reduce parsing method. It is applied to a small class of grammars called operator grammars. A grammar is said to be an operator precedence grammar if it has two properties:
 No right-hand side of any production contains ε (epsilon).
 No two non-terminals are adjacent.
Operator precedence can only be established between the terminals of the grammar; non-terminals are ignored.
Operator precedence relations:
There are three operator precedence relations:
1. a ⋗ b means that terminal "a" has higher precedence than terminal "b".
2. a ⋖ b means that terminal "a" has lower precedence than terminal "b".
3. a ≐ b means that terminals "a" and "b" have the same precedence.
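One common way to store these relations is a table keyed by pairs of terminals. The entries below are the usual choices for + and *, shown as an illustration (ASCII "<" and ">" stand in for ⋖ and ⋗):

```python
# prec[(a, b)] answers: how does terminal a relate to terminal b?
prec = {
    ("+", "+"): ">",   # + is left-associative, so reduce first
    ("+", "*"): "<",   # * binds tighter than +, so shift
    ("*", "+"): ">",   # finish the multiplication before the addition
    ("*", "*"): ">",   # * is left-associative as well
}
```

With "id + id * id", the parser sees + ⋖ * and shifts; with "id * id + id" it sees * ⋗ + and reduces the multiplication first.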
LR (1) parsing:
In the term LR(k) parser, the L refers to left-to-right scanning, R refers to a rightmost derivation in reverse, and k refers to the number of unconsumed "lookahead" input symbols used in making parser decisions. Typically, k is 1 and is often omitted. LR (1) parsers can recognize precisely those languages in which one symbol of lookahead suffices to determine whether to shift or reduce. The LR (1) construction algorithm builds a handle-recognizing DFA. The parsing algorithm uses this DFA to recognize handles and potential handles on the parse stack.
Types of LR (k) parsing:
Following are the LR (k) types:
 LR (0)
 SLR (1) (Simple LR)
 LALR (Look Ahead LR)
 CLR (Canonical LR)
LR (0):
LR parsing that uses zero lookahead symbols: parsing decisions are made without examining the next input token.
SLR (1):
Simple LR parsing, which uses one lookahead symbol and is simpler than LR (1).
LALR (Look Ahead LR):
LALR (1) parsing is a type of LR parsing, a bottom-up technique for parsing context-free grammars. LALR (1) uses one symbol of lookahead to determine parsing actions (shift or reduce). This lookahead allows the parser to make decisions based on the next token in the input stream. LALR (1) reduces the number of states compared to LR (1) by merging states that have identical LR (0) cores, combining their lookahead sets.
CLR (Canonical LR):
CLR parsing builds a canonical collection of LR (1) items, which are extensions of LR (0) items (production
rules with a dot indicating position) with one symbol of lookahead.
Parsing Table:
The parsing table is divided into two parts: the action table and the go-to table. The action table tells the parser which action to take (shift, reduce, accept, or report an error) given the current state and the current terminal in the input stream. The go-to table indicates which state the parser should move to after a reduction by a given non-terminal.
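A sketch of what such a table might look like for the one-production grammar E -> n, together with the driver loop that consults it. The state numbers, entries, and encoding are assumptions for illustration, not derived here:

```python
# ACTION is indexed by (state, terminal); GOTO by (state, non-terminal).
ACTION = {
    (0, "n"): ("shift", 2),
    (2, "$"): ("reduce", ("E", ["n"])),   # reduce by E -> n
    (1, "$"): ("accept", None),
}
GOTO = {(0, "E"): 1}

def lr_parse(tokens):
    """Table-driven LR driver: a stack of states plus ACTION/GOTO lookups."""
    states, pos = [0], 0
    while True:
        act, arg = ACTION[(states[-1], tokens[pos])]
        if act == "shift":
            states.append(arg)            # push the new state, consume token
            pos += 1
        elif act == "reduce":
            head, body = arg
            del states[-len(body):]       # pop one state per RHS symbol
            states.append(GOTO[(states[-1], head)])
        else:
            return True                   # accept
```

Note that the driver never changes; only the tables depend on the grammar.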
Attribute Grammar:
Attribute grammars are an extension of context-free grammars (CFGs) used in compiler theory to associate
attributes with symbols in the grammar. These attributes help in semantic analysis and code generation
phases of compilers.
Types:
Attributes in a grammar can be of two types based on how their values are computed.

 Synthesized Attribute
 Inherited Attributes

Synthesized Attribute:
These attributes' values are derived from the attributes of the node's children and possibly constants.
Flow: Computed bottom-up in the parse tree.
Inherited Attributes:
These attributes' values are determined by the attributes of the node's parent and siblings.
Flow: Propagated top-down and laterally in the parse tree.
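A synthesized attribute can be sketched as a bottom-up computation over an expression tree. The tuple encoding and the rule names in the comments are illustrative assumptions:

```python
def val(node):
    """Synthesized 'val' attribute: computed from the children's 'val'."""
    if isinstance(node, int):           # leaf: a number literal
        return node
    op, left, right = node              # interior node: (op, left, right)
    if op == "+":
        return val(left) + val(right)   # e.g. E.val = E1.val + T.val
    if op == "*":
        return val(left) * val(right)   # e.g. T.val = T1.val * F.val
    raise ValueError(f"unknown operator {op!r}")
```

For the tree of 2 + 3 * 4, the values flow bottom-up: the * node synthesizes 12 from its children, then the + node synthesizes 14.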
Evaluation Methods:
There are several methods to evaluate attributes in a grammar:

 Dynamic method
 Rule-based (Tree Walk) method
 Oblivious (Data Flow) method

Dynamic method:
Builds the parse tree, constructs the attribute dependence graph, performs topological sorting, and evaluates
attributes in order.
Usage: Commonly used when attributes depend on complex interactions across the tree.
Rule-based (Tree Walk) method:
Analyses attribute rules at compiler-generation time, establishes a fixed evaluation order, and evaluates nodes accordingly.
Usage: Efficient for grammars with straightforward attribute dependencies.
Oblivious (Data Flow) method:
Ignores attribute rules and parse tree structure and uses a predetermined evaluation order fixed at compiler-construction time.
Usage: Provides a simplified approach for attribute evaluation, suitable for less complex attribute
dependencies.
Ad-Hoc Analysis and Usage:
A variant of attribute grammars where ad-hoc techniques are used to handle context-sensitive analysis. Often
integrated into parsers to execute actions associated with grammar productions dynamically.
Intermediate Representation:
An Intermediate Representation (IR) serves as an internal form used to analyse and translate code across
different phases of compilation. It facilitates efficient processing and manipulation of code semantics.
Purpose and need for IR:
Compilers are structured into multiple passes, each requiring a structured format to store and process code.
The IR captures essential information beyond what is explicitly stated in the source code, such as variable
addresses and procedural relationships.
Types of IRs:
IRs are categorized into three main types based on their organizational structure and usage:

 Graphical IRs
 Linear IRs
 Hybrid IRs

Graphical IRs:
Graphical IRs encode compiler knowledge in graph structures. They are typically used in parsing and attribute grammar systems, where the syntax and structure of the source code are directly represented.
Example:
Parse trees and Abstract Syntax Trees (ASTs) are primary forms, with ASTs being more concise by
eliminating non-essential nodes.
Parse Trees: Graphical representation that mirrors the syntactic structure of source code, used primarily
during parsing.
Abstract Syntax Trees (ASTs): Condensed form of parse trees, retaining essential structure while omitting
unnecessary details like derivation paths.
Directed Acyclic Graphs (DAGs): Optimized form of ASTs, eliminating redundancy by sharing common
subexpressions.
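The sharing a DAG provides can be illustrated by counting distinct subtrees. The tuple encoding of (a + b) * (a + b) below is an assumption for the example:

```python
def unique_subexprs(node, seen=None):
    """Count distinct subexpressions; in a DAG each is one shared node."""
    if seen is None:
        seen = set()
    seen.add(node)                     # identical tuples hash to one entry
    if isinstance(node, tuple):
        _, left, right = node
        unique_subexprs(left, seen)
        unique_subexprs(right, seen)
    return len(seen)

# (a + b) * (a + b): the AST has 7 nodes, but only 4 are distinct,
# so the DAG shares the common subexpression a + b and both leaves.
expr = ("*", ("+", "a", "b"), ("+", "a", "b"))
```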
Linear IRs:
Linear IRs represent code as a sequence of instructions resembling pseudo-code for an abstract machine. They are common in compilers that generate assembly code or low-level machine instructions.
Example:
Three-address code and stack-machine code are examples of linear IRs, offering simplicity and direct execution models.
Three-Address Code: Each statement refers to at most three addresses, typically two operands and a result (e.g., x = y op z), offering simplicity and compactness.
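Generating three-address code from an expression tree can be sketched as a post-order walk. The tuple tree encoding and the temporary-name scheme t1, t2, ... are assumptions for the example:

```python
def gen_tac(node):
    """Emit three-address code for an (op, left, right) tuple tree.
    Returns (name holding the result, list of instructions)."""
    code = []
    def walk(n):
        if not isinstance(n, tuple):
            return n                          # leaf: variable or constant
        op, left, right = n
        l, r = walk(left), walk(right)        # children first (post-order)
        tmp = f"t{len(code) + 1}"             # fresh temporary
        code.append(f"{tmp} = {l} {op} {r}")  # at most three addresses
        return tmp
    result = walk(node)
    return result, code
```

For x + y * z this emits "t1 = y * z" followed by "t2 = x + t1", with the result in t2.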
Syntax-Directed Translation:
This scheme uses three-address code to translate various programming constructs.
Stack-Machine Code: Operates using an operand stack, where operations pop operands, perform
computations, and push results, optimizing for space and execution speed.
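The pop-compute-push cycle can be sketched as a small evaluator. The instruction set (push/add/mul) is an illustrative assumption:

```python
def run(program):
    """Execute stack-machine instructions: operands live on a stack."""
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])        # push a literal operand
        elif instr[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)           # pop two, push the sum
        elif instr[0] == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)           # pop two, push the product
    return stack[-1]                      # result is left on top
```

The program push 2, push 3, push 4, mul, add computes 2 + 3 * 4 without naming any temporaries, which is what makes the encoding compact.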
Hybrid IRs:
Hybrid IRs combine elements of both graphical and linear IRs to leverage their respective advantages. They are used in compilers where a balance between graphical representation and sequential execution is beneficial.
Example:
Can include structures that mix graph-like relationships with linear execution models for specific
optimization purposes.
