Compiler Construction Complete PDF
Compiler Construction
1: Compiler Techniques and Methodology
What is a Compiler?
A compiler is a computer program that transforms source code written
in a high-level language into low-level machine language. More
generally, it translates code written in one programming language into
another language without changing the meaning of the code.
Features of Compilers:
1. Correctness
2. Speed of compilation
3. The speed of the target code
4. Code debugging help
Types of Compiler:
Following are the different types of Compiler:
1) Single Pass Compilers
2) Two Pass Compilers
3) Multipass Compilers
In a single-pass compiler, the source code is transformed directly into
machine code in one pass. For example, Pascal was designed so that it can be
compiled in a single pass.
Multipass Compilers:
A multipass compiler processes the source code or syntax tree of a program
several times. It divides a large program into multiple smaller parts and
processes them, producing multiple intermediate codes. Each pass takes the
output of the previous pass as its input, so the compiler requires less
memory. It is also known as a 'Wide Compiler'.
2: Organization of Compilers
Organizing a compiler means breaking it down into several distinct
phases or components, each responsible for a specific task in the
process of translating a high-level programming language into machine
code or an intermediate representation. The traditional organization
splits the compiler into a "compiler front end" and a "compiler back
end."
Structure of a compiler:
Any large software is easier
to understand and implement if it is divided into well-defined modules.
Front End:
Optimization:
The compiler performs various optimizations on the intermediate
code to improve the efficiency of the generated machine code.
Back End:
Additional Considerations:
2. Components:
Lexer/Tokenizer: This is the component responsible for
scanning the source code and identifying the tokens.
Regular Expressions: These rules define the patterns for
different types of tokens.
4. Output:
The output of lexical analysis is a stream of tokens that serves
as input for the subsequent phases of the compiler.
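To make the roles of the lexer and the regular expressions concrete, here is a minimal sketch in Python. The token classes in TOKEN_SPEC and the function name tokenize are assumptions chosen for this illustration; a real language would define many more token types (keywords, string literals, comments, and so on).

```python
import re

# Hypothetical token classes for a tiny C-like language.
# Each entry pairs a token name with the regular expression that defines it.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("PUNCT",  r"[();,]"),
    ("SKIP",   r"\s+"),   # whitespace separates tokens but is not a token
    ("ERROR",  r"."),     # any other single character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return the token stream for source as a list of (kind, text) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"invalid character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens

print(tokenize("int z = x + y;"))
```

The returned list is the "stream of tokens" that the parser would consume. Note that this sketch does not distinguish keywords such as int from ordinary identifiers.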
Syntax Analysis:
1. Purpose:
Grammar Verification: Syntax analysis checks whether the
sequence of tokens generated by the lexical analysis conforms
to the grammatical structure of the programming language.
AST Construction: It builds a hierarchical structure called the
Abstract Syntax Tree (AST) that represents the syntactic
structure of the program.
2. Components:
Parser: The parser is responsible for analyzing the
arrangement of tokens and ensuring that it follows the syntax
rules of the language.
Context-Free Grammar (CFG): Syntax rules are often
specified using CFG, which describes the syntactic structure
of the language.
Error Handling: The syntax analysis phase detects and
reports syntax errors.
4. Output:
The output of syntax analysis is the AST, which serves as the
basis for subsequent phases like semantic analysis,
optimization, and code generation.
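As a rough illustration of a parser that checks a token sequence against a grammar and builds an AST, here is a minimal recursive-descent sketch in Python. The grammar (expr -> term (('+' | '-') term)*), the function name parse_expression, and the nested-tuple AST shape are all assumptions made for this example, not the design of any particular compiler.

```python
def parse_expression(tokens):
    """Parse tokens for: expr -> term (('+' | '-') term)*, term -> name or number.

    Returns a nested-tuple AST, or raises SyntaxError on a grammar violation.
    """
    pos = 0

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isalnum():
            pos += 1
            return tokens[pos - 1]
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected an operand, found {found!r}")

    node = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        operator = tokens[pos]
        pos += 1
        node = (operator, node, term())   # builds a left-associative tree
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node

print(parse_expression(["x", "+", "y", "-", "2"]))
# prints ('-', ('+', 'x', 'y'), '2')
```

Each tuple is an AST node of the form (operator, left, right); leaves are plain strings. Error handling here stops at the first syntax error, which is the simplest possible strategy.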
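Example:
E → E + E
E → E – E
E → id
For the string id + id – id, the above grammar generates two parse trees,
one grouping the string as (id + id) – id and the other as id + (id – id),
so the grammar is ambiguous.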
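The two parse trees for this grammar can be enumerated mechanically. The sketch below is a brute-force illustration in Python, not a practical parsing method: the function name parse_trees and the bracketed-string output format are assumptions, and the minus sign is typed as an ASCII "-".

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under E -> E + E | E - E | id.

    Each tree is rendered as a fully bracketed string.
    """
    if tokens == ["id"]:
        return ["id"]
    trees = []
    # Try every operator as the root of the tree, then parse both sides.
    for i, token in enumerate(tokens):
        if token in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append(f"({left} {token} {right})")
    return trees

for tree in parse_trees(["id", "+", "id", "-", "id"]):
    print(tree)
# prints (id + (id - id))
# prints ((id + id) - id)
```

Getting more than one tree for a single string is exactly what makes the grammar ambiguous.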
Special Symbols:
Most of the high-level languages contain some special symbols, as shown
below:
Name                 Symbols
Punctuation          Comma (,), Semicolon (;)
Assignment           =
Special Assignment   +=, -=, *=, /=
Comparison           ==, ≠, <, >, ≤, ≥
Preprocessor         #
Location Specifier   &
Logical              &&, |, ||, !
Shift Operator       >>, <<, >>>, <<<
4: Parsing Techniques
Parsing is the process of transforming
data from one format to another. This is the job of the parser: a
component of the translator that organizes a linear text structure
according to a set of defined rules, known as a grammar.
Types of Parsing:
There are two types of Parsing:
1) The Top-down Parsing
2) The Bottom-up Parsing
Top-down Parsing:
In top-down parsing, the parser builds the
parse tree by expanding from the top down, tracing a leftmost derivation
of the input. Top-down parsing starts with the start symbol and ends at
the terminals. Predictive parsing is a common form of top-down parsing.
Bottom-up Parsing:
Bottom-up parsing works
in the reverse direction of top-down parsing. It starts from the input
and traces a rightmost derivation in reverse until it reaches the start
symbol.
Shift-Reduce Parsing:
Shift-reduce parsing works
on two steps: Shift step and Reduce step.
a. Shift step:
In the shift step, the parser pushes the next input symbol onto the
stack and advances the input pointer.
b. Reduce Step:
When the top of the stack holds the complete right-hand side (RHS) of
a grammar rule, the parser replaces it with the rule's left-hand side
(LHS).
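The two steps can be sketched in Python as follows. This is a naive illustration under stated assumptions: the grammar E -> E + id | E - id | id and the names RULES and shift_reduce are invented for the example, and the parser simply reduces whenever a right-hand side is on top of the stack. Practical shift-reduce parsers consult a parsing table to decide between shifting and reducing.

```python
# Hypothetical grammar, with longer right-hand sides listed first so that
# they are tried before shorter ones:
#   E -> E + id | E - id | id
RULES = [
    ("E", ["E", "+", "id"]),
    ("E", ["E", "-", "id"]),
    ("E", ["id"]),
]

def shift_reduce(tokens):
    """Return (accepted, trace) for a naive shift-reduce parse of tokens."""
    stack, trace, position = [], [], 0
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the RHS on top of the stack with the LHS.
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if position == len(tokens):
                break   # nothing left to reduce or shift
            # Shift: push the next input symbol and advance the input pointer.
            stack.append(tokens[position])
            trace.append(f"shift {tokens[position]}")
            position += 1
    return stack == ["E"], trace

accepted, trace = shift_reduce(["id", "+", "id", "-", "id"])
print(accepted)
print("\n".join(trace))
```

The input is accepted when the whole token sequence has been consumed and the stack holds only the start symbol E.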
LR Parsing:
LR parsing is one of the
most efficient syntax analysis techniques, and it works with a large
class of context-free grammars. In LR parsing, L stands for left-to-right
scanning of the input, and R stands for constructing a rightmost
derivation in reverse.
// C program
int x = 10;
int y = 20;
int z = x + y;
(i) A = 2*(22.0/7.0)*r
The constant subexpression 2*(22.0/7.0) is evaluated at compile time; only
the multiplication by r remains for run time.
(ii) x = 12.4
y = x/2.3
2. Variable Propagation:
//Before Optimization
c=a*b
x=a
till
d=x*b+4
//After Optimization
c=a*b
x=a
till
d=a*b+4
3. Constant Propagation:
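If the value of a variable
is a known constant, then replace the variable with the constant. This
applies only at points where the variable is certain to hold that
constant, because a variable may not hold the same value everywhere.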
Example:
(i) A = 2*(22.0/7.0)*r
Evaluates the constant subexpression 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluates x/2.3 as 12.4/2.3 at compile time.
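The two examples can be reproduced with a small sketch in Python that combines constant propagation with compile-time evaluation. It assumes straight-line code with no branches or loops; the statement encoding (tuples of target and operands), the temporaries t1 and t2, and the name propagate_constants are assumptions made for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def propagate_constants(statements):
    """Constant propagation and folding over straight-line code.

    Each statement is (target, operand) for a simple assignment or
    (target, left, op, right) for a binary operation. Operands are
    variable names (str) or numeric constants.
    """
    known = {}        # variable -> constant value, where known
    optimized = []
    for target, *expr in statements:
        # Propagation: substitute variables whose value is a known constant.
        expr = [known.get(part, part) for part in expr]
        # Folding: evaluate an operation whose operands are both constants.
        # (A division by a constant zero would raise here; not handled.)
        if len(expr) == 3 and not isinstance(expr[0], str) and not isinstance(expr[2], str):
            expr = [OPS[expr[1]](expr[0], expr[2])]
        if len(expr) == 1 and not isinstance(expr[0], str):
            known[target] = expr[0]       # target now holds a known constant
        else:
            known.pop(target, None)       # target is no longer a known constant
        optimized.append((target, *expr))
    return optimized

# Example (ii): x = 12.4; y = x/2.3
print(propagate_constants([("x", 12.4), ("y", "x", "/", 2.3)]))

# Example (i), split into three-address form with temporaries t1 and t2.
print(propagate_constants([("t1", 22.0, "/", 7.0), ("t2", 2, "*", "t1"), ("A", "t2", "*", "r")]))
```

In the second call only the final multiplication by r survives, since r is not known at compile time.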
4. Copy Propagation:
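It is an extension of constant
propagation: after a copy such as x=a, later uses of x are replaced by a.
It helps reduce run time because it reduces copying, and it often leaves
the copy statement itself unused.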
Example:
//Before Optimization
c=a*b
x=a
till
d=x*b+4
//After Optimization
c=a*b
x=a
till
d=a*b+4
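The transformation above can be sketched in Python for straight-line code. The encoding of each statement as a (target, tokens) pair and the name propagate_copies are assumptions made for this illustration; the "till" placeholder from the example is left out.

```python
def propagate_copies(statements):
    """Copy propagation over straight-line code.

    statements is a list of (target, tokens) pairs, where tokens is the
    right-hand side split into operand and operator strings.
    """
    copies = {}       # variable -> the variable it is currently a copy of
    optimized = []
    for target, tokens in statements:
        # Replace each use of a copied variable with its original.
        tokens = [copies.get(tok, tok) for tok in tokens]
        # Assigning to target invalidates every copy relation involving it.
        copies = {var: src for var, src in copies.items() if target not in (var, src)}
        # A right-hand side that is a single variable is a copy statement.
        if len(tokens) == 1 and tokens[0].isidentifier() and tokens[0] != target:
            copies[target] = tokens[0]
        optimized.append((target, tokens))
    return optimized

code = [("c", ["a", "*", "b"]), ("x", ["a"]), ("d", ["x", "*", "b", "+", "4"])]
for target, tokens in propagate_copies(code):
    print(f"{target}={''.join(tokens)}")
# prints c=a*b
# prints x=a
# prints d=a*b+4
```

The copy x=a is kept at this stage; removing it once nothing reads x is a separate step.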
Example:
//Before Optimization
c=a*b
x=a
till
d=a*b+4
//After elimination:
c=a*b
till
d=a*b+4
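The elimination above removes x=a because nothing reads x afterwards. A minimal Python sketch of that idea for straight-line code follows. The (target, tokens) encoding, the name eliminate_dead_code, and the live_out parameter (the variables still needed after the last statement) are assumptions of this illustration, and expressions are assumed to have no side effects.

```python
def eliminate_dead_code(statements, live_out):
    """Drop assignments whose target is never read afterwards.

    statements: list of (target, tokens) pairs in program order.
    live_out:   names still needed after the last statement.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so that uses are seen before the assignments feeding them.
    for target, tokens in reversed(statements):
        if target in live:
            kept.append((target, tokens))
            live.discard(target)
            live.update(tok for tok in tokens if tok.isidentifier())
        # Otherwise the assigned value is never used: the statement is dead.
    kept.reverse()
    return kept

code = [("c", ["a", "*", "b"]), ("x", ["a"]), ("d", ["a", "*", "b", "+", "4"])]
for target, tokens in eliminate_dead_code(code, live_out={"c", "d"}):
    print(f"{target}={''.join(tokens)}")
# prints c=a*b
# prints d=a*b+4
```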
7. Function Cloning:
Here, specialized versions of
a function are created for different calling parameters.
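As a hand-written illustration of the idea, the Python sketch below shows a general function next to a clone specialized for one calling parameter. Both function names are hypothetical; a compiler would generate such a clone automatically when it sees many call sites that pass the same constant.

```python
def power(base, exponent):
    """General version: handles any non-negative integer exponent."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result

def power_exponent_2(base):
    """Clone specialized for call sites that always pass exponent == 2.

    With the exponent fixed, the loop disappears entirely.
    """
    return base * base

# A call such as power(x, 2) can be redirected to the clone.
print(power(7, 2), power_exponent_2(7))
```

The clone must compute the same result as the original for the parameter values it was specialized for; other call sites keep using the general version.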
1. Error Detection:
Lexical Errors:
Definition: Lexical errors involve invalid characters or token
sequences.
Detection: Lexical analyzers (scanners) examine the source
code and identify errors by recognizing characters that do not
form valid tokens or violate lexical rules.
Syntax Errors:
Definition: Syntax errors occur when the input source code
violates the grammar rules of the programming language.
Detection: Syntax analyzers (parsers) detect these errors
during the parsing phase by analyzing the structure of the code.
Semantic Errors:
Definition: Semantic errors involve violations of the
language's semantics, such as using a variable before it is
declared.
Detection: Semantic analysis identifies these errors during
the semantic analysis phase.
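The "variable used before it is declared" case can be sketched in Python as a pass over declaration and use events. The event encoding, the single flat scope, and the name check_declarations are assumptions of this illustration; a real semantic analyzer walks the AST and keeps a symbol table per scope.

```python
def check_declarations(events):
    """Report uses of names that have not been declared yet.

    events: list of (kind, name, line) tuples, kind is "declare" or "use".
    """
    declared = set()      # a one-scope symbol table
    errors = []
    for kind, name, line in events:
        if kind == "declare":
            declared.add(name)
        elif name not in declared:
            errors.append(f"line {line}: '{name}' used before declaration")
    return errors

program = [
    ("declare", "x", 1),
    ("use", "x", 2),
    ("use", "y", 2),      # y is not declared until line 3
    ("declare", "y", 3),
]
print(check_declarations(program))
# prints ["line 2: 'y' used before declaration"]
```

Collecting the errors in a list, rather than stopping at the first one, lets the compiler report several problems in a single run.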
4. User-Defined Errors:
Definition: Compilers may allow programmers to define custom
error-handling routines or specify error-handling behavior.
Purpose: Provides flexibility in handling errors based on the
specific requirements of a programming project.
Compiler:
A compiler is a translator that takes a program in a high-level
language as input and produces a low-level language, i.e. machine or
assembly language, as output. Its job is to transform code written in
a programming language into machine code (0s and 1s) that the computer
can execute.
A compiler is more intelligent than an assembler: it checks all kinds
of limits, ranges, errors, etc.
However, a compiler takes longer to run and occupies a larger part of
memory.
Advantages of Compiler:
Compiled code runs faster in comparison to Interpreted code.
Compilers help in improving the security of Applications.
Disadvantages of Compiler:
The compiler can catch only syntax errors and some semantic errors.
Compilation can take more time in the case of bulky code.
Interpreter:
An interpreter is a program that reads a program written in a
high-level language and executes it directly. The interpreter may first
convert the high-level language to an intermediate language. It can work
with pre-compiled code, source code, etc.
It translates only one statement of the program at a time.
Interpreters are, more often than not, smaller than compilers.
Advantages of Interpreter:
Programs written in an Interpreted language are easier to debug.
Interpreted Language is more flexible than a Compiled language.
Disadvantages of Interpreter:
The interpreter can run only the corresponding Interpreted program.
Interpreted code runs slower in comparison to Compiled code.