
Compiler Construction
1: Compiler Techniques and Methodology

What is a Compiler?
A compiler is a computer program that transforms source code written in a
high-level language into low-level machine language. It translates code
written in one programming language into another language without
changing its meaning.

Features of Compilers:
1. Correctness
2. Speed of compilation
3. The speed of the target code
4. Code debugging help

Types of Compilers:
Following are the different types of compilers:
1) Single Pass Compilers
2) Two Pass Compilers
3) Multipass Compilers

Single Pass Compiler:

In a single-pass compiler, the source code is transformed directly into
machine code in one pass over the program. Pascal is a classic example of a
language designed for single-pass compilation.

Two Pass Compiler:


A two-pass compiler is divided into two sections:
1. Front end: It maps legal source code into an intermediate representation (IR).
2. Back end: It maps the IR onto the target machine.

Multipass Compilers:

A multipass compiler processes the source code or syntax tree of a program
several times. It divides a large compilation task into multiple smaller
passes and develops multiple intermediate codes. Each pass takes the output
of the previous pass as its input, so it requires less memory at any one
time. It is also known as a 'wide compiler'.

Steps for Language Processing Systems:

Before studying compilers in depth, you first need to understand a few
other tools that work with compilers: the preprocessor, which expands
macros and header inclusions before compilation; the assembler, which
translates assembly code into relocatable machine code; and the
linker/loader, which combines object files into an executable and loads it
into memory.

Advantages of Compiler Design:
1. Efficiency: Compiled code generally runs faster than interpreted code.
2. Error Checking: The compiler catches many errors before the program ever runs.
3. Optimizations: The compiler can apply optimizations that improve the generated code.

Disadvantages of Compiler Design:
1. Longer Development Time: The edit-compile-run cycle slows down iteration.
2. Debugging Difficulties: Errors are reported only after compilation, and optimized code can be harder to debug.
3. Platform-Specific Code: Compiled binaries are tied to a particular target architecture.

Compiler Techniques and Methodology:


Compiler techniques and
methodology are the principles and practices that guide the design and
implementation of compilers.

Stages of Compiler Techniques and Methodology:


 Scanning and parsing: These are the processes of analyzing
the syntax and structure of the source code and building an intermediate
representation, such as an abstract syntax tree, that captures its meaning.
 Semantic analysis: This is the process of checking the validity
and consistency of the source code, such as type checking, scope
checking, and name resolution.
 Code generation: This is the process of translating the intermediate
representation into executable code for the target machine or platform,
such as assembly language or bytecode.
 Optimization: This is the process of improving the quality and
performance of the executable code by applying various techniques, such
as data flow analysis, register allocation, instruction scheduling, and loop
transformation.

2: Organization of Compilers
The organization of
compilers in compiler construction involves breaking down the compiler
into several distinct phases or components, each responsible for specific
tasks in the process of translating a high-level programming language into
machine code or an intermediate representation. The traditional
organization of compilers follows a structure known as the "compiler
front end" and "compiler back end."

Structure of a compiler:
Any large software is easier
to understand and implement if it is divided into well-defined modules.

Front End:

 Lexical Analysis (Scanner): This is the first phase, where the
source code is broken down into a sequence of tokens.
 Syntax Analysis (Parser): This phase checks whether the
sequence of tokens adheres to the grammatical structure of the
programming language.
 Semantic Analysis: This phase checks the meaning of the
statements and expressions in the program. It ensures that the
program follows the language's semantics and performs tasks like
type checking.

Intermediate Code Generation:

 After the front end, the compiler may generate an intermediate
representation (IR) of the program. The IR is an abstraction that
simplifies the source code while preserving its essential meaning.

Optimization:
 The compiler performs various optimizations on the intermediate
code to improve the efficiency of the generated machine code.

Back End:

 Code Generation: In this phase, the compiler generates the target
machine code or assembly code from the optimized intermediate
code.
 Code Optimization (Machine-Dependent): This phase
optimizes the generated machine code for the specific target
architecture. It may include instruction scheduling, register
allocation, and other architecture-specific optimizations.
 Code Emission: The final step involves emitting the machine
code or generating an executable file from the optimized code.

Additional Considerations:

 Error Handling: Throughout the compilation process, compilers
must handle errors gracefully, providing meaningful error messages.
 Debugging Information: Compilers often include information in
the executable to aid in debugging, such as source code line numbers
or variable names.
 Cross-Compilation: Some compilers support generating code for
a different target architecture than the one on which the compiler
itself runs.

3: Lexical and Syntax Analysis


Lexical analysis and syntax
analysis are two crucial phases in the process of compiler construction.
They are responsible for analyzing the source code of a programming
language and converting it into a form that can be further processed by
the compiler.
Lexical Analysis:
1. Purpose:
 Tokenization: The main goal of lexical analysis is to break
down the source code into a sequence of tokens. Tokens are
the smallest units of meaning in a programming language,
such as keywords, identifiers, literals, and operators.

2. Components:
 Lexer/Tokenizer: This is the component responsible for
scanning the source code and identifying the tokens.
 Regular Expressions: These rules define the patterns for
different types of tokens.

3. Steps in Lexical Analysis:


 Scanning: The lexer scans the source code character by
character.
 Token Recognition: It recognizes and categorizes sequences
of characters into tokens based on predefined rules.
 Error Handling: Lexical analysis also involves detecting and
reporting lexical errors, such as invalid characters or tokens.

4. Output:
 The output of lexical analysis is a stream of tokens that serves
as input for the subsequent phases of the compiler.
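To make tokenization concrete, here is a minimal lexer loop sketched in C++. The type names (TokenKind, Token) and the function lex are illustrative assumptions, not taken from any particular compiler:

#include <cctype>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Scans the source character by character and groups characters
// into tokens based on simple character-class rules.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) {                 // skip whitespace
            ++i;
        } else if (std::isalpha(c)) {          // identifier or keyword
            size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {          // numeric literal
            size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {                               // single-character operator
            tokens.push_back({TokenKind::Operator, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}

For the input x = y + 42, this sketch produces the token stream Identifier(x), Operator(=), Identifier(y), Operator(+), Number(42).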

Syntax Analysis:
1. Purpose:
 Grammar Verification: Syntax analysis checks whether the
sequence of tokens generated by the lexical analysis conforms
to the grammatical structure of the programming language.
 AST Construction: It builds a hierarchical structure called the
Abstract Syntax Tree (AST) that represents the syntactic
structure of the program.

2. Components:
 Parser: The parser is responsible for analyzing the
arrangement of tokens and ensuring that it follows the syntax
rules of the language.
 Context-Free Grammar (CFG): Syntax rules are often
specified using CFG, which describes the syntactic structure
of the language.
 Error Handling: The syntax analysis phase detects and
reports syntax errors.

3. Steps in Syntax Analysis:


 Parsing: The parser processes the stream of tokens generated
by the lexer and checks whether it conforms to the language's
syntax rules.
 Error Reporting: Syntax analysis also involves reporting
detailed error messages when syntax errors are encountered.

4. Output:
 The output of syntax analysis is the AST, which serves as the
basis for subsequent phases like semantic analysis,
optimization, and code generation.
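To illustrate the AST mentioned above, a parser might represent expressions with a small tree structure such as the following C++ sketch (the helper names leaf and node are illustrative assumptions):

#include <memory>
#include <string>

// One node of an expression AST: a leaf holds an identifier,
// an interior node holds an operator and two children.
struct Expr {
    std::string value;
    std::unique_ptr<Expr> left;
    std::unique_ptr<Expr> right;
};

std::unique_ptr<Expr> leaf(const std::string& v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> node(const std::string& op,
                           std::unique_ptr<Expr> l,
                           std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->value = op;
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Builds the AST for (id + id) - id.
std::unique_ptr<Expr> example() {
    return node("-", node("+", leaf("id"), leaf("id")), leaf("id"));
}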

Example:
E→E+E
E→E–E
E → id
For the string id + id – id, the above grammar generates two parse trees:
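(The parse-tree figure is omitted here. The two trees correspond to the groupings (id + id) - id and id + (id - id); the existence of two distinct parse trees for the same string is what makes this grammar ambiguous.)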

Special Symbols:
Most high-level languages contain some special symbols, as shown below:

Name                  Symbols
Punctuation           Comma (,), Semicolon (;)
Assignment            =
Special Assignment    +=, -=, *=, /=
Comparison            ==, !=, <, >, <=, >=
Preprocessor          #
Location Specifier    &
Logical               &&, ||, !
Shift Operator        >>, <<, >>>

Now let us look at a proper C++ example:

#include <iostream>

// Returns the larger of the two numbers.
int maximum(int x, int y) {
    if (y > x)
        return y;
    else
        return x;
}

int main() {
    std::cout << maximum(10, 20) << std::endl;  // prints 20
    return 0;
}

4: Parsing Techniques
The process of transforming data from one format to another is called
parsing. It is accomplished by the parser, a component of the translator
that organizes a linear text stream according to a set of defined rules
known as a grammar.

The Process of Parsing:
The parser takes the stream of tokens produced by the lexer and builds a
parse tree according to the grammar rules.
Types of Parsing:
There are two types of Parsing:
1) The Top-down Parsing
2) The Bottom-up Parsing

Top-down Parsing:
When the parser builds the parse tree from the start symbol down to the
terminals, tracing the leftmost derivation of the input, it is performing
top-down parsing. Top-down parsing without backtracking is also known as
predictive parsing.

 Recursive Descent Parsing: Recursive descent parsing is a type
of top-down parsing technique. It uses one procedure for each terminal
and non-terminal entity, reads the input from left to right, and
constructs the parse tree from the top down.
 Back-tracking: A back-tracking parser starts from the initial
pointer, the root node. If the derivation fails, it restarts the
process with a different rule.
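To make recursive descent concrete, here is a minimal C++ sketch for the expression grammar used earlier, rewritten in the equivalent non-left-recursive form E → id (('+' | '-') id)* so that it can be parsed without backtracking. The names Parser, match, and expr are illustrative assumptions:

#include <stdexcept>
#include <string>
#include <vector>

struct Parser {
    std::vector<std::string> tokens;  // e.g. {"id", "+", "id", "-", "id"}
    size_t pos = 0;

    // Consume the expected token or report a syntax error.
    void match(const std::string& expected) {
        if (pos < tokens.size() && tokens[pos] == expected) ++pos;
        else throw std::runtime_error("syntax error at token " + std::to_string(pos));
    }

    // E -> id (('+' | '-') id)*
    void expr() {
        match("id");
        while (pos < tokens.size() && (tokens[pos] == "+" || tokens[pos] == "-")) {
            ++pos;        // consume '+' or '-'
            match("id");
        }
    }
};

Running expr() over the tokens id + id - id succeeds, while an input such as id + - id throws a syntax error at the second operator.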

Bottom-up Parsing:
Bottom-up parsing works just the reverse of top-down parsing: it starts
from the input and traces the rightmost derivation in reverse, reducing the
input step by step until it reaches the start symbol.

Shift-Reduce Parsing:
Shift-reduce parsing works in two kinds of steps: shift steps and reduce steps.
a. Shift step:
The shift step advances the input pointer to the next input symbol, which
is shifted onto the parser's stack.
b. Reduce step:
When the top of the stack matches the complete right-hand side of a grammar
rule, the parser replaces it with the rule's left-hand-side non-terminal.
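As a worked illustration (a sketch; the exact sequence of actions depends on the parser's tables), here is a shift-reduce trace for id + id with the grammar E → E + E | id:

Stack       Input       Action
$           id + id $   shift
$ id        + id $      reduce E → id
$ E         + id $      shift
$ E +       id $        shift
$ E + id    $           reduce E → id
$ E + E     $           reduce E → E + E
$ E         $           accept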

LR Parsing:
The LR parser is one of the most efficient syntax analysis techniques, as
it works with a large class of context-free grammars. In LR parsing, L
stands for scanning the input from left to right, and R stands for
constructing a rightmost derivation in reverse.

Why is parsing useful in compiler designing?


In the world of software, every component has its own requirements for the
format of the data it processes. Parsing transforms data so that it can be
understood by a specific piece of software.

Technologies that use parsers:

 Programming languages like Java.
 Database languages like SQL.
 Protocols like HTTP.
 Markup languages like XML and HTML.

5: Object Code Generation and Optimization


Object code generation
and optimization are crucial phases in the process of compiler
construction. These phases are responsible for translating high-level
programming languages into machine code that can be executed by a
computer's hardware efficiently.

Code generation and optimization involve several stages:
1. Intermediate Code Generation: The front-end of the compiler
generates an intermediate representation of the source code.
2. Intermediate Code Optimization: Some compilers perform
initial optimization on the intermediate code before generating the
object code.
3. Object Code Generation: The optimized intermediate code is
translated into machine code or assembly language.
4. Final Code Optimization: Further optimizations are applied to
the generated object code to improve performance.
Example of object code generation and optimization for a C
program:

// C program
int x = 10;
int y = 20;
int z = x + y;

// Intermediate code (three-address code)


t1 = 10
t2 = 20
t3 = t1 + t2
x = t1
y = t2
z = t3

// Object code (x86 assembly)

mov eax, 10   ; t1 = 10
mov ebx, 20   ; t2 = 20
mov [x], eax  ; x = t1
mov [y], ebx  ; y = t2
add eax, ebx  ; t3 = t1 + t2
mov [z], eax  ; z = t3

// Optimized object code (x86 assembly)
// The compiler folds the constants at compile time and stores the
// results directly.

mov dword [x], 10  ; x = 10
mov dword [y], 20  ; y = 20
mov dword [z], 30  ; z = x + y, evaluated at compile time

Code optimization is done in the following ways:

1. Compile Time Evaluation:

(i) A = 2*(22.0/7.0)*r
The constant subexpression 2*(22.0/7.0) is evaluated at compile time (to
approximately 6.2857), leaving only one multiplication by r at run time.
(ii) x = 12.4
y = x/2.3

Evaluate x/2.3 as 12.4/2.3 at compile time.

2. Variable Propagation:

//Before Optimization
c = a * b
x = a
...
d = x * b + 4

//After Optimization
c = a * b
x = a
...
d = a * b + 4

(Here '...' stands for intervening code that changes neither a nor x.)

3. Constant Propagation:
If the value of a variable is known to be a constant, replace the variable
with that constant at each use. This is only valid where the variable
cannot have been reassigned in between.

Example:
(i) A = 2*(22.0/7.0)*r
The constant subexpression 2*(22.0/7.0) is folded at compile time.
(ii) x = 12.4
y = x/2.3
Since x is known to hold 12.4, x/2.3 is evaluated as 12.4/2.3 at compile time.

4. Copy Propagation:
Copy propagation is an extension of constant propagation to copy
assignments such as x = a: later uses of x are replaced by a, which reduces
copying at run time and often exposes further optimizations.

Example:
//Before Optimization
c = a * b
x = a
...
d = x * b + 4

//After Optimization
c = a * b
x = a
...
d = a * b + 4

5. Common Sub Expression Elimination:


In the example above, copy propagation rewrites x*b as a*b, so a*b is
computed twice; it is a common subexpression. Common subexpression
elimination computes it once and reuses the result, as sketched below.
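A minimal before/after sketch in the same notation as the examples above:

//Before Optimization
c = a * b
d = a * b + 4

//After Optimization
t = a * b
c = t
d = t + 4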

6. Dead Code Elimination:


Copy propagation often turns assignment statements into dead code: once
every use of x has been replaced by a, the assignment x = a is never read
again and can be removed.

Example:
//Before elimination (after copy propagation)
c = a * b
x = a
...
d = a * b + 4

//After elimination:
c = a * b
...
d = a * b + 4

7. Function Cloning:
Here, specialized versions of a function's code are created for different
calling parameters.

Example: Function overloading (see the sketch below).
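As a rough C++ illustration, templates cause the compiler itself to generate a specialized clone of a function for each parameter type it is used with (an analogy for function cloning, not the internal compiler transformation):

#include <iostream>

// The compiler generates ("clones") a separate specialized function
// for every type T this template is instantiated with.
template <typename T>
T square(T v) {
    return v * v;
}

int main() {
    std::cout << square(3) << "\n";    // instantiates square<int>
    std::cout << square(2.5) << "\n";  // instantiates square<double>
    return 0;
}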

7: Detection and Recovery from Errors


In compiler construction,
error detection and recovery mechanisms play a crucial role in ensuring
that a compiler can handle erroneous input and produce meaningful
output. Errors can occur at various stages of the compilation process, such
as lexical analysis, syntax analysis, semantic analysis, and code
generation.
Error Detection and Recovery in Compiler Construction:

1. Error Detection:

 Lexical Errors:
 Definition: Lexical errors involve invalid characters or token
sequences.
 Detection: Lexical analyzers (scanners) examine the source
code and identify errors by recognizing characters that do not
form valid tokens or violate lexical rules.

 Syntax Errors:
 Definition: Syntax errors occur when the input source code
violates the grammar rules of the programming language.
 Detection: Syntax analyzers (parsers) detect these errors
during the parsing phase by analyzing the structure of the code.

 Semantic Errors:
 Definition: Semantic errors involve violations of the
language's semantics, such as using a variable before it is
declared.
 Detection: Semantic analysis identifies these errors during
the semantic analysis phase.

2. Panic Mode Recovery:


 Definition: Panic mode recovery involves discarding tokens until
a synchronizing token is found.
 Purpose: It helps the compiler recover from a syntax error and
continue parsing the source code.
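A minimal sketch of panic-mode recovery in C++ (the choice of ';' and '}' as synchronizing tokens is an illustrative assumption):

#include <string>
#include <vector>

// On a syntax error, discard tokens until a synchronizing token
// (here ';' or '}') is found, then resume parsing just past it.
size_t recover(const std::vector<std::string>& tokens, size_t pos) {
    while (pos < tokens.size() && tokens[pos] != ";" && tokens[pos] != "}") {
        ++pos;  // discard the offending token
    }
    return pos < tokens.size() ? pos + 1 : pos;  // skip the synchronizer
}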

3. Code Generation and Optimization Errors:


 Definition: Errors in later stages may involve incorrect translations
or inefficient code generation.
 Handling: The compiler detects and reports these errors to ensure
the generation of correct and optimized machine code.

4. User-Defined Errors:
 Definition: Compilers may allow programmers to define custom
error-handling routines or specify error-handling behavior.
 Purpose: Provides flexibility in handling errors based on the
specific requirements of a programming project.

8: Contrast between Compilers and Interpreters

Compiler:
The compiler is a translator that takes a high-level language as input and
produces low-level language, i.e., machine or assembly language, as output.
The work of a compiler is to transform code written in a programming
language into machine code (a format of 0s and 1s) so that computers can
understand it.
 A compiler is more intelligent than an assembler: it checks all kinds
of limits, ranges, errors, etc.
 However, the compiler itself takes more time to run and occupies a
larger part of memory.
Advantages of Compiler:
 Compiled code runs faster in comparison to Interpreted code.
 Compilers help in improving the security of Applications.

Disadvantages of Compiler:
 The compiler can catch only syntax errors and some semantic errors.
 Compilation can take more time in the case of bulky code.

Interpreter:
An interpreter is a program that translates a programming language into a
form the machine can act on, executing it directly. The interpreter may
first convert the high-level language to an intermediate language, and it
can work with pre-compiled code, source code, etc.
 It translates only one statement of the program at a time.
 Interpreters are, more often than not, smaller than compilers.

Advantages of Interpreter:
 Programs written in an Interpreted language are easier to debug.
 Interpreted Language is more flexible than a Compiled language.

Disadvantages of Interpreter:
 The program can run only on a machine where the corresponding
interpreter is available.
 Interpreted code runs slower in comparison to Compiled code.