
Chapter-1 Overview of the Compiler and its Structure:

Language Processor:
Nowadays, most programs are written in a high-level language such as C, Java, or Python. These
languages are designed for people rather than machines: they hide the hardware details of a
specific computer from the programmer.
Simply put, high-level languages simplify the job of telling a computer what to do. However, since
computers only understand instructions in machine code (sequences of 1s and 0s), we cannot
communicate with them without some sort of translator. This is why language processors exist.
The language processor is a special translator system used to turn a program written in a high-level
language, which we call "source code", into machine code, which we call "object program" or
"object code".
A computer is a combination of hardware and software. The hardware cannot understand
human-readable code written in a high-level language. For the machine to execute such code, the
code must go through several transformations, and this is where the language processing system
comes into play. It plays an important role in compiler design.
Converting a high-level language into a low-level language takes multiple steps and involves
several programs besides the compiler. Before compilation can start, the source code needs to be
preprocessed. After compilation, the code needs to be converted into executable code that can run
on the machine. These essential tasks are performed by the preprocessor, assembler, linker, and
loader, which are known as the cousins of the compiler.

1. A user writes code in a high-level language such as C.
2. The preprocessor replaces each #include directive with the contents of the named file (file
inclusion) and each use of a #define macro with its definition (macro expansion).
3. The compiler converts the preprocessed source code into assembly language. Assembly language
is neither pure 0s and 1s nor a high-level language: it is an intermediate form between source
code and machine language.
4. The assembler then converts the assembly code produced by the compiler into relocatable
machine code.
5. The linker combines all the object files and libraries into a single executable program, and
the loader loads that program into memory.

The preprocessor is one of the cousins of the compiler. Its output is used as the input to
another program, typically the compiler. Macros improve the readability of code by letting the
programmer write a short name in place of a complex expression. A preprocessor performs several
kinds of operations, such as macro processing, file inclusion, and language extension.
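
A minimal sketch of file inclusion and macro expansion, written in Python for brevity. It is not
how a real C preprocessor is implemented: it assumes object-like macros only, and the `files`
dictionary stands in for header files on disk. The function name `preprocess` and the sample
header are made up for illustration.

```python
import re

def preprocess(source, files, macros=None):
    """Expand #include and #define directives in a toy C-like source text."""
    macros = {} if macros is None else macros
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # File inclusion: splice in the (recursively preprocessed) file text.
            name = stripped.split(maxsplit=1)[1].strip('"<>')
            included = preprocess(files[name], files, macros)
            if included:
                out.append(included)
        elif stripped.startswith("#define"):
            # Remember the macro; the directive itself produces no output.
            _, name, value = stripped.split(maxsplit=2)
            macros[name] = value
        else:
            # Macro expansion: replace whole-word occurrences of each macro name.
            for name, value in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
            out.append(line)
    return "\n".join(out)

files = {"limits.h": "#define MAX 100"}          # hypothetical header
source = '#include "limits.h"\n#define PI 3.14\nfloat area = PI * r * r;\nint n = MAX;'
print(preprocess(source, files))
```

The directives disappear from the output, and the two remaining statements contain the literal
values in place of the macro names.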
The loader is typically a component of the operating system, while the linker is usually
supplied as part of the development toolchain.

The linker is a program that combines different object files into a single executable file. Its
main task is to find the definition of every symbol that the program references and to fix up
the corresponding addresses, so that the references are correct when the executable code is
loaded. Large programs are often compiled in pieces, so the relocatable machine code may have to
be linked together with other relocatable object files and library files into the code that
actually runs on the machine. The linker resolves external references, where the code in one
file refers to a location (such as a function or variable) defined in another file.
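
The idea of resolving external references can be sketched as a two-pass procedure. The object
file layout used here (a `code` list and a `defines` table) and the instruction names are
hypothetical, chosen only to keep the example small; real object formats are far richer.

```python
def link(object_files):
    """Combine toy object files and resolve symbolic references.

    Each object file is a dict with:
      "code":    list of instruction tuples; ("CALL", name) refers to a symbol
      "defines": {symbol: offset of that symbol within this file's code}
    """
    symbols, image, base = {}, [], 0
    # Pass 1: lay the files out one after another and record absolute addresses.
    for obj in object_files:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        base += len(obj["code"])
    # Pass 2: copy the code, replacing each symbol name with its address.
    for obj in object_files:
        for instruction in obj["code"]:
            if instruction[0] == "CALL":
                if instruction[1] not in symbols:
                    raise ValueError(f"undefined reference: {instruction[1]}")
                instruction = ("CALL", symbols[instruction[1]])
            image.append(instruction)
    return image, symbols

main_o = {"code": [("CALL", "square"), ("HALT",)], "defines": {"main": 0}}
math_o = {"code": [("MUL",), ("RET",)], "defines": {"square": 0}}
image, symbols = link([main_o, math_o])
print(symbols)
print(image)
```

Here `main_o` calls `square`, which is defined only in `math_o`; after linking, the call refers
to the absolute position of `square` in the combined image.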

The loader then loads the executable program into main memory and starts its execution.
Designing a language processor requires a very precise description of the lexicon and syntax,
as well as the semantics, of the high-level language.

Types of Language Processor:


There are three types of language processors: Assembler, Interpreter, Compiler.

A compiler is a program that reads a program written in the source language and translates it
into an equivalent program in the target language. The source code is translated to object code
successfully only if it is free of errors. If the source code contains errors, the compiler
reports them, with line numbers, at the end of compilation. The errors must be removed before
the source code can be compiled successfully.

An interpreter also reads a program written in the source language, but it translates and
executes the program one statement at a time rather than producing a complete target program
first. If there is an error in a statement, the interpreter stops at that statement and displays
an error message; it can move on to the following lines only after the error has been removed.
An interpreter directly executes instructions written in a programming or scripting language
without previously converting them to object code or machine code.
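
The statement-by-statement behaviour can be illustrated with a toy interpreter for assignments
of the form `name = expression`. This is only a sketch: the function `interpret` is invented for
this example, and it leans on Python's built-in `eval` to evaluate each expression, which is
acceptable for a demonstration but should never be used on untrusted input.

```python
def interpret(lines):
    """Execute 'name = expression' statements one at a time.

    Returns the variables defined so far and an error message (or None).
    Execution stops at the first statement that fails.
    """
    env = {}
    for number, line in enumerate(lines, start=1):
        try:
            name, expression = line.split("=", 1)
            # eval() keeps the sketch short; builtins are disabled on purpose.
            env[name.strip()] = eval(expression.strip(), {"__builtins__": {}}, env)
        except Exception as error:
            return env, f"line {number}: {error}"
    return env, None

env, error = interpret(["x = 2", "y = x * 3", "z = y + w", "v = 1"])
print(env)    # statements before the faulty line have already been executed
print(error)  # the faulty line is reported and later lines are never reached
```

Unlike a compiler, which would reject the whole program, the first two statements take effect
before the error on the third line is detected.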

The assembler translates a program written in assembly language into machine code. Its input is
a source program containing assembly language instructions, and its output is object code or
machine code that the computer can understand. The assembler is the most basic interface through
which humans communicate with the machine: it bridges the gap between human-readable
instructions and binary code. Code written in assembly language consists of mnemonics
(instruction names) such as ADD, SUB, MUL, DIV, and MOV, and the assembler converts these
mnemonics into binary code. The available mnemonics depend on the architecture of the machine.

Compiler v/s Interpreter:


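Compiler                                        | Interpreter
------------------------------------------------+------------------------------------------------
Scans the entire program and translates it as a | Takes a single line of code and translates it
whole into machine code.                        | into machine code.
------------------------------------------------+------------------------------------------------
Requires a lot of time to analyze the source    | Requires very little time to analyze the source
code.                                           | code.
------------------------------------------------+------------------------------------------------
Compiled code runs faster.                      | Interpreted code runs slower.
------------------------------------------------+------------------------------------------------
Displays all errors after compilation. If the   | Displays the errors of each line one by one.
code has mistakes, it will not compile.         |
------------------------------------------------+------------------------------------------------
Generates intermediate code.                    | Does not generate intermediate code.
------------------------------------------------+------------------------------------------------
Object code is saved for future use.            | No object code is saved for future use.
------------------------------------------------+------------------------------------------------
C, C++, C#, etc. are compiler-based programming | Python, Ruby, Perl, SNOBOL, MATLAB, etc. are
languages.                                      | interpreter-based programming languages.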

Phases of Compiler:

A compiler can broadly be divided into two phases based on the way it processes the program.

• Analysis Phase
Known as the front end of the compiler, the analysis phase reads the source program, divides it
into its core parts, and then checks for lexical, grammar, and syntax errors. The analysis phase
generates an intermediate representation of the source program and a symbol table, which are
fed to the synthesis phase as input.
• Synthesis Phase
Known as the back end of the compiler, the synthesis phase generates the target program with
the help of the intermediate representation and the symbol table.
A compiler can have many phases and passes.
• Pass: A pass is one traversal of the entire program by the compiler.
• Phase: A phase is a distinguishable stage of the compiler that takes input from the previous
stage, processes it, and yields output that can be used as input for the next stage. A pass can
include more than one phase.

The compilation process is a sequence of phases. Each phase takes input from the previous phase,
has its own representation of the source program, and feeds its output to the next phase of the
compiler. The individual phases are described in the following sections.

Lexical Analysis:
Lexical analysis is the first phase of the compilation process. It takes source code as input,
reads the source program one character at a time, and groups the characters into meaningful
lexemes. The lexical analyzer represents these lexemes as tokens. A token is a sequence of
characters that forms a single unit of information in the source code, such as an identifier, a
number, or an operator.
A program that performs lexical analysis is called a scanner, tokenizer, or lexer.
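
A small regular-expression-based scanner gives a feel for this phase. The token names (`ID`,
`NUMBER`, and so on) and the set of recognised characters are assumptions made for this sketch;
a lexer for a real language would cover keywords, comments, strings, and more.

```python
import re

# Each entry is (token name, regular expression); earlier entries take priority.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OP",     r"[+\-*/]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Group the characters of `source` into (token name, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace separates tokens but is not one
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens

for token in tokenize("a = a + b * c * 2;"):
    print(token)
```

For the statement `a = a + b * c * 2;` the scanner yields ten tokens: identifiers, the
assignment sign, three operators, one number, and the terminating semicolon.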

Syntax Analysis:
Syntax analysis is the second phase of the compilation process. It takes tokens as input and
generates a parse tree as output. In this phase, the parser checks whether the expression formed
by the tokens is syntactically correct. Syntax analysis is also called parsing.
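
One common way to build such a tree is recursive descent, sketched here for assignment
statements only. The grammar (shown in the comments), the nested-tuple tree format, and the
function name `parse` are choices made for this illustration, not something the notes
prescribe. For brevity the token list is produced with a single `re.findall` call.

```python
import re

def parse(source):
    """Parse 'identifier = expression ;' into a nested-tuple syntax tree."""
    # Characters outside this pattern are silently skipped; a real lexer would reject them.
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[=+\-*/;]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expect(symbol):
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        return advance()

    def factor():                      # factor -> identifier | number
        token = peek()
        if token is None or not (token[0].isalnum() or token[0] == "_"):
            raise SyntaxError(f"expected operand, found {token!r}")
        return advance()

    def term():                        # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            operator = advance()
            node = (operator, node, factor())
        return node

    def expression():                  # expression -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            operator = advance()
            node = (operator, node, term())
        return node

    target = factor()                  # statement -> identifier '=' expression ';'
    if target[0].isdigit():
        raise SyntaxError(f"cannot assign to {target!r}")
    expect("=")
    tree = ("=", target, expression())
    expect(";")
    return tree

print(parse("a = a + b * c * 2;"))
# ('=', 'a', ('+', 'a', ('*', ('*', 'b', 'c'), '2')))
```

Because `term` is called from `expression`, multiplication binds more tightly than addition, so
`b * c * 2` ends up as a subtree below the `+` node.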

Semantic Analysis:
Semantic analysis is the third phase of the compilation process. It checks whether the parse
tree follows the rules of the language. It also performs type checking, label checking, and
flow-control checking. The semantic analyzer keeps track of identifiers, their types, and
expressions. For example, it verifies that arithmetic operations are applied to type-compatible
operands. (Structural checks, such as matching parentheses or pairing else with if, are normally
done earlier, by the parser.) The output of the semantic analysis phase is the annotated syntax
tree.
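
Type checking can be sketched as a recursive walk over the syntax tree. The sketch assumes a
nested-tuple tree of the form `(operator, left, right)`, only the two types `int` and `float`,
and a hypothetical `inttofloat` node that marks an implicit conversion; real languages have
much richer type rules.

```python
NUMERIC = ("int", "float")

def check(node, types):
    """Return (annotated tree, type) or raise an error if the tree is ill-typed."""
    if isinstance(node, str):
        if node[0].isdigit():                         # numeric literal
            return node, ("float" if "." in node else "int")
        if node not in types:
            raise NameError(f"undeclared identifier: {node}")
        return node, types[node]
    operator, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type not in NUMERIC or right_type not in NUMERIC:
        raise TypeError(f"operator {operator!r} needs numeric operands")
    if operator == "=":
        if left_type == "int" and right_type == "float":
            raise TypeError("cannot assign a float value to an int variable")
        if left_type != right_type:                   # float target, int value
            right = ("inttofloat", right)
        return (operator, left, right), left_type
    if left_type != right_type:                       # mixed int/float arithmetic
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (operator, left, right), "float"
    return (operator, left, right), left_type

tree = ("=", "a", ("+", "a", ("*", ("*", "b", "c"), "2")))
annotated, _ = check(tree, {"a": "float", "b": "float", "c": "float"})
print(annotated)
# ('=', 'a', ('+', 'a', ('*', ('*', 'b', 'c'), ('inttofloat', '2'))))
```

With `a`, `b`, and `c` declared as float, the integer literal `2` is the only operand that
needs a conversion, so it is the only node that gets wrapped.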

Intermediate Code Generation:


In intermediate code generation, the compiler translates the source program into intermediate
code. Intermediate code lies between the high-level language and the machine language. It
should be generated in such a way that it can easily be translated into the target machine code.
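
Three-address code is one widely used intermediate form, although the notes do not name a
specific one. The sketch below flattens an annotated syntax tree into such instructions; the
nested-tuple tree format, the temporary names `t1`, `t2`, ..., and the `inttofloat` operation
are assumptions of this example.

```python
def generate_tac(tree):
    """Flatten an ('=', target, expression) tree into three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, str):                 # identifier or literal
            return node
        if node[0] == "inttofloat":               # unary conversion node
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        operator, left, right = node              # binary operator node
        left_name, right_name = walk(left), walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {operator} {right_name}")
        return temp

    _, target, value = tree
    code.append(f"{target} = {walk(value)}")
    return code

tree = ("=", "a", ("+", "a", ("*", ("*", "b", "c"), ("inttofloat", "2"))))
print("\n".join(generate_tac(tree)))
# t1 = b * c
# t2 = inttofloat(2)
# t3 = t1 * t2
# t4 = a + t3
# a = t4
```

Each instruction has at most one operator on its right-hand side, which is what makes the later
translation to machine instructions straightforward.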

Code Optimization:
Code optimization is an optional phase. It improves the intermediate code so that the resulting
program runs faster and takes less space. It removes unnecessary lines of code and rearranges
the sequence of statements in order to speed up program execution.
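
Two simple optimizations are sketched below on instructions of the form
`(dest, op, arg1, arg2)`: converting an integer literal to float at compile time, and removing
a final copy out of a temporary. The instruction format and the `copy` and `inttofloat`
operation names are invented for this example, and the copy removal assumes that every
temporary `t...` is used exactly once, which holds for the straight-line code shown here but not
in general.

```python
def optimize(code):
    """Apply two simple optimizations to (dest, op, arg1, arg2) instructions."""
    constants, result = {}, []
    for dest, op, arg1, arg2 in code:
        # Substitute operands whose values are already known at compile time.
        arg1, arg2 = constants.get(arg1, arg1), constants.get(arg2, arg2)
        if op == "inttofloat" and arg1.isdigit():
            # Do the conversion now instead of emitting a run-time instruction.
            constants[dest] = arg1 + ".0"
            continue
        if op == "copy" and result and result[-1][0] == arg1 and arg1.startswith("t"):
            # Let the previous instruction write straight into the destination.
            _, prev_op, prev_arg1, prev_arg2 = result.pop()
            result.append((dest, prev_op, prev_arg1, prev_arg2))
            continue
        result.append((dest, op, arg1, arg2))
    return result

code = [
    ("t1", "*", "b", "c"),
    ("t2", "inttofloat", "2", None),
    ("t3", "*", "t1", "t2"),
    ("t4", "+", "a", "t3"),
    ("a", "copy", "t4", None),
]
for instruction in optimize(code):
    print(instruction)
# ('t1', '*', 'b', 'c')
# ('t3', '*', 't1', '2.0')
# ('a', '+', 'a', 't3')
```

The five original instructions shrink to three without changing the value finally stored in `a`.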

Code Generation:
Code generation is the final stage of the compilation process. It takes the optimized
intermediate code as input and maps it to the target machine language. The code generator
translates the intermediate code into the machine code of the specified computer.
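
A deliberately naive code generator is sketched below. It assumes instructions of the form
`(dest, op, arg1, arg2)` and a made-up target machine with one floating-point register `R0`
and the instructions `LDF`, `STF`, `ADDF`, `SUBF`, `MULF`, and `DIVF`; a real code generator
would also perform register allocation and instruction selection.

```python
# Hypothetical floating-point instructions of an imaginary target machine.
MNEMONICS = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}

def operand(name):
    """Literals become immediate operands (#value); names are memory locations."""
    return f"#{name}" if name[0].isdigit() else name

def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into load/operate/store triples."""
    assembly = []
    for dest, op, arg1, arg2 in code:
        assembly.append(f"LDF R0, {operand(arg1)}")
        assembly.append(f"{MNEMONICS[op]} R0, R0, {operand(arg2)}")
        assembly.append(f"STF {dest}, R0")
    return assembly

code = [("t1", "*", "b", "c"), ("t3", "*", "t1", "2.0"), ("a", "+", "a", "t3")]
print("\n".join(generate(code)))
```

Storing every temporary back to memory is wasteful; keeping intermediate results in registers
is one of the improvements a production back end would make.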

Symbol Table:
The symbol table is an important data structure created and maintained by the compiler to keep
track of the semantics of variables: it stores scope and binding information about names, and
information about instances of entities such as variable and function names, classes, and
objects. The symbol table lets the compiler quickly find the record for an identifier and
retrieve it.
Items stored in the symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler-generated temporaries
• Labels in the source language
Information used by the compiler from the symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• For a structure or record, a pointer to the structure table
• For parameters, whether they are passed by value or by reference
• Number and type of arguments passed to a function
• Base address
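
One possible way to organise such a table is a stack of dictionaries, one per scope. The class
and method names below are hypothetical, and each record stores only a type, a size, and a
storage offset; the other fields listed above would be added in the same way.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its record."""

    def __init__(self):
        self.scopes = [{}]                      # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_, size):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"redeclaration of {name}")
        # The offset is the total size of everything declared earlier in this scope.
        offset = sum(entry["size"] for entry in scope.values())
        scope[name] = {"type": type_, "size": size, "offset": offset}

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier: {name}")

table = SymbolTable()
table.declare("a", "float", 4)
table.declare("b", "float", 4)
table.enter_scope()
table.declare("a", "int", 4)                    # shadows the outer 'a'
print(table.lookup("a")["type"])                # int
table.exit_scope()
print(table.lookup("a")["type"])                # float
print(table.lookup("b")["offset"])              # 4
```

Dictionaries give fast lookups by name, and the scope stack lets an inner declaration hide an
outer one until the inner scope is closed.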


Types of Compiler:
• One-pass compiler: If all the phases of a compiler are combined into a single module, the
result is known as a single-pass compiler. It completes the whole compilation in one pass,
immediately translating each part of the program into its final machine code.

• Two-pass/multi-pass compiler: A two-pass or multi-pass compiler processes the source code or
abstract syntax tree of a program multiple times. In a two-pass compiler, the phases are divided
into two passes:
First pass, also referred to as:
(a). Front end
(b). Analysis part
(c). Platform independent
Second pass, also referred to as:
(a). Back end
(b). Synthesis part
(c). Platform dependent
A multi-pass compiler helps with two basic problems:
1. Designing compilers for different programming languages on the same machine. Each
programming language needs its own front end (first pass), but all of them can share a single
back end (second pass).
2. Designing compilers for the same programming language on different machines/systems. Each
machine needs its own back end, but all of them can share a single front end for the language.

• Incremental compiler: An incremental compiler recompiles only the modified portions of the
source text and merges the result with the previously compiled code to form new target code.
It thus avoids recompiling the whole source program after each modification.

• Native code compiler: A native compiler generates code for the same platform on which it
runs, converting a high-level language into that computer's native language. Examples are
Turbo C, or GCC when it targets the machine it runs on. If a compiler runs on a Windows machine
and produces executable code for Windows, it is a native compiler.

• Cross compiler: A cross compiler is a compiler capable of creating executable code for a
platform other than the one on which the compiler is running. For example, a compiler that
runs on a PC but generates code that runs on an Android smartphone is a cross compiler.
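
The front end/back end split described for multi-pass compilers above can be sketched as
function composition over a shared intermediate representation. Everything here is
hypothetical: the two toy source notations, the `(dest, op, left, right)` tuple used as the
intermediate form, and the two imaginary target machines.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def infix_front_end(source):
    """Front end for statements such as 'x = y + z'."""
    dest, _, left, op, right = source.split()
    return (dest, op, left, right)

def prefix_front_end(source):
    """Front end for statements such as '(set x (+ y z))'."""
    _, dest, op, left, right = source.replace("(", " ").replace(")", " ").split()
    return (dest, op, left, right)

def register_back_end(ir):
    """Back end for an imaginary single-register machine."""
    dest, op, left, right = ir
    return [f"LOAD R0, {left}", f"{OPS[op]} R0, {right}", f"STORE {dest}, R0"]

def stack_back_end(ir):
    """Back end for an imaginary stack machine."""
    dest, op, left, right = ir
    return [f"PUSH {left}", f"PUSH {right}", OPS[op], f"POP {dest}"]

def make_compiler(front_end, back_end):
    """Any front end can be paired with any back end through the shared form."""
    return lambda source: back_end(front_end(source))

print(make_compiler(infix_front_end, register_back_end)("x = y + z"))
print(make_compiler(prefix_front_end, stack_back_end)("(set x (+ y z))"))
```

Two front ends and two back ends give four compilers, while only four components had to be
written; with more languages and machines the saving grows accordingly.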

Q: Explain the lexical analysis phase of a compiler and, for the statement given below,
write the output of all phases (except the optimization phase) of a compiler. Assume
a, b, and c are of type float.
a = a + b * c * 2;
