What Is A Compiler
What is a Compiler?
Features of Compilers
Types of Compiler
Tasks of Compiler
History of Compiler
Steps for Language processing systems
Compiler Construction Tools
Why use a Compiler?
Application of Compilers
Features of Compilers
Correctness
Speed of compilation
Preserve the correct meaning of the code
The speed of the target code
Recognize legal and illegal program constructs
Sarfraz Ali Roll No # 229 BSCS 5th Semester (M)
Good error reporting/handling
Code debugging help
Types of Compiler
Following are the different types of Compiler:
Single Pass Compiler
In a single pass compiler, the source code is directly transformed into machine code. For
example, the Pascal language.
Two Pass Compiler
In a two pass compiler, the source code is processed in two stages: a front end that
produces an intermediate representation and a back end that generates the target code.
The two pass method also simplifies the retargeting process. It also allows multiple
front ends.
Multipass Compilers
The multipass compiler processes the source code or syntax tree of a program several
times. It divides a large program into multiple small programs and processes them,
producing multiple intermediate codes. Each pass takes the output of the previous
pass as its input, so it requires less memory. It is also known as a 'wide
compiler'.
Tasks of Compiler
Main tasks performed by the Compiler are:
Breaks up the source program into pieces and imposes grammatical
structure on them
Allows you to construct the desired target program from the intermediate
representation and also create the symbol table
Compiles source code and detects errors in it
Manage storage of all variables and codes.
Support for separate compilation
Reads and analyzes the entire program, and translates it to a semantically equivalent form
Translating the source code into object code depending upon the type of
machine
History of Compiler
Important Landmark of Compiler’s history are as follows:
The “compiler” word was first used in the early 1950s by Grace Murray
Hopper
Steps for Language processing systems
Linker: The linker helps you to link and merge various object files to create an
executable file. All these files might have been compiled by separate
assemblers. The main task of a linker is to search for the called modules in a
program and to find out the memory locations where all modules are stored.
Loader: The loader is a part of the OS which performs the task of loading
executable files into memory and running them. It also calculates the size of a
program, which determines the additional memory space to allocate.
Compiler Construction Tools
These tools use a specific language or algorithm for specifying and implementing the
components of the compiler. Following are examples of compiler construction tools.
Scanner generators: This tool takes regular expressions as input; for example,
LEX for the Unix operating system.
Syntax-directed translation engines: These software tools produce an
intermediate code by using the parse tree. The goal is to associate one or
more translations with each node of the parse tree.
Application of Compilers
Compiler design helps in the full implementation of high-level programming
languages
Support optimization for Computer Architecture Parallelism
Design of New Memory Hierarchies of Machines
Widely used for Translating Programs
Used with other Software Productivity Tools
Q: Compiler construction?
The process of constructing a compiler is called compiler
construction.
Q: Why do we construct a compiler?
Without a compiler, our machine is not able to understand
source code written in a high-level language.
Example
How Pleasant Is The Weather?
See this lexical analysis example. Here, we can easily recognize that there are five
words: How, Pleasant, Is, The, Weather. This is very natural for us, as we can recognize
the separators, blanks, and the punctuation symbol.
HowPl easantIs Th ewe ather?
Now, check this example: we can also read this. However, it will take some time
because the separators are put in odd places. It is not something which comes to you
immediately.
Basic Terminologies:
Lexical Analyzer Architecture: How tokens are recognized
Roles of the Lexical analyzer
Lexical Errors
Error Recovery in Lexical Analyzer
Lexical Analyzer vs. Parser
Why separate Lexical and Parser?
Advantages of Lexical analysis
Disadvantage of Lexical analysis
Basic Terminologies
What’s a lexeme?
A lexeme is a sequence of characters in the source program that matches the pattern
of a token. It is nothing but an instance of a token.
What’s a token?
Tokens in compiler design are the sequence of characters which represents a unit of
information in the source program.
What is Pattern?
A pattern is a description of the form that the lexemes of a token may take. In the case
of a keyword used as a token, the pattern is simply the sequence of characters that
makes up the keyword.
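To make lexemes, tokens, and patterns concrete, here is a small illustration (added to these notes, not part of the original text) using Python's standard-library tokenize module, which scans Python source the same way a compiler's lexer would:

```python
import io
import tokenize

# Scan a small statement with Python's standard-library scanner.
# Each result pairs a token type (the category) with a lexeme (the exact
# character sequence matched from the source).
source = "total = price + 10"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]
print(tokens)
# [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '+'), ('NUMBER', '10')]
```

Here NAME, OP, and NUMBER are tokens; "total", "price", "+", and "10" are lexemes; and the pattern for NUMBER is, roughly, "a sequence of digits".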
Lexical analyzer scans the entire source code of the program. It identifies each token
one by one. Scanners are usually implemented to produce tokens only when requested
by a parser. Here is how recognition of tokens in compiler design works-
1. “Get next token” is a command which is sent from the parser to the lexical
analyzer.
2. On receiving this command, the lexical analyzer scans the input until it finds
the next token.
3. It returns the token to Parser.
Lexical Analyzer skips whitespaces and comments while creating these tokens. If any
error is present, then Lexical analyzer will correlate that error with the source file and
line number.
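The "get next token" loop above can be sketched in code. The following is a minimal hand-written scanner (an illustrative sketch added to these notes; the token names and patterns are invented for the example). It skips whitespace and comments, returns one token per request, and reports a lexical error with a position, as described above:

```python
import re

# Token patterns, tried in order; WS and COMMENT are skipped, mirroring
# how the lexical analyzer discards whitespace and comments.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("WS",      r"\s+"),
    ("COMMENT", r"#[^\n]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def get_next_token(source, pos=0):
    """Scan from pos, skip whitespace/comments, return (kind, lexeme, new_pos)."""
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No pattern matches: report a lexical error with the position.
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        pos = m.end()
        if m.lastgroup not in ("WS", "COMMENT"):
            return m.lastgroup, m.group(), pos
    return "EOF", "", pos

# The parser's loop: repeatedly issue "get next token" until end of input.
src = "count = count + 1  # increment"
tokens, pos = [], 0
while True:
    kind, lexeme, pos = get_next_token(src, pos)
    if kind == "EOF":
        break
    tokens.append((kind, lexeme))
print(tokens)
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUMBER', '1')]
```

Note how the whitespace and the comment never reach the token list, exactly as described above.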
Lexical Errors
A character sequence which is not possible to scan into any valid token is a lexical
error. Important facts about the lexical error:
Lexical errors are not very common, but they should be managed by a scanner
Misspellings of identifiers, operators, and keywords are considered lexical errors
Generally, a lexical error is caused by the appearance of some illegal character,
mostly at the beginning of a token.
Summary
Lexical analysis is the very first phase in the compiler designing
Lexemes and Tokens are the sequence of characters that are included in the
source program according to the matching pattern of a token
Lexical analyzer is implemented to scan the entire source code of the program
Lexical analyzer helps to enter identified tokens into the symbol table
A character sequence which is not possible to scan into any valid token is a
lexical error
Removing one character from the remaining input is a useful error recovery
method
Lexical analyser scans the input program while the parser performs syntax analysis
It eases the process of lexical analysis and the syntax analysis by eliminating
unwanted tokens
Lexical analyzer is used by web browsers to format and display a web page
with the help of parsed data from JavaScript, HTML, CSS
The biggest drawback of using a lexical analyzer is the additional
runtime overhead required to generate the lexer tables and construct the
tokens
Lexeme: A lexeme is the lowest level syntactic unit of a language (e.g., total,
start).
Noise words – Noise words are optional words which are inserted in a statement to
enhance the readability of the sentence.
Top-Down Parsing,
Bottom-Up Parsing
Top-Down Parsing:
In top-down parsing, the construction of the parse tree starts at the root and then
proceeds towards the leaves.
1. Predictive Parsing:
A predictive parser can predict which production should be used to replace the specific
input string. It uses a look-ahead pointer, which points to the next
input symbols. Backtracking is not an issue with this parsing technique. It is known as
an LL(1) parser.
2. Recursive Descent Parsing:
This parsing technique recursively parses the input to make a parse tree. It consists of
several small functions, one for each nonterminal in the grammar.
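The idea of one small function per nonterminal can be sketched as follows (an illustration added to these notes; the toy grammar and names are invented). A single look-ahead token decides which production to use, so no backtracking is needed:

```python
# A minimal LL(1) recursive-descent parser for the toy grammar
#   expr -> term rest
#   rest -> '+' term rest | epsilon
#   term -> NUMBER
# There is one function per nonterminal; the look-ahead token alone
# chooses the production to apply.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, lexeme) pairs
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.lookahead() != kind:
            raise SyntaxError(f"expected {kind}, got {self.lookahead()}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):                      # expr -> term rest
        left = self.term()
        return self.rest(left)

    def rest(self, left):                # rest -> '+' term rest | epsilon
        if self.lookahead() == "PLUS":   # look-ahead chooses the production
            self.match("PLUS")
            right = self.term()
            return self.rest(("+", left, right))
        return left                      # epsilon-production: nothing to consume

    def term(self):                      # term -> NUMBER
        return int(self.match("NUMBER")[1])

# Parse the token stream for "1 + 2 + 3" into a left-nested tree.
tree = Parser([("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "2"),
               ("PLUS", "+"), ("NUMBER", "3")]).expr()
print(tree)
# ('+', ('+', 1, 2), 3)
```

Because each decision is made from one token of look-ahead, the parser never has to undo a choice, which is exactly the LL(1) property described above.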
Bottom-Up Parsing:
In bottom-up parsing in compiler design, the construction of the parse tree starts
with the leaves, and then it proceeds towards the root. It is also called shift-reduce
parsing. This type of parser is often created with the help of
software tools.
Panic-Mode Recovery
When the parser encounters an error, this mode ignores the rest of
the statement: it does not process input from the erroneous point up to a delimiter, like a
semi-colon. This is a simple error recovery method.
In this type of recovery method, the parser rejects input symbols one by one
until a single designated group of synchronizing tokens is found. The
synchronizing tokens are generally delimiters, such as a semicolon or a closing brace.
Phrase-Level Recovery:
The compiler corrects the program by inserting or deleting tokens. This allows it to
resume parsing from where it was. It performs correction on the remaining
input: it can replace a prefix of the remaining input with some string, which helps
the parser to continue the process.
Error Productions
Error production recovery expands the grammar of the language so that it
generates the erroneous constructs. The parser then performs error diagnostics
about those constructs.
Global Correction:
The compiler should make as few changes as possible while processing
an incorrect input string. Given an incorrect input string a and a grammar c,
algorithms will search for a parse tree for a related string b, such that the number of
insertions, deletions, and modifications of tokens needed to transform a
into b is as small as possible.
Grammar:
A grammar is a set of structural rules which describe a language. Grammars assign
structure to any sentence. The term also refers to the study of these rules, and this field
includes morphology, phonology, and syntax. A grammar is capable of describing much of the
syntax of programming languages.
Every non-terminal symbol should appear on the left side of at least one production
The goal symbol should never appear on the right side of the ::= of any
production
A rule is recursive if LHS appears in its RHS
Notational Conventions
An optional element may be indicated by enclosing the element in square
brackets, [ … ]. An arbitrary sequence of instances of an element can be indicated
by enclosing the element in braces followed by an asterisk symbol, { … }*.
A choice between alternatives may use the symbol | within a single rule, and it
may be enclosed by parentheses, ( … ), when needed.
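As an invented illustration of these conventions (not from the original text), a few rules might be written as follows:

```
decl     ::= type id [ '=' expr ] ';'       optional part in square brackets [ … ]
arg-list ::= expr { ',' expr }*             zero or more repetitions with { … }*
type     ::= ( 'int' | 'float' | 'char' )   choice of alternatives with | in ( … )
```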
1. Terminals: Terminals are the basic symbols from which strings of the language are formed, such as keywords, operators, and punctuation symbols.
2. Nonterminals: Nonterminals are syntactic variables that denote sets of strings and help define the language generated by the grammar.
Grammar Derivation
Grammar derivation is a sequence of grammar rule which transforms the start symbol
into the string. A derivation proves that the string belongs to the grammar’s language.
Left-most Derivation
When the sentential form of the input is scanned and replaced in left-to-right sequence, it
is known as left-most derivation. The sentential form derived by the left-most
derivation is called the left-sentential form.
Right-most Derivation
The right-most derivation scans and replaces the input with production rules from right to
left. The sentential form derived from the right-most derivation is known as the
right-sentential form.
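The two derivation orders can be illustrated with a small example grammar (added here for illustration):

```
Grammar:   E ::= E + E | id

Left-most derivation of "id + id" (always expand the left-most nonterminal):
   E  =>  E + E  =>  id + E  =>  id + id

Right-most derivation of "id + id" (always expand the right-most nonterminal):
   E  =>  E + E  =>  E + id  =>  id + id
```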
Summary
Syntax analysis is a second phase of the compiler design process that comes
after lexical analysis
The syntactical analyser helps you to apply rules to the code
Sentence, Lexeme, Token, Keywords and reserved words, Noise words,
Comments, Delimiters, Character set, Identifiers are some important terms used
in the Syntax Analysis in Compiler construction
The parser checks that the input string is well-formed and, if not, rejects it
Parsing techniques are divided into two different groups: Top-Down Parsing,
Bottom-Up Parsing
Lexical, syntactical, semantical, and logical errors are some common errors that occur
during the parsing method
A grammar is a set of structural rules which describe a language
Notational conventions symbol may be indicated by enclosing the element in
square brackets
A CFG is a left-recursive grammar if it has at least one production of the form A ::= A x
Grammar derivation is a sequence of grammar rule which transforms the start
symbol into the string
The syntax analyser mainly deals with recursive constructs of the language,
while the lexical analyser eases the task of the syntax analyser
The drawback of the syntax analyser method is that it will never determine whether a
token is valid or not
A compiler should comply with the syntax rules of the programming language in
which the source code is written. However, the compiler is only a program and cannot fix errors
found in that program. So, if you make a mistake, you need to correct the
syntax of your program. Otherwise, it will not compile.
What is Interpreter?
An interpreter is a computer program which converts each high-level program
statement into machine code. This includes source code, pre-compiled code, and
scripts. Both compilers and interpreters do the same job, which is converting a higher-
level programming language to machine code. However, a compiler will convert the
code into machine code (create an exe) before the program runs. Interpreters convert code
into machine code while the program is running.
KEY DIFFERENCE
A compiler transforms code written in a high-level programming language into
machine code, all at once, before the program runs, whereas an interpreter converts
each high-level program statement, one by one, into machine code during the
program run.
Compiled code runs faster while interpreted code runs slower.
Compiler displays all errors after compilation, on the other hand, the Interpreter
displays errors of each line one by one.
Compiler is based on translation linking-loading model, whereas Interpreter is
based on Interpretation Method.
Compiler takes an entire program whereas the Interpreter takes a single line of
code.
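The timing difference above can be loosely demonstrated with Python's built-in compile() and exec() (a rough analogy added for illustration: compile() produces Python bytecode, not real machine code, but the whole-program-first versus statement-by-statement contrast is the same):

```python
program = "x = 2\ny = x * 3"

# Compiler-like: the entire program is translated first...
code_object = compile(program, "<whole-program>", "exec")
compiled_ns = {}
exec(code_object, compiled_ns)          # ...then executed in one go

# Interpreter-like: each statement is translated and run immediately.
interpreted_ns = {}
for statement in program.splitlines():
    exec(statement, interpreted_ns)

print(compiled_ns["y"], interpreted_ns["y"])
# 6 6
```

A syntax error anywhere in the program would be reported by compile() before any statement runs, whereas the statement-by-statement loop would execute the first line before failing on the bad one, mirroring how compilers report all errors up front while interpreters stop at the offending line.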
Role of Compiler
A compiler reads the source code and outputs executable code
It translates software written in a higher-level language into instructions that
the computer can understand. It converts the text that a programmer writes into a
format the CPU can understand.
The process of compilation is relatively complicated. It spends a lot of time
analyzing and processing the program.
The executable result is some form of machine-specific binary code.
Role of Interpreter
The interpreter converts the source code line by line during run time.
The interpreter fully translates a program written in a high-level language into
machine-level language.
Interpreter allows evaluation and modification of the program while it is
executing.
Relatively less time is spent on analyzing and processing the program
Program execution is relatively slow compared to a compiler
HIGH-LEVEL LANGUAGES
High-level languages, like C, C++, JAVA, etc., are very near to English. They make
the programming process easy. However, they must be translated into machine language
before execution. This translation is conducted by either a compiler or
an interpreter. High-level programs are also known as source code.
MACHINE CODE
Machine code is the binary language of instructions that a particular processor can directly understand and execute.
OBJECT CODE
On compilation of source code, the machine code generated for different processors
like Intel, AMD, and ARM is different. To make code portable, the source code is
first converted to object code. It is an intermediary code (similar to machine code)
that no processor will directly execute. At run time, the object code is converted to the
machine code of the underlying platform.