
Compiler Design

VI SEMESTER
ETCS-302
Department of Computer Science and Engineering, BVCOE New Delhi

1
UNIT-I
Compiler Design

Introduction to Compiler Design


Language processing systems (using Compiler)
Cousins of Compiler
Phases of a Compiler
Lexical Analysis Phase
Bootstrapping
Syntax Analysis
Top Down Parsers
Bottom up parser

2 Department of Computer Science and Engineering, BVCOE New Delhi


Introduction of Compiler Design
 A compiler is software that converts a program written in a high-level
language (Source Language) to a low-level language (Object/Target/Machine
Language).

 A Cross Compiler runs on a machine ‘A’ and produces code for another
machine ‘B’. It is capable of creating code for a platform other than the one on
which the compiler is running.
 Source-to-source Compiler or transcompiler or transpiler is a compiler
that translates source code written in one programming language into source
code of another programming language.

3 Department of Computer Science and Engineering, BVCOE New Delhi


Language processing systems (using Compiler)
 We write programs in a high-level language, which is easier for
us to understand and remember. These programs are then fed
into a series of tools and OS components to get the desired
code that can be used by the machine. This is known as a
Language Processing System.
 The high-level language is converted into binary language in
various phases. A compiler is a program that converts high-
level language to assembly language. Similarly,
an assembler is a program that converts the assembly
language to machine-level language.

4 Department of Computer Science and Engineering, BVCOE New Delhi


Language processing systems (using
Compiler)

5 Department of Computer Science and Engineering, BVCOE New Delhi


How is a program executed on a host machine?
 User writes a program in C language (high-level language).
 The C compiler compiles the program and translates it into an
assembly program (low-level language).
 An assembler then translates the assembly program into
machine code (object).
 A linker tool is used to link all the parts of the program
together for execution (executable machine code).
 A loader loads all of them into memory and then the
program is executed.
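 For example, with a typical GCC toolchain the stages above can be seen
separately (commands shown only as an illustration):
gcc -E prog.c -o prog.i   (preprocess)
gcc -S prog.i -o prog.s   (compile to assembly)
as prog.s -o prog.o       (assemble into object code)
gcc prog.o -o prog        (link into an executable)
./prog                    (the loader brings it into memory and runs it)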

6 Department of Computer Science and Engineering, BVCOE New Delhi


Cousins of Compiler
 Preprocessor
 A preprocessor, generally considered a part of the compiler, is a tool that produces input
for compilers. It deals with macro-processing, augmentation, file inclusion, language
extension, etc.
 Interpreter
 An interpreter, like a compiler, translates high-level language into low-level machine
language. The difference lies in the way they read the source code or input. A compiler
reads the whole source code at once, creates tokens, checks semantics, generates
intermediate code, translates the whole program and may involve many passes. In
contrast, an interpreter reads a statement from the input, converts it to an intermediate
code, executes it, then takes the next statement in sequence. If an error occurs, an
interpreter stops execution and reports it, whereas a compiler reads the whole program
even if it encounters several errors.
 Assembler
 An assembler translates assembly language programs into machine code. The
output of an assembler is called an object file, which contains a combination of
machine instructions as well as the data required to place these instructions in
memory.

7 Department of Computer Science and Engineering, BVCOE New Delhi


Cousins of Compiler
 Linker
 Linker is a computer program that links and merges various object files
together in order to make an executable file. All these files might have been
compiled by separate assemblers. The major task of a linker is to search and
locate referenced modules/routines in a program and to determine the memory
location where these codes will be loaded, so that the program instructions
have absolute references.
 Loader
 Loader is a part of the operating system and is responsible for loading executable
files into memory and executing them. It calculates the size of a program
(instructions and data) and creates memory space for it. It initializes various
registers to initiate execution.

8 Department of Computer Science and Engineering, BVCOE New Delhi


Phases of a Compiler
 There are two major phases of compilation, which in turn have many parts. Each of them
takes input from the output of the previous phase and works in a coordinated way.
 Analysis Phase – An intermediate representation is created from the given source code:
 Lexical Analyzer
 Syntax Analyzer
 Semantic Analyzer
 Intermediate Code Generator
 Lexical analyzer divides the program into “tokens”, Syntax analyzer
recognizes “sentences” in the program using syntax of language and
Semantic analyzer checks static semantics of each construct. Intermediate
Code Generator generates “abstract” code.
 Synthesis Phase – Equivalent target program is created from the intermediate
representation. It has two parts :
 Code Optimizer
 Code Generator
 Code Optimizer optimizes the abstract code, and final Code Generator translates
abstract intermediate code into specific machine instructions.

9 Department of Computer Science and Engineering, BVCOE New Delhi


Phases of a Compiler

10 Department of Computer Science and Engineering, BVCOE New Delhi


Phases of a Compiler
 Lexical Analysis
 The first phase of the compiler works as a text scanner. This phase scans
the source code as a stream of characters and converts it into
meaningful lexemes. Lexical analyzer represents these lexemes in
the form of tokens as:
 <token-name, attribute-value>
 Syntax Analysis
 The next phase is called the syntax analysis or parsing. It takes
the token produced by lexical analysis as input and generates a
parse tree (or syntax tree). In this phase, token arrangements are
checked against the source code grammar, i.e. the parser checks if
the expression made by the tokens is syntactically correct.

11 Department of Computer Science and Engineering, BVCOE New Delhi


Phases of a Compiler
 Semantic Analysis
 Semantic analysis checks whether the parse tree constructed follows the
rules of the language: for example, that values are assigned between
compatible data types (adding a string to an integer would be flagged). Also, the semantic
analyzer keeps track of identifiers, their types and expressions; whether
identifiers are declared before use or not etc. The semantic analyzer
produces an annotated syntax tree as an output.
 Intermediate Code Generation
 After semantic analysis the compiler generates an intermediate code of
the source code for the target machine. It represents a program for
some abstract machine. It is in between the high-level language and the
machine language. This intermediate code should be generated in such a
way that it makes it easier to be translated into the target machine code.

12 Department of Computer Science and Engineering, BVCOE New Delhi


Phases of a Compiler
 Code Optimization
 The next phase does code optimization of the intermediate code. Optimization
can be assumed as something that removes unnecessary code lines, and arranges
the sequence of statements in order to speed up the program execution without
wasting resources (CPU, memory).
 Code Generation
 In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code
generator translates the intermediate code into a sequence of (generally)
relocatable machine code. This sequence of machine instructions performs the
same task as the intermediate code would.
 Symbol Table
 It is a data-structure maintained throughout all the phases of a compiler. All the
identifier's names along with their types are stored here. The symbol table
makes it easier for the compiler to quickly search the identifier record and
retrieve it. The symbol table is also used for scope management.

13 Department of Computer Science and Engineering, BVCOE New Delhi


Lexical Analysis
 Lexical analysis is the first phase of a compiler. It takes the modified
source code from language preprocessors that are written in the form of
sentences. The lexical analyzer breaks these syntaxes into a series of
tokens, by removing any whitespace or comments in the source code.

 If the lexical analyzer finds a token invalid, it generates an error. The
lexical analyzer works closely with the syntax analyzer. It reads character
streams from the source code, checks for legal tokens, and passes the
data to the syntax analyzer when it demands.

14 Department of Computer Science and Engineering, BVCOE New Delhi


What is a token?
 A lexical token is a sequence of characters that can be treated as a
unit in the grammar of the programming languages.
 Example of tokens:
 Type token (id, number, real, . . . )
 Punctuation tokens (IF, void, return, . . . )
 Alphabetic tokens (keywords)
 Keywords; Examples-for, while, if etc. Identifier; Examples-
Variable name, function name etc. Operators; Examples '+', '++',
'-' etc. Separators; Examples ',' ';' etc
 Example of Non-Tokens:
 Comments, preprocessor directive, macros, blanks, tabs,
newline etc
Department of Computer Science and Engineering, BVCOE New Delhi
15
Lexeme
 The sequence of characters matched by a pattern to form
the corresponding token or a sequence of input characters
that comprises a single token is called a lexeme. eg- “float”,
“abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
 Lexemes are said to be a sequence of characters
(alphanumeric) in a token. There are some predefined rules
for every lexeme to be identified as a valid token. These rules
are defined by grammar rules, by means of a pattern. A
pattern explains what can be a token, and these patterns are
defined by means of regular expressions.
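 As a small illustration (hypothetical identifier rule): the pattern
letter (letter | digit)* describes identifiers, the lexeme abs_zero_Kelvin
matches that pattern, and the token emitted is <id, pointer to the
symbol-table entry for abs_zero_Kelvin>.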

Department of Computer Science and Engineering, BVCOE New Delhi


16
Functions of Lexical Analyzer
1. Tokenization, i.e. dividing the program into valid tokens
2. Remove white space characters.
3. Remove comments.
4. It also helps in generating error messages by
providing the row number and column number.
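 A minimal sketch of points 1–4 in C (not a real compiler's scanner): it
skips blanks and // comments, tracks the row and column for error messages,
and prints one token per identifier, number or symbol.

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    /* hypothetical input; a real scanner would read the source file */
    const char *src = "int value = 100; // a comment\nvalue = value + 1;";
    int row = 1, col = 1, i = 0;

    while (src[i] != '\0') {
        char c = src[i];
        if (c == '\n') { row++; col = 1; i++; }                    /* new line             */
        else if (isspace((unsigned char)c)) { col++; i++; }        /* 2. remove whitespace */
        else if (c == '/' && src[i + 1] == '/') {                  /* 3. remove comments   */
            while (src[i] != '\0' && src[i] != '\n') i++;
        }
        else if (isalpha((unsigned char)c) || c == '_') {          /* 1. identifier/keyword */
            int start = i;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            printf("row %d, col %d: <word, %.*s>\n", row, col, i - start, src + start);
            col += i - start;                                       /* 4. row/column info   */
        }
        else if (isdigit((unsigned char)c)) {                       /* 1. number constant   */
            int start = i;
            while (isdigit((unsigned char)src[i])) i++;
            printf("row %d, col %d: <number, %.*s>\n", row, col, i - start, src + start);
            col += i - start;
        }
        else {                                                      /* operator / separator */
            printf("row %d, col %d: <symbol, %c>\n", row, col, c);
            i++; col++;
        }
    }
    return 0;
}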

Department of Computer Science and Engineering, BVCOE New Delhi


17
Token example
 In C language, the variable declaration line
 int value = 100; contains the tokens:
 int (keyword), value (identifier), = (operator), 100
(constant) and ; (symbol).
 No of tokens=5

 a=b+c; It will generate token sequence like this:


 id=id+id; where each id refers to its entry in the
symbol table, which holds all of its details
 No of tokens=6
Department of Computer Science and Engineering, BVCOE New Delhi
18
Exercise 1
 Count no. of tokens:
int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}

Department of Computer Science and Engineering, BVCOE New Delhi


19
Solution
 All the valid tokens are:
 'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10'
';' 'return' '0' ';' '}'
 No of tokens=18

Department of Computer Science and Engineering, BVCOE New Delhi


20
Exercise 2
 Count number of tokens :
int main()
{
int a = 10, b = 20;
printf("sum is :%d",a+b);
return 0;
}

Department of Computer Science and Engineering, BVCOE New Delhi


21
Solution
 All the valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' '=' '10' ',' 'b' '=' '20' ';'
'printf' '(' '"sum is :%d"' ',' 'a' '+' 'b' ')' ';'
'return' '0' ';'
'}'
No of tokens=27

Department of Computer Science and Engineering, BVCOE New Delhi


22
Exercise 3
 Count number of tokens :
 int max(int i);

Answer: Total number of tokens 7:


int, max, ( , int , i, ), ;

Department of Computer Science and Engineering, BVCOE New Delhi


23
Longest Match Rule
 When the lexical analyzer reads the source code, it scans the code
letter by letter; and when it encounters a whitespace, operator
symbol, or special symbol, it decides that a word is completed.
 For example: int intvalue;
 While scanning both lexemes till ‘int’, the lexical analyzer cannot
determine whether it is the keyword int or the beginning of the identifier
intvalue.
 The Longest Match Rule states that the lexeme scanned should be
determined based on the longest match among all the tokens
available.
 The lexical analyzer also follows rule priority, where a reserved
word, e.g., a keyword, of a language is given priority over user
input. That is, if the lexical analyzer finds a lexeme that matches
an existing reserved word, it treats the lexeme as that reserved word
rather than as a user-defined identifier.
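 A small sketch of how a scanner might apply these two rules (the keyword
list here is only illustrative):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *keywords[] = { "int", "if", "for", "while", "return" };

/* Rule priority: a finished lexeme is looked up in the reserved-word
   table before it is classified as an identifier. */
static const char *classify(const char *lexeme)
{
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        if (strcmp(lexeme, keywords[k]) == 0)
            return "keyword";
    return "identifier";
}

int main(void)
{
    const char *src = "int intvalue;";
    char lexeme[64];
    int i = 0;

    while (src[i] != '\0') {
        if (isalpha((unsigned char)src[i])) {
            int n = 0;
            /* Longest match: keep extending the lexeme as long as possible,
               so "intvalue" is never split into "int" + "value". */
            while (isalnum((unsigned char)src[i]))
                lexeme[n++] = src[i++];
            lexeme[n] = '\0';
            printf("%s -> %s\n", lexeme, classify(lexeme));
        } else {
            i++;    /* skip the blank and the ';' in this tiny example */
        }
    }
    return 0;
}

 On the input int intvalue; it reports int as a keyword and intvalue as a
single identifier.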
Department of Computer Science and Engineering, BVCOE New Delhi
24
Bootstrapping
 Bootstrapping is a process in which a simple language is used to
translate a more complicated program, which in turn may handle a still more
complicated program. This complicated program can further handle
an even more complicated program, and so on.
 Writing a compiler for any high-level language is a complicated process.
It takes a lot of time to write a compiler from scratch. Hence a simple
language is used to generate the target code in some stages. To clearly
understand the bootstrapping technique, consider the following
scenario.
 Suppose we want to write a cross compiler for a new language X. The
implementation language of this compiler is, say, Y and the target code
being generated is in language Z. That is, we create a compiler XYZ. Now if an existing
compiler for Y runs on machine M and generates code for M, it is
denoted as YMM. If we run XYZ using YMM, we get a
compiler XMZ, that is, a compiler for source language X that
generates target code in language Z and runs on machine M.
Department of Computer Science and Engineering, BVCOE New Delhi
25
Example
 We can create the compiler in many different forms. The following
slides show how this is done for C.

Department of Computer Science and Engineering, BVCOE New Delhi


26
How bootstrapping is done:
 Goal: a compiler that takes the C language and generates assembly language
as output, given a machine that runs assembly language.
 Step-1: First we write a compiler for a small subset of C in assembly
language.

27 Department of Computer Science and Engineering, BVCOE New Delhi


Continue…
 Step-2: Then, using this small subset of C, i.e. C0, the compiler for the
source language C is written.

Department of Computer Science and Engineering, BVCOE New Delhi


28
Continue…
 Step-3: Finally we compile the second compiler: using compiler 1,
compiler 2 is compiled.

 Step-4: Thus we get a compiler written in ASM which compiles C
and generates code in ASM.

Department of Computer Science and Engineering, BVCOE New Delhi


29
Syntax Analysis
 Syntax Analysis or Parsing is the second phase, i.e. after
lexical analysis. It checks the syntactical structure of the
given input, i.e. whether the given input is in the correct
syntax (of the language in which the input has been written)
or not. It does so by building a data structure, called a Parse
tree or Syntax tree. The parse tree is constructed by using the
pre-defined Grammar of the language and the input string. If
the given input string can be produced with the help of the
syntax tree (in the derivation process), the input string is
found to be in the correct syntax. If not, an error is reported by the
syntax analyzer.
Department of Computer Science and Engineering, BVCOE New Delhi
30
Syntax Analysis
 The Grammar for a Language consists of Production rules.
 Example:
Suppose Production rules for the Grammar of a language are:
 S -> cAd
 A -> bc|a
 And the input string is “cad”.

31 Department of Computer Science and Engineering, BVCOE New Delhi


Parse Tree
 A parse tree is a graphical depiction of a derivation. It is
convenient to see how strings are derived from the start symbol.
The start symbol of the derivation becomes the root of the parse
tree. Let us see this by an example from the last topic.
 We take the left-most derivation of a + b * c
 The left-most derivation is:
 E→E*E
 E→E+E*E
 E → id + E * E
 E → id + id * E
 E → id + id * id

Department of Computer Science and Engineering, BVCOE New Delhi


32
Step of creating Parse Tree
 Step 1:
 E→E*E

 Step 2:
 E→E+E*E

 Step 3:
 E → id + E * E

33 Department of Computer Science and Engineering, BVCOE New Delhi


Continue…
 Step 4:
 E → id + id * E

 Step 5:
 E → id + id * id

Department of Computer Science and Engineering, BVCOE New Delhi


34
Ambiguity
 A grammar G is said to be ambiguous if it has more than one parse tree (left or right
derivation) for at least one string.
 Example
E→E+E
E→E–E
E → id
 For the string id + id – id, the above grammar generates two parse trees:

 .

Department of Computer Science and Engineering, BVCOE New Delhi


35
Ambiguity
 A language is said to be inherently ambiguous if every grammar
that generates it is ambiguous. Ambiguity in a grammar is not
good for compiler construction. No method can detect and
remove ambiguity automatically, but it can be removed by
either re-writing the whole grammar without ambiguity, or
by setting and following associativity and precedence
constraints.

Department of Computer Science and Engineering, BVCOE New Delhi


36
Associativity
 If an operand has operators on both sides, the side on which the
operator takes this operand is decided by the associativity of those
operators. If the operation is left-associative, then the operand will
be taken by the left operator or if the operation is right-
associative, the right operator will take the operand.
 Example
 Operations such as Addition, Multiplication, Subtraction, and
Division are left associative. If the expression contains:
 id op id op id, it will be evaluated as:
 (id op id) op id. For example, (id + id) + id
 Operations like Exponentiation are right associative, i.e., the
order of evaluation in the same expression will be:
 id op (id op id). For example, id ^ (id ^ id)

Department of Computer Science and Engineering, BVCOE New Delhi


37
Precedence
 If two different operators share a common operand, the
precedence of operators decides which will take the operand.
That is, 2+3*4 can have two different parse trees, one
corresponding to (2+3)*4 and another corresponding to
2+(3*4). By setting precedence among operators, this
problem can be easily removed. As in the previous example,
mathematically * (multiplication) has precedence over +
(addition), so the expression 2+3*4 will always be
interpreted as:
 2 + (3 * 4). These methods decrease the chances of ambiguity
in a language or its grammar.
Department of Computer Science and Engineering, BVCOE New Delhi
38
Left Recursion
 A grammar becomes left-recursive if it has any non-terminal ‘A’ whose
derivation contains ‘A’ itself as the left-most symbol. Left-recursive grammar is
considered to be a problematic situation for top-down parsers. Top-down
parsers start parsing from the Start symbol, which in itself is non-terminal. So,
when the parser encounters the same non-terminal in its derivation, it becomes
hard for it to judge when to stop parsing the left non-terminal and it goes into
an infinite loop.
 Example:
 (1) A => Aα | β

 .

Department of Computer Science and Engineering, BVCOE New Delhi


39
Left Recursion
(2) Example
S => Aα | β
A => Sd
 (2) is an example of indirect-left recursion.

Department of Computer Science and Engineering, BVCOE New Delhi


40
Removal of Left Recursion
 One way to remove left recursion is to use the following
technique:
 The production
 A => Aα | β is converted into the following productions
 A => βA'
 A'=> αA' | ε
 This does not impact the strings derived from the grammar,
but it removes immediate left recursion.
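 For example, applying this to E => E + T | T gives E => T E' and
E' => + T E' | ε, which is exactly the left-recursion-free form used later
for recursive descent and LL(1) parsing.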

41 Department of Computer Science and Engineering, BVCOE New Delhi


Removal of Indirect-Left Recursion
 Second method is to use the following algorithm, which should eliminate all direct and
indirect left recursions.
 START
 Arrange the non-terminals in some order A1, A2, A3, …, An
for each i from 1 to n
{
for each j from 1 to i-1
{
replace each production of the form Ai → Aj γ
with Ai → δ1 γ | δ2 γ | … | δk γ
where Aj → δ1 | δ2 | … | δk are the current Aj productions
}
}
eliminate immediate left-recursion
END

42 Department of Computer Science and Engineering, BVCOE New Delhi


Example
• The production set
 S => Aα | β
 A => Sd
 after applying the above algorithm, should become
 S => Aα | β
 A => Aαd | βd and then, remove immediate left recursion
using the first technique.
 A => βdA', A' => αdA' | ε
 Now none of the production has either direct or indirect left
recursion.

Department of Computer Science and Engineering, BVCOE New Delhi


43
Left Factoring
 If more than one grammar production rules has a common
prefix string, then the top-down parser cannot make a choice
as to which of the production it should take to parse the
string in hand.
 Example
 If a top-down parser encounters a production like
 A → αβ | α | … then it cannot determine which
production to follow to parse the string, as both productions
start with the same terminal (or non-terminal). To
remove this confusion, we use a technique called left
factoring.

Department of Computer Science and Engineering, BVCOE New Delhi


44
Removal of Left Factoring
 Left factoring transforms the grammar to make it useful for
top-down parsers. In this technique, we make one
production for each common prefixes and the rest of the
derivation is added by new productions.
 Example
 The above productions can be written as
 A → αA', A' → β | ε | … Now the parser has only one
production per prefix, which makes it easier to take decisions.
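 A commonly used illustration (with i, t, e standing for if, then, else and
E for an expression): S → iEtS | iEtSeS | a has the common prefix iEtS, and
left factoring gives S → iEtSS' | a and S' → eS | ε, so the parser can wait
until it sees whether an e follows before choosing between the two forms.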

Department of Computer Science and Engineering, BVCOE New Delhi


45
Top Down Parsers
 Parsing is classified into two categories, i.e. Top Down
Parsing and Bottom-Up Parsing. Top-Down Parsing is based
on Left Most Derivation whereas Bottom Up Parsing is
dependent on Reverse Right Most Derivation.
 The process of constructing the parse tree which starts from
the root and goes down to the leaf is Top-Down Parsing.
 Top-Down Parsers construct the parse tree from a Grammar which is
free from ambiguity and left recursion.
 Top Down Parsers use leftmost derivation to construct a
parse tree.
 They require a grammar that has been left factored, i.e. one with
no common prefixes in its productions.

Department of Computer Science and Engineering, BVCOE New Delhi


46
Classification of Top-Down Parsing –
 With Backtracking: Brute Force Technique

 Without Backtracking:
1. Recursive Descent Parsing
2. Predictive Parsing or Non-Recursive Parsing or LL(1)
Parsing or Table Driven Parsing

Department of Computer Science and Engineering, BVCOE New Delhi


47
Back-tracking
 Top- down parsers start from the root node (start symbol)
and match the input string against the production rules to
replace them (if matched). To understand this, take the
following example of CFG:

S → rXd | rZd
X → oa | ea
Z → ai

 For an input string: read, a top-down parser, will behave like


this:
Department of Computer Science and Engineering, BVCOE New Delhi
48
Back-tracking
 It will start with S from the production rules and will match its yield to the left-
most letter of the input, i.e. ‘r’. The very first production of S (S → rXd) matches
with it. So the top-down parser advances to the next input letter (i.e. ‘e’). The
parser tries to expand non-terminal ‘X’ and checks its production from the left
(X → oa). It does not match with the next input symbol. So the top-down
parser backtracks to obtain the next production rule of X, (X → ea).

 Now the parser matches all the input letters in an ordered manner. The string is
accepted.

Department of Computer Science and Engineering, BVCOE New Delhi


49
Recursive Descent Parser
 It is a kind of Top-Down Parser. A top-down parser builds the parse tree from the top to
down, starting with the start non-terminal. A Predictive Parser is a special case of
Recursive Descent Parser, where no Back Tracking is required.
By carefully writing the grammar, i.e. eliminating left recursion and left factoring from
it, the resulting grammar becomes one that can be parsed by a recursive descent
parser.
Example:
 BEFORE REMOVING LEFT RECURSION          AFTER REMOVING LEFT RECURSION
 E –> E + T | T                          E –> T E'
                                         E' –> + T E' | e
 T –> T * F | F                          T –> F T'
                                         T' –> * F T' | e
 F –> ( E ) | id                         F –> ( E ) | id

 **Here e is Epsilon

Department of Computer Science and Engineering, BVCOE New Delhi


50
Recursive Descent Parser
 For Recursive Descent Parser, we are going to write one
program for every variable.
 Example: Grammar:
 E --> i E'
 E' --> + i E' | e

Department of Computer Science and Engineering, BVCOE New Delhi


51
Recursive Descent Parser
 int main()
 {
 // E is a start symbol.
 E();

 // if lookahead = $, it represents the end of the string
 // Here l is lookahead.
 if (l == '$')
 printf("Parsing Successful");
 }

52 Department of Computer Science and Engineering, BVCOE New Delhi


Recursive Descent Parser
 // Definition of E, as per the given production
 E( )
 {
 if (l == 'i') {
 match('i');
 E'();
 }
 }

Department of Computer Science and Engineering, BVCOE New Delhi


53
Recursive Descent Parser
 // Definition of E' as per the given production

 E'( )
 {
 if (l == '+') {
 match('+');
 match('i');
 E'();
 }
 else
 return;   // E' --> e, so nothing to match
 }

54 Department of Computer Science and Engineering, BVCOE New Delhi


Recursive Descent Parser
 // Match function
 match(char t)
 {
 if (l == t) {
 l = getchar();
 }
 else
 printf("Error");
 }

Department of Computer Science and Engineering, BVCOE New Delhi


55
LL(1) Parsing
 Here the first L represents that the scanning of the input is done from Left
to Right, and the second L shows that in this parsing technique we use the
Leftmost derivation. Finally, the 1 represents the number of lookahead
symbols, i.e. how many input symbols are examined when making a parsing
decision.
 Construction of LL(1) Parsing Table:
To construct the Parsing table, we have two functions:
 1: First(): for a variable (non-terminal), the set of terminal symbols that
can begin the strings derived from it.
 2: Follow(): the set of terminal symbols that can follow a variable in the
process of derivation.
 Now, after computing the First and Follow set for each Non-Terminal symbol we
have to construct the Parsing table. In the table, rows contain the Non-
Terminals and the columns contain the Terminal Symbols.
All the null (e) productions of the grammar go under the Follow
elements and the remaining productions lie under the elements of their First set.

Department of Computer Science and Engineering, BVCOE New Delhi


56
Example-1:
 Consider the Grammar:
E --> TE'
E' --> +TE' | e
T --> FT’
T' --> *FT' | e
F --> id | (E)
 **e denotes epsilon. Find their first and follow sets:

Department of Computer Science and Engineering, BVCOE New Delhi


57
First and Follow
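 For the grammar of Example-1, working the sets out by hand gives
(ε written for e):
First (E) = First (T) = First (F) = {id, (}
First (E') = {+, ε}
First (T') = {*, ε}
Follow (E) = Follow (E') = {$, )}
Follow (T) = Follow (T') = {+, ), $}
Follow (F) = {*, +, ), $}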

Department of Computer Science and Engineering, BVCOE New Delhi


58
LL(1) Parsing Table
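 The table entries obtained from these sets are listed below (all remaining
cells are error entries):
M[E, id] = E --> TE'        M[E, (] = E --> TE'
M[E', +] = E' --> +TE'      M[E', )] = E' --> e      M[E', $] = E' --> e
M[T, id] = T --> FT'        M[T, (] = T --> FT'
M[T', +] = T' --> e         M[T', *] = T' --> *FT'   M[T', )] = T' --> e    M[T', $] = T' --> e
M[F, id] = F --> id         M[F, (] = F --> (E)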

Department of Computer Science and Engineering, BVCOE New Delhi


59
Example-2
 Consider the Grammar
 S --> A | a
 A --> a
 Find their first and follow sets:

Department of Computer Science and Engineering, BVCOE New Delhi


60
Parsing Table

 Here, we can see that there are two productions (S --> A and S --> a) in
the same cell M[S, a]. Hence, this grammar is not feasible for an LL(1) Parser.

Department of Computer Science and Engineering, BVCOE New Delhi


61
Bottom up parser
 Bottom-up parsing starts from the leaf nodes of a tree and works in upward
direction till it reaches the root node. Here, we start from a sentence and then
apply production rules in reverse manner in order to reach the start symbol.

Department of Computer Science and Engineering, BVCOE New Delhi


62
Shift Reduce parser
 Shift Reduce parser attempts to construct the parse in a similar manner
as done in bottom up parsing, i.e.
the parse tree is constructed from the leaves (bottom) to the
root (up). A more general form of shift reduce parser is the LR
parser.
This parser requires some data structures, i.e.
 An input buffer for storing the input string.
 A stack for storing and accessing grammar symbols.

Department of Computer Science and Engineering, BVCOE New Delhi


63
Basic Operations –
 Shift: This involves moving of symbols from input buffer onto the
stack.
 Reduce: If the handle appears on top of the stack then, its
reduction by using appropriate production rule is done i.e. RHS of
production rule is popped out of stack and LHS of production rule
is pushed onto the stack.
 Accept: If only the start symbol is present in the stack and the input
buffer is empty then, the parsing action is called accept. When the
accept action is obtained, it means parsing has completed successfully.
 Error: This is the situation in which the parser can neither
perform a shift action nor a reduce action, and cannot accept either.

Department of Computer Science and Engineering, BVCOE New Delhi


64
Example1:
 Consider the grammar
S –> S + S
S –> S * S
S –> id
 Perform Shift Reduce parsing for input string “id + id + id”.

Department of Computer Science and Engineering, BVCOE New Delhi


65
Parsing table
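 One possible sequence of moves for “id + id + id” (reducing whenever a
handle appears on top of the stack) is:
Stack        Input           Action
$            id+id+id$       shift
$ id         +id+id$         reduce S –> id
$ S          +id+id$         shift
$ S +        id+id$          shift
$ S + id     +id$            reduce S –> id
$ S + S      +id$            reduce S –> S + S
$ S          +id$            shift
$ S +        id$             shift
$ S + id     $               reduce S –> id
$ S + S      $               reduce S –> S + S
$ S          $               accept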

Department of Computer Science and Engineering, BVCOE New Delhi


66
Example 2 –
 Consider the grammar
E –> 2E2
E –> 3E3
E –> 4
Perform Shift Reduce parsing for input string “32423”.

Department of Computer Science and Engineering, BVCOE New Delhi


67
Parsing Table
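 One possible sequence of moves for “32423” is:
Stack        Input        Action
$            32423$       shift
$ 3          2423$        shift
$ 3 2        423$         shift
$ 3 2 4      23$          reduce E –> 4
$ 3 2 E      23$          shift
$ 3 2 E 2    3$           reduce E –> 2E2
$ 3 E        3$           shift
$ 3 E 3      $            reduce E –> 3E3
$ E          $            accept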

Department of Computer Science and Engineering, BVCOE New Delhi


68
Department of Computer Science and Engineering, BVCOE New Delhi
69
LR algorithm
 The LR algorithm requires a stack, input, output and a parsing table. In all types of
LR parsing, the input, output and stack are the same but the parsing table is different.

Fig: Block diagram of LR parser

Department of Computer Science and Engineering, BVCOE New Delhi


70
LR algorithm
 Input buffer is used to indicate end of input and it
contains the string to be parsed followed by a $ Symbol.
 A stack is used to contain a sequence of grammar
symbols with a $ at the bottom of the stack
 Parsing table is a two dimensional array. It contains two
parts: Action part and Go To part.

Department of Computer Science and Engineering, BVCOE New Delhi


71
LR (1) Parsing
 Various steps involved in the LR (1) Parsing:
 For the given input string write a context free grammar.
 Check the ambiguity of the grammar.
 Add Augment production in the given grammar.
 Create Canonical collection of LR (0) items.
 Draw the DFA (deterministic finite automaton) of the item sets.
 Construct a LR (1) parsing table.

Department of Computer Science and Engineering, BVCOE New Delhi


72
Augment Grammar
 Augmented grammar G` will be generated if we add one more
production in the given grammar G. It helps the parser to identify
when to stop the parsing and announce the acceptance of the
input.
 Example
 Given grammar
 S → AA
 A → aA | b
 The Augment grammar G` is represented by
 S`→ S
 S → AA
 A → aA | b

Department of Computer Science and Engineering, BVCOE New Delhi


73
Canonical Collection of LR(0) items
 An LR (0) item is a production of G with a dot at some position
on the right side of the production.
 LR (0) items are useful to indicate how much of the input
has been scanned up to a given point in the process of
parsing.
 In LR (0) parsing, we place the reduce move in the entire row.

Department of Computer Science and Engineering, BVCOE New Delhi


74
Question
 Given grammar:
 S → AA
 A → aA | b
 Solve using LR parser

Department of Computer Science and Engineering, BVCOE New Delhi


75
Solution:
 Add Augment Production and insert '•' symbol at the first
position for every production in G
 S` → •S
 S → •AA
 A → •aA
 A → •b

Department of Computer Science and Engineering, BVCOE New Delhi


76
Solution
 I0 State:
 Add Augment production to the I0 State and Compute the Closure
 I0 = Closure (S` → •S)
 Add all productions starting with S in to I0 State because "•" is
followed by the non-terminal. So, the I0 State becomes
 I0 = S` → •S
S → •AA
 Add all productions starting with "A" in modified I0 State because
"•" is followed by the non-terminal. So, the I0 State becomes.
 I0= S` → •S
S → •AA
A → •aA
A → •b

Department of Computer Science and Engineering, BVCOE New Delhi


77
Solution
 I1= Go to (I0, S) = closure (S` → S•) = S` → S•
 Here, the Production is reduced so close the State.
 I1= S` → S•
 I2= Go to (I0, A) = closure (S → A•A)
 Add all productions starting with A in to I2 State because "•"
is followed by the non-terminal. So, the I2 State becomes
 I2 =S→A•A
A → •aA
A → •b
 Go to (I2,a) = Closure (A → a•A) = (same as I3)
 Go to (I2, b) = Closure (A → b•) = (same as I4)

78 Department of Computer Science and Engineering, BVCOE New Delhi


Solution
 I3= Go to (I0,a) = Closure (A → a•A)
 Add productions starting with A in I3.
 A → a•A
A → •aA
A → •b
 Go to (I3, a) = Closure (A → a•A) = (same as I3)
Go to (I3, b) = Closure (A → b•) = (same as I4)
 I4= Go to (I0, b) = closure (A → b•) = A → b•
I5= Go to (I2, A) = Closure (S → AA•) = S → AA•
I6= Go to (I3, A) = Closure (A → aA•) = A → aA•

Department of Computer Science and Engineering, BVCOE New Delhi


79
Drawing DFA

Department of Computer Science and Engineering, BVCOE New Delhi


80
LR(0) Table
 If a state is going to some other state on a terminal then it
corresponds to a shift move.
 If a state is going to some other state on a variable then it
corresponds to a go to move.
 If a state contains a final item in the particular row then
write the reduce move in the entire row.

Department of Computer Science and Engineering, BVCOE New Delhi


81
Department of Computer Science and Engineering, BVCOE New Delhi
82
Explanation:
 I0 on S is going to I1 so write it as 1.
 I0 on A is going to I2 so write it as 2.
 I2 on A is going to I5 so write it as 5.
 I3 on A is going to I6 so write it as 6.
 I0, I2 and I3 on a are going to I3 so write it as S3, which means
shift 3.
 I0, I2 and I3 on b are going to I4 so write it as S4, which
means shift 4.
 I4, I5 and I6 all contain a final item because they
contain • at the right-most end. So write the reduce entry using the
corresponding production number.
Department of Computer Science and Engineering, BVCOE New Delhi
83
Productions are numbered as follows:
 S → AA ... (1)
 A → aA ... (2)
 A → b ... (3)
 I1 contains the final item which drives(S` → S•), so action {I1, $} =
Accept.
 I4 contains the final item which drives A → b• and that production
corresponds to the production number 3 so write it as r3 in the entire
row.
 I5 contains the final item which drives S → AA• and that production
corresponds to the production number 1 so write it as r1 in the entire
row.
 I6 contains the final item which drives A → aA• and that production
corresponds to the production number 2 so write it as r2 in the entire
row.

Department of Computer Science and Engineering, BVCOE New Delhi


84
SLR (1) Parsing
 SLR (1) refers to simple LR Parsing. It is the same as LR (0)
parsing; the only difference is in the parsing table. To
construct the SLR (1) parsing table, we use the canonical collection
of LR (0) items.
 In SLR (1) parsing, we place the reduce move only in the
follow of the left hand side.

Department of Computer Science and Engineering, BVCOE New Delhi


85
SLR (1) Table Construction
 The steps which use to construct SLR (1) Table is given below:
 If a state (Ii) is going to some other state (Ij) on a terminal then it corresponds
to a shift move in the action part.

 If a state (Ii) contains a final item like A → ab• which has no transitions to the
next state then the production is known as reduce production. For all terminals
X in FOLLOW (A), write the reduce entry along with their production
numbers.

Department of Computer Science and Engineering, BVCOE New Delhi


86
SLR (1) Table Construction
 If a state (Ii) is going to some other state (Ij) on a variable
then it corresponds to a go to move in the Go to part.

Department of Computer Science and Engineering, BVCOE New Delhi


87
Example
 S→E
E → E +T |T
T →T * F | F
F → id
 Is this grammar SLR grammar or not?

Department of Computer Science and Engineering, BVCOE New Delhi


88
Solution
 Add Augment Production and insert '•' symbol at the first
position for every production in G
 S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id

Department of Computer Science and Engineering, BVCOE New Delhi


89
Solution
 I0 State:
 Add Augment production to the I0 State and Compute the Closure
 I0 = Closure (S` → •E)
 Add all productions starting with E in to I0 State because "." is
followed by the non-terminal. So, the I0 State becomes
 I0 = S` → •E
E → •E + T
E → •T
 Add all productions starting with T and F in modified I0 State
because "." is followed by the non-terminal. So, the I0 State
becomes.
Department of Computer Science and Engineering, BVCOE New Delhi
90
Solution
 I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
 I1= Go to (I0, E) = closure (S` → E•, E → E• + T)
I2= Go to (I0, T) = closure (E → T•, T → T• * F)
I3= Go to (I0, F) = Closure ( T → F• ) = T → F•
I4= Go to (I0, id) = closure ( F → id•) = F → id•
I5= Go to (I1, +) = Closure (E → E +•T)
 Add all productions starting with T and F in I5 State because "." is
followed by the non-terminal. So, the I5 State becomes
Department of Computer Science and Engineering, BVCOE New Delhi
91
Solution
 I5 = E → E +•T
T → •T * F
T → •F
F → •id
 Go to (I5, F) = Closure (T → F•) = (same as I3)
Go to (I5, id) = Closure (F → id•) = (same as I4)
 I6= Go to (I2, *) = Closure (T → T * •F)
 Add all productions starting with F in I6 State because "." is
followed by the non-terminal. So, the I6 State becomes

Department of Computer Science and Engineering, BVCOE New Delhi


92
Solution
 I6 = T → T * •F
F → •id
 Go to (I6, id) = Closure (F → id•) = (same as I4)
 I7= Go to (I5, T) = Closure (E → E + T•) = E → E + T•
I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•

Department of Computer Science and Engineering, BVCOE New Delhi


93
Drawing DFA:

Department of Computer Science and Engineering, BVCOE New Delhi


94
SLR (1) Table

Department of Computer Science and Engineering, BVCOE New Delhi


95
Explanation
 First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E)
= {*, +, $}
Follow (F) = {*, +, $}

Department of Computer Science and Engineering, BVCOE New Delhi


96
Explanation
 I1 contains the final item which drives S → E• and follow (S) = {$}, so
action {I1, $} = Accept
 I2 contains the final item which drives E → T• and follow (E) = {+, $},
so action {I2, +} = R2, action {I2, $} = R2
 I3 contains the final item which drives T → F• and follow (T) = {+, *,
$}, so action {I3, +} = R4, action {I3, *} = R4, action {I3, $} = R4
 I4 contains the final item which drives F → id• and follow (F) = {+, *,
$}, so action {I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5
 I7 contains the final item which drives E → E + T• and follow (E) =
{+, $}, so action {I7, +} = R1, action {I7, $} = R1
 I8 contains the final item which drives T → T * F• and follow (T) = {+,
*, $}, so action {I8, +} = R3, action {I8, *} = R3, action {I8, $} =
R3.

Department of Computer Science and Engineering, BVCOE New Delhi


97
CLR (1) Parsing
 CLR refers to canonical LR. CLR parsing uses the
canonical collection of LR (1) items to build the CLR (1)
parsing table. The CLR (1) parsing table produces more
states as compared to SLR (1) parsing.
 In CLR (1), we place the reduce move only under the
lookahead symbols.

Department of Computer Science and Engineering, BVCOE New Delhi


98
Steps involved in the CLR (1) Parsing
 For the given input string write a context free grammar
 Check the ambiguity of the grammar
 Add Augment production in the given grammar
 Create Canonical collection of LR (0) items
 Draw the DFA (deterministic finite automaton) of the item sets
 Construct a CLR (1) parsing table

Department of Computer Science and Engineering, BVCOE New Delhi


99
CLR (1) Parsing
 LR (1) item
 An LR (1) item is an LR (0) item together with a look ahead
symbol.
 LR (1) item = LR (0) item + look ahead
 The look ahead is used to determine where we place the
final (reduce) item.
 The look ahead is always $ for the augment
production.

Department of Computer Science and Engineering, BVCOE New Delhi


100
Example:
 S → AA
A → aA
A→b
 Is this grammar CLR or not ?

Department of Computer Science and Engineering, BVCOE New Delhi


101
Solution
 Add Augment Production, insert '•' symbol at the first
position for every production in G and also add the
lookahead.
 S` → •S, $
 S → •AA, $
 A → •aA, a/b
 A → •b, a/b

102 Department of Computer Science and Engineering, BVCOE New Delhi


Solution
 I0 State:
 Add Augment production to the I0 State and Compute the
Closure
 I0 = Closure (S` → •S)
 Add all productions starting with S in to I0 State because "."
is followed by the non-terminal. So, the I0 State becomes
 I0 = S` → •S, $
S → •AA, $
 Add all productions starting with A in modified I0 State
because "." is followed by the non-terminal. So, the I0 State
becomes.

103 Department of Computer Science and Engineering, BVCOE New Delhi


Solution
 I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
 I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
 Add all productions starting with A in I2 State because "." is
followed by the non-terminal. So, the I2 State becomes
 I2= S → A•A, $
A → •aA, $
A → •b, $

Department of Computer Science and Engineering, BVCOE New Delhi


104
Solution
 I3= Go to (I0, a) = Closure ( A → a•A, a/b )
 Add all productions starting with A in I3 State because "." is
followed by the non-terminal. So, the I3 State becomes
 I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
 Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
 I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
 Add all productions starting with A in I6 State because "." is
followed by the non-terminal. So, the I6 State becomes

Department of Computer Science and Engineering, BVCOE New Delhi


105
Solution
 I6 = A → a•A, $
A → •aA, $
A → •b, $
 Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
 I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•,
a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $

Department of Computer Science and Engineering, BVCOE New Delhi


106
Drawing DFA

Department of Computer Science and Engineering, BVCOE New Delhi


107
CLR (1) Parsing table:

Department of Computer Science and Engineering, BVCOE New Delhi


108
Solution
 Productions are numbered as follows:
 S → AA ... (1)
 A → aA ....(2)
 A → b ... (3)
 The placement of shift entries in the CLR (1) parsing table is the same
as in the SLR (1) parsing table. The only difference is in the
placement of the reduce entries.

Department of Computer Science and Engineering, BVCOE New Delhi


109
Solution
 I4 contains the final item which drives ( A → b•, a/b), so
action {I4, a} = R3, action {I4, b} = R3.
I5 contains the final item which drives ( S → AA•, $), so
action {I5, $} = R1.
I7 contains the final item which drives ( A → b•,$), so action
{I7, $} = R3.
I8 contains the final item which drives ( A → aA•, a/b), so
action {I8, a} = R2, action {I8, b} = R2.
I9 contains the final item which drives ( A → aA•, $), so
action {I9, $} = R2.

Department of Computer Science and Engineering, BVCOE New Delhi


110
LALR (1) Parsing
 LALR refers to the lookahead LR. To construct the LALR (1)
parsing table, we use the canonical collection of LR (1)
items.
 In the LALR (1) parsing, the LR (1) items which have same
productions but different look ahead are combined to form a
single set of items
 LALR (1) parsing is the same as CLR (1) parsing; the only
difference is in the parsing table.

Department of Computer Science and Engineering, BVCOE New Delhi


111
Example
 S → AA
A → aA
A→b
Is this grammar LALR grammar ?

Department of Computer Science and Engineering, BVCOE New Delhi


112
Solution
 Add Augment Production, insert '•' symbol at the first
position for every production in G and also add the look
ahead.
 S` → •S, $
 S → •AA, $
 A → •aA, a/b
 A → •b, a/b

Department of Computer Science and Engineering, BVCOE New Delhi


113
Solution
 I0 State:
 Add Augment production to the I0 State and Compute the
Closure
 I0 = Closure (S` → •S)
 Add all productions starting with S in to I0 State because "•"
is followed by the non-terminal. So, the I0 State becomes
 I0 = S` → •S, $
S → •AA, $
 Add all productions starting with A in modified I0 State
because "•" is followed by the non-terminal. So, the I0 State
becomes.

114 Department of Computer Science and Engineering, BVCOE New Delhi


Solution
 I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
 I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
 Add all productions starting with A in I2 State because "•" is
followed by the non-terminal. So, the I2 State becomes

Department of Computer Science and Engineering, BVCOE New Delhi


115
Solution
 I2= S → A•A, $
A → •aA, $
A → •b, $
 I3= Go to (I0, a) = Closure ( A → a•A, a/b )
 Add all productions starting with A in I3 State because "•" is
followed by the non-terminal. So, the I3 State becomes
 I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
 Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)

116 Department of Computer Science and Engineering, BVCOE New Delhi


Solution
 I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
 Add all productions starting with A in I6 State because "•" is
followed by the non-terminal. So, the I6 State becomes
 I6 = A → a•A, $
A → •aA, $
A → •b, $
 Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
 I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $

Department of Computer Science and Engineering, BVCOE New Delhi


117
Solution
 If we analyze the LR (0) items of I3 and I6, they are the same; they
differ only in their lookaheads.
 I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b
}
 I6= { A → a•A, $
A → •aA, $
A → •b, $
}

Department of Computer Science and Engineering, BVCOE New Delhi


118
Solution
 Clearly I3 and I6 are the same in their LR (0) items but differ in their
lookaheads, so we can combine them and call the combined state I36.
 I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
}
 I4 and I7 are the same but differ only in their look ahead, so
we can combine them and call the combined state I47.
 I47 = {A → b•, a/b/$}
 I8 and I9 are the same but differ only in their look ahead, so
we can combine them and call the combined state I89.
 I89 = {A → aA•, a/b/$}
Department of Computer Science and Engineering, BVCOE New Delhi
119
Drawing DFA:

120 Department of Computer Science and Engineering, BVCOE New Delhi


LALR(1)

Department of Computer Science and Engineering, BVCOE New Delhi


121
