Syntax Analysis (Parsers)

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 97

Noida Institute of Engineering and Technology, Greater Noida

Syntax Analysis (Parsers)

Unit: 2

Subject:
Compiler Design (RCS-602)
Nishant Kumar Hind
Course Details Assistant Professor
B Tech CSE 6th Sem CSE Department

Nishant Kumar Hind RCS-602 CD Unit -2


1
08/11/2021
Content

• Syntax Analysis • Predictive parsing algorithm


• Top Down Parsing • Bottom- up parsing
• Left Recursion elimination • Shift-reduce parser
• Left factoring • Handle Pruning
• Recursive Descent Parsing • Shift reduce parser
• First and Follow • Operator precedence parser
• Computing First • LR-Parser
• Computing Follow
• LL(1) Grammar
• Construction of predictive
parsing table

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 2
Course Objectives

Introduce students to the concepts underlying the design and


implementation of language processors. More specifically, by the end
of the course, students will be able to answer these questions:
• What language processors are, and what functionality do they
provide to their users?
• What core mechanisms are used for providing such functionality?
• How are these mechanisms implemented?
• Apart from providing a theoretical background, the course places a
special emphasis in practical issues in designing language
processors.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 3
Unit Objective(CO2)

• Introduce students to the concepts to design various parser like


• Top down parser
• Bottom up parser
• Operator precedence parser
• SLR parser
• CLR parser
• LALR parser

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 4
Course Outcomes

• CO1: To have the knowledge of patterns, tokens, regular


expressions and finite automata to develop a scanner or lexical
analyzer.
• CO2: To design and develop various parser by parsing LL parser and
LR parser
• CO3: To apply various design & conduct experiments for
Intermediate Code Generation in compiler.
• CO4: To design and develop various Data structure for symbols
tables and Error Detection & Recovery at every phases
• CO5: To apply various new code optimization techniques to improve
the performance of a program in terms of speed & space.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 5
Program Outcomes (PO)
• PO1: Engineering Knowledge
• PO2: Problem Analysis
• PO3: Design/Development of solutions
• PO4: Conduct Investigations of complex problems
• PO5: Modern tool usage
• PO6: The engineer and society
• PO7: Environment and sustainability
• PO8: Ethics
• PO9: Individual and team work
• PO10: Communication
• PO11: Project management and finance
• PO12: Life-long learning

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 6
CO-PO Mapping

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10PO11PO12
 

RCS-602.1 3 3 3 3 3 1 1 1 3 1 2 2

RCS-602.2 3 3 3 3 3 1 1 1 3 1 2 2

RCS-602.3 3 3 3 3 3 1 1 1 3 1 2 2

RCS-602.4 3 2 3 3 3 1 1 1 3 1 2 2

RCS-602.5 3 2 3 3 3 1 1 1 3 1 2 2

AVG 3 2.6 3 3 3 1 1 1 3 1 2 2

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 7
Program Specific Outcomes (PSO)
• PSO1: The ability to identify, analyze real world problems and design their
ethical solutions using artificial intelligence, robotics, virtual/augmented
reality, data analytics, block chain technology, and cloud computing.
•  
• PSO2:The ability to design and develop the hardware sensor devices and
related interfacing software systems for solving complex engineering
problems.
•  
• PSO 3:The ability to understand inter disciplinary computing techniques
and to apply them in the design of advanced computing.
•  
• PSO 4:The ability to conduct investigation of complex problem with the
help of technical, managerial, leadership qualities, and modern
engineering tools provided by industry sponsored laboratories.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 8
CO-PSO Mapping

PSO1 PSO2 PSO3 PSO4


RCS602.1 3 3 3 1
RCS602.2 3 3 3 1
RCS602.3 3 3 3 1
RCS602.4 3 3 3 1
RCS602.5 3 3 3 1
AVG 3 3 3 1

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 9
Prerequisite

• Algorithms
• Languages and machines
• Operating systems
• Computer architectures
• Theory of Automata

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 10
Recap
• Lexical analysis turns input characters into tokens.
• Lexical syntax is described by regular expressions.
• Lexical analysis is the very first phase in the compiler designing
• A lexeme is a sequence of characters that are included in the source
program according to the matching pattern of a token
• Lexical analyzer is implemented to scan the entire source code of the
program
• Lexical analyzer helps to identify token into the symbol table
• A character sequence which is not possible to scan into any valid token is a
lexical error
• Removes one character from the remaining input is useful Error recovery
method
• Lexical Analyser scan the input program.
• It eases the process of lexical analysis and the syntax analysis by eliminating
unwanted tokens.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 11
Topic Mapping with Course Outcome
S.No. Topic Course Outcome
1 Syntax Analysis CO2
2 Top Down Parsing CO2
3 Left Recursion elimination CO2
4 Left factoring CO2
5 Recursive Descent Parsing CO2
6 First and Follow CO2
7 Computing First CO2
8 Computing Follow CO2
9 LL(1) Grammar CO2
10 Construction of predictive parsing table CO2
11 Predictive parsing algorithm CO2
12 Bottom- up parsing CO2
13 Shift-reduce parser CO2
14 Handle Pruning CO2
15 Shift reduce parser CO2
16 Operator precedence parser CO2
17 LR-Parser CO2
08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 12
Syntax Analysis
• Syn
• token tax
• Source • Lexical • Parse tree • Intermediate
• Semantic
An
• program Analyzer
• representatio
Analyzer

• getNext aly
• Token zer

• Symbol
• table

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 13
Top Down Parsing: Left Recursion
Elimination

• A grammar is left recursive if it has a non-terminal A such that


there is a derivation A=> Aα
• Top down parsing methods cant handle left-recursive grammars
• A simple rule for direct left recursion elimination:
– For a rule like:
• A -> A α|β
– We may replace it with
• A -> β A’
• A’ -> α A’ | ɛ

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 14
Left Recursion Elimination

• There are cases like following


– S -> Aa | b
– A -> Ac | Sd | ɛ
• Left recursion elimination algorithm:
– Arrange the nonterminals in some order A1,A2,…,An.
– For (each i from 1 to n) {
• For (each j from 1 to i-1) {
– Replace each production of the form Ai-> Aj γ by the production
Ai -> δ1 γ | δ2 γ | … |δk γ where Aj-> δ1 | δ2 | …
|δk are all current Aj productions
–}
– Eliminate left recursion among the Ai-productions
• }

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 15
Left Factoring

• Left factoring is a grammar transformation that is useful for


producing a grammar suitable for predictive or top-down parsing.
• Consider following grammar:
– Stmt -> if expr then stmt else stmt
– | if expr then stmt
• On seeing input if it is not clear for the parser which production to
use
• We can easily perform left factoring:
– If we have A->αβ1 | αβ2 then we replace it with
• A -> αA’
• A’ -> β1 | β2

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 16
Left Factoring

• Algorithm
– For each non-terminal A, find the longest prefix α common to
two or more of its alternatives. If α<> ɛ, then replace all of A-
productions A->αβ1 |αβ2 | … | αβn | γ by
• A -> αA’ | γ
• A’ -> β1 |β2 | … | βn
• Example:
– S -> I E t S | i E t S e S | a
– E -> b

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 17
Top Down Parsing

• A Top-down parser tries to create a parse tree from the root


towards the leafs scanning input from left to right
• It can be also viewed as finding a leftmost derivation for an input
string
• Example: id+id*id

E -> TE’ E E E E E E
lm lm lm lm lm
E’ -> +TE’ | Ɛ
T -> FT’ T E’ T E’ T E’ T E’ T E’
T’ -> *FT’ | Ɛ
F -> (E) | id F T’ F T’ F T’ F T’ + T E’

id id Ɛ id Ɛ

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 18
Recursive Descent Parsing

• General recursive descent may require backtracking


• The previous code needs to be modified to allow backtracking
• In general form it cant choose an A-production easily.
• So we need to try all alternatives
• If one failed the input pointer needs to be reset and another
alternative should be tried
• Recursive descent parsers cant be used for left-recursive grammars

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 19
Recursive Descent Parsing

• Example
S->cAd
A->ab | a Input: cad

S S S

c A d c A d c A d

a b a

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 20
First and Follow

• First() is set of terminals that begins strings derived from


• If α=>ɛ then is also in First(ɛ)
• In predictive parsing when we have A-> α|β, if First(α) and First(β)
are disjoint sets then we can select appropriate A-production by
looking at the next input
• Follow(A), for any nonterminal A, is set of terminals a that can
appear immediately after A in some sentential form
– If we have S => αAaβ for some αand βthen a is in Follow(A)
• If A can be the rightmost symbol in some sentential form, then $ is
in Follow(A)

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 21
Computing First

• To compute First(X) for all grammar symbols X, apply following rules


until no more terminals or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X->Y1Y2…Yk is a production for some
k>=1, then place a in First(X) if for some i a is in First(Yi) and ɛ is
in all of First(Y1),…,First(Yi-1) that is Y1…Yi-1 => ɛ. if ɛ is in
First(Yj) for j=1,…,k then add ɛ to First(X).
3. If X-> ɛ is a production then add ɛ to First(X)

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 22
Computing Follow

• To compute First(A) for all nonterminals A, apply following rules


until nothing can be added to any follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A-> αBβ then everything in First(β)
except ɛ is in Follow(B).
3. If there is a production A->B or a production A->αBβ where
First(β) contains ɛ, then everything in Follow(A) is in Follow(B)

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 23
LL(1) Grammars

• Predictive parsers are those recursive descent parsers needing no


backtracking
• Grammars for which we can create predictive parsers are called
LL(1)
– The first L means scanning input from left to right
– The second L means leftmost derivation
– And 1 stands for using one input symbol for lookahead
• A grammar G is LL(1) if and only if whenever A-> α|βare two distinct
productions of G, the following conditions hold:
– For no terminal a do αandβ both derive strings beginning with a
– At most one of α or βcan derive empty string
– If α=> ɛ then βdoes not derive any string beginning with a
terminal in Follow(A).

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 24
Construction of Predictive Parsing Table

• For each production A->α in grammar do the following:


1. For each terminal a in First(α) add A-> in M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A) add A-> ɛ
to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A-> ɛ to
M[A,$] as well
• If after performing the above, there is no production in M[A,a]
then set M[A,a] to error

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 25
Construction of Predictive Parsing Table

• Example

First Follow
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’ {(,id} {+, *, ), $}
F
T’ -> *FT’ | Ɛ {(,id} {+, ), $}
T
F -> (E) | id
E {(,id} {), $}
E’ {+,ɛ}
T’ {), $}
{*,ɛ} {+, ), $}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 26
Construction of Predictive Parsing Table

Input Symbol
Non -terminal id + * ( ) $
E E -> TE’ E -> TE’
E’ E’ -> Ɛ E’ -> Ɛ
E’ -> +TE’
T
T -> FT’ T -> FT’
T’
T’ -> Ɛ T’ -> *FT’ T’ -> Ɛ T’ -> Ɛ
F

F -> id F -> (E)

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 27
Non-recursive Predicting Parsing

a +

• Predictive
• X • parsing • output
• st
• Y • program
a
c • Z
k • $ • Parsing
• Table
• M
Nishant Kumar Hind RCS-602 CD Unit -2
08/11/2021 28
Construction of Predictive Parsing Table

• For each production A->α in grammar do the following:


1. For each terminal a in First(α) add A-> in M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A) add A-> ɛ
to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A-> ɛ to
M[A,$] as well
• If after performing the above, there is no production in M[A,a]
then set M[A,a] to error

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 29
Construction of Predictive Parsing Table

• For each production A->α in grammar do the following:


1. For each terminal a in First(α) add A-> in M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A) add A-> ɛ
to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A-> ɛ to
M[A,$] as well
• If after performing the above, there is no production in M[A,a]
then set M[A,a] to error

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 30
Predictive Parsing Algorithm

Set ip point to the first symbol of w;


Set X to the top stack symbol;
While (X<>$) { /* stack is not empty */
if (X is a) pop the stack and advance ip;
else if (X is a terminal) error();
else if (M[X,a] is an error entry) error();
else if (M[X,a] = X->Y1Y2..Yk) {
output the production X->Y1Y2..Yk;
pop the stack;
push Yk,…,Y2,Y1 on to the stack with Y1 on top;
}
set X to the top stack symbol;

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 31
Predictive Parsing Algorithm

Set ip point to the first symbol of w;


Set X to the top stack symbol;
While (X<>$) { /* stack is not empty */
if (X is a) pop the stack and advance ip;
else if (X is a terminal) error();
else if (M[X,a] is an error entry) error();
else if (M[X,a] = X->Y1Y2..Yk) {
output the production X->Y1Y2..Yk;
pop the stack;
push Yk,…,Y2,Y1 on to the stack with Y1 on top;
}
set X to the top stack symbol;

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 32
Bottom-Up Parsing

•Attempts to traverse a parse tree bottom up (post-order traversal)


•Reduces a sequence of tokens to the start symbol
•At each reduction step, the RHS of a production is replaced with
LHS
•A reduction step corresponds to the reverse of a rightmost
derivation
•Constructs parse tree for an input string beginning at the leaves (the
bottom) and working towards the root (the top)
•Example: id*id
E -> E + T | T
T -> T * F | F
F -> (E) | id

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 33
Shift-Reduce Parser

• The general idea is to shift some symbols of input to the stack until
a reduction can be applied
• At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of the
production
• The key decisions during bottom-up parsing are about when to
reduce and about what production to apply
• A reduction is a reverse of a step in a derivation
• The goal of a bottom-up parser is to construct a derivation in
reverse:
– E=>T=>T*F=>T*id=>F*id=>id*id

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 34
Handle Pruning

• A Handle is a substring that matches the body of a production and


whose reduction represents one step along the reverse of a
rightmost derivation

Right sentential form Handle Reducing production

id*id id F->id
F*id F T->F
T*id id F->id

T*F T*F E->T*F

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 35
Shift-Reduce Parser

• A stack is used to hold grammar symbols


• Handle always appear on top of the stack
• Initial configuration:
Stack Input
$ w$
• Acceptance configuration
Stack Input
$S $

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 36
Conflicts During Shit Reduce Parsing

• Two kind of conflicts


– Shift/reduce conflict
– Reduce/reduce conflic

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 37
Operator Precedence Parser

• An operator precedence parser is a bottom-up parser that


interprets an operator grammar. This parser is only used for
operator grammars. Ambiguous grammars are not allowed in any
parser except operator precedence parser.
• Operator precedence grammar is kinds of shift reduce parsing
method. It is applied to a small class of operator grammars.
• A grammar is said to be operator precedence grammar if it has two
properties:
• No R.H.S. of any production has a ∈.
• No two non-terminals are adjacent.
• Operator precedence can only established between the terminals of
the grammar. It ignores the non-terminal.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 38
Operator Precedence Parser

• There are the three operator precedence relations:-

• a ⋗ b means that terminal "a" has the higher precedence than


terminal "b".
• a ⋖ b means that terminal "a" has the lower precedence than
terminal "b".
• a ≐ b means that the terminal "a" and "b" both have same
precedence.

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 39
Operator Precedence Parser

• consider the following grammar:


• E-> E+E | E*E | id
• For this grammar the precedence table looks something like this:

• The table size is n*n where n is the number of terminal symbols in


the grammar

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 40
Operator Precedence Parser

 Parsing Given String-


• insert $ symbol at end of the string as
• id+id*id$
• scan and parse the string as-
• stack input action
• $ id+id*id$ $ <. id shift
• $id +id*id$ id .> + reduce E → id
• $ +id*id$ shift
• $+ id*id$ shift
• $+id *id$ id .> * reduce E → id
• $+ *id$ shift

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 41
Operator Precedence Parser

stack input action


• $+* id$ shift
• $+*id $ id .> $ reduce E → id
• $+* $ * .> $ reduce E → E*E
• $+ $ + .> $ reduce E → E+E
• $ $ accept

•  

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 42
Precedence Functions

• The storage space however can be efficiently managed if we can


build an efficient data structure to represent adequately the
precedence relationships between the terminals.

• For this we can construct two functions f and g where the following
hold:
1. If a. >b then f(a)>g(b)
2. If a< . b then f(a)<g(b)
3. If a. =b then f(a)=g(b)

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 43
Constructing Precedence Functions

• To construct these functions we create a directed graph with


nodes fi and gi where i is the ith terminal, obeying the following
rules:
⁻ If a. =b then fa and fb are merged, and ga and gb are merged
⁻ If a. >b then a directed edge is introduced from fa to gb
⁻ If a< . b then a directed edge is introduced to fa from gb Following
these rules, the precedence graph for the above mentioned
precedence table looks something like this:
• Example: consider the following table

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 44
Constructing Precedence Functions

• Using the algorithm leads to the following graph:

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 45
Constructing Precedence Functions

• From which we extract the following precedence functions:

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 46
States of an LR Parser

• States represent set of items


• An LR(0) item of G is a production of G with the dot at some
position of the body:
• For A->XYZ we have following items
• A->.XYZ
• A->X.YZ
• A->XY.Z
• A->XYZ.
• In a state having A->.XYZ we hope to see a string derivable from
XYZ next on the input.
• What about A->X.YZ

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 47
Constructing canonical LR(0) item sets

• Augmented grammar:
– G with addition of a production: S’->S
• Closure of item sets:
– If I is a set of items, closure(I) is a set of items constructed from I
by the following rules:
• Add every item in I to closure(I)
• If A->α.Bβ is in closure(I) and B->γ is a production then add
the item B->.γ to clsoure(I). I0=closure({[E’->.E]}
• Example E’->.E
E->.E+T
E’->E E->.T
E -> E + T | T
T->.T*F
T -> T * F | F
F -> (E) | id T->.F
F->.(E)
F->.id
Nishant Kumar Hind RCS-602 CD Unit -2
08/11/2021 48
Constructing canonical LR(0) item sets

• Goto (I,X) where I is an item set and X is a grammar symbol is


closure of set of all items [A-> αX. β] where [A-> α.X β] is in I
• Example
I1
E’->E.
E
E->E.+T
I0=closure({[E’->.E]}
E’->.E T I2
E->.E+T E’->T.
E->.T T->T.*F
T->.T*F I4
T->.F ( F->(.E)
F->.(E) E->.E+T
E->.T
F->.id T->.T*F
T->.F
F->.(E)
F->.id

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 49
Constructing canonical LR(0) Item Sets

• Closure algorithm
SetofItems CLOSURE(I) {
J=I;
repeat
for (each item A-> α.Bβ in J)
for (each prodcution B->γ of G)
if (B->.γ is not in J)
add B->.γ to J;
until no more items are added to J on one round;
return J;

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 50
Constructing Canonical LR(0) Item Sets

• GOTO algorithm
SetofItems GOTO(I,X) {
J=empty;
if (A-> α.X β is in I)
add CLOSURE(A-> αX. β ) to J;
return J;
}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 51
Constructing canonical LR(0) item sets

• Canonical LR(0) items


• Void items(G’) {
C= CLOSURE({[S’->.S]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (GOTO(I,X) is not empty and not in C)
add GOTO(I,X) to C;
until no new set of items are added to C on a round;
}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 52
Constructing canonical LR(0) item sets
• I1
• acc• E’- • I6

• •$ I2 • • I9
E->E+.T

>E. •

T
T->.T*F

• +
• • T->.F
E->E+T.

• E •• E-
E’-
• I0=closure({[E’- • F->.(E)
• T->T.*F
>.E]} • F->.id


E’->.E
E->.E+T
>T.
• T >E. • *

• F • T->T*F.

• I10I7
T->T*.F
• E->.T
•• I5 • id
T-
+T
• F->.(E)
• T->.T*F
• id • F->.id
• T->.F
• F->T.* • +


F->.(E)
• ( • I4
F->.id
• F
>id. F->(.E)
• I8 • I11
• E •• E->E.+T •
• E->.E+T
F •

E->.T
T->.T*F F->(E.)
) • F->(E).
• T->.F
• F->.(E)

• I3
• F->.id

• T>F.
Nishant Kumar Hind RCS-602 CD Unit -2
08/11/2021 53
LR parsing Algorithm
let a be the first symbol of w$;
while(1) { /*repeat forever */
let s be the state on top of the stack;
if (ACTION[s,a] = shift t) {
push t onto the stack;
let a be the next input symbol;
} else if (ACTION[s,a] = reduce A->β) {
pop |β| symbols of the stack;
let state t now be on top of the stack;
push GOTO[t,A] onto the stack;
output the production A->β;
} else if (ACTION[s,a]=accept) break; /* parsing is done */
else call error-recovery routine;
}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 54
LR parsing Algorithm
STATE ACTON GOTO
(0) E’->E
(1) E -> E + T
id + * ( ) $ E T F (2) E-> T
(3) T -> T * F
0 S5 S4 1 2 3 (4) T-> F
1 S6 Acc (5) F -> (E)
(6) F->id
2 R2 S7 R2 R2
3 R4 R7 R4 R4
4 S5 S4 8 2 3
5 R6 R6 R6 R6
6 S5 S4 9 3
7 S5 S4 10
8 S6 S11
9 R1 S7 R1 R1
10 R3 R3 R3 R3
11 R5 R5 R5 R5

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 55
LR parsing algorithm
Line Stack Symbols Input Action
(1) 0 id*id+id$ Shift to 5
(2) 05 id *id+id$ Reduce by F->id
(3) 03 F *id+id$ Reduce by T->F
(4) 02 T *id+id$ Shift to 7
(5) 027 T* id+id$ Shift to 5
(6) 0275 T*id +id$ Reduce by F->id
(7) 02710 T*F +id$ Reduce by T->T*F
(8) 02 T +id$ Reduce by E->T
(9) 01 E +id$ Shift
(10) 016 E+ id$ Shift
(11) 0165 E+id $ Reduce by F->id
(12) 0163 E+F $ Reduce by T->F
(13) 0169 E+T` $ Reduce by E->E+T
(14) 01 E $ accept

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 56
More powerful LR parsers

• Canonical-LR or just LR method


• Use lookahead symbols for items: LR(1) items
• Results in a large collection of items
• LALR: lookaheads are introduced in LR(0) items

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 57
Canonical LR(1) items

• In LR(1) items each item is in the form: [A->α.β,a]


• An LR(1) item [A->α.β,a] is valid for a viable prefix γ if there is a
derivation S=>δAw=>δαβw, where
• Γ= δα
• Either a is the first symbol of w, or w is ε and a is $
• Example:
• S->BB
• B->aB|b

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 58
Constructing LR(1) sets of items
SetOfItems Closure(I) {
repeat
for (each item [A->α.Bβ,a] in I)
for (each production B->γ in G’)
for (each terminal b in First(βa))
add [B->.γ, b] to set I;
until no more items are added to I;
return I;
}
SetOfItems Goto(I,X) {
initialize J to be the empty set;
for (each item [A->α.Xβ,a] in I)
add item [A->αX.β,a] to set J;
return closure(J);
}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 59
Constructing LR(1) sets of items
void items(G’){
initialize C to Closure({[S’->.S,$]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (Goto(I,X) is not empty and not in C)
add Goto(I,X) to C;
until no new sets of items are added to C;
}

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 60
Canonical LR(1) parsing table
• Method
– Construct C={I0,I1, … , In}, the collection of LR(1) items for G’
– State i is constructed from state Ii:
• If [A->α.aβ, b] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j”
• If [A->α., a] is in Ii, then set ACTION[i,a] to “reduce A->α”
• If {S’->.S,$] is in Ii, then set ACTION[I,$] to “Accept”
– If any conflicts appears then we say that the grammar is not
LR(1).
– If GOTO(Ii,A) = Ij then GOTO[i,A]=j
– All entries not defined by above rules are made “error”
– The initial state of the parser is the one constructed from the set
of items containing [S’->.S,$]

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 61
LALR Parsing Table
• For the previous example we had

I4
Cd. , c/d
I47
Cd. , c/d/$

I7
Cd. , $

State merges cant produce Shift-Reduce conflicts. Why?


But it may produce reduce-reduce conflict

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 62
LALR(1) : Look-Ahead LR(1)

•Preferred parsing technique in many parser generators


•Close in power to LR(1), but with less number of states
•Increased number of states in LR(1) is because
• Different lookahead tokens are associated with same
LR(0) items
•Number of states in LALR(1) = states in LR(0)
•LALR(1) is based on the observation that
• Some LR(1) states have same LR(0) items
• Differ only in lookahead tokens
•LALR(1) can be obtained from LR(1) by
• Merging LR(1) states that have same LR(0) items
• Obtaining the union of the LR(1) lookahead tokens
Nishant Kumar Hind RCS-602 CD
08/11/2021 63
Unit -2
Conflicts in LR(0) and SLR(1) Parser
•Conflicts in LR(0) and SLR(1) Parser
•There are two types of conflicts that can occur in an SLR(1) parsing table.
•A shift-reduce conflict occurs in a state that contains both a shift action and
a reduce action.
•A reduce-reduce conflict occurs in a state that contains two or more
different reduce actions.
• How to Determine:-
A full parsing table is not needed, only the canonical collection is required.
In the canonical collection, find all the items having dot at the end if:
There are both shift and reduce scenario in the same item ("shift-reduce", s/r)
•Let I5 contains:
•A -> b •
•X -> X • a
•Here we have transition on "a", and reduce by A -> b.
•then if
•Follow(A) ∩ {a} ≠ ø then there is no conflict. 

Nishant Kumar Hind RCS-602 CD


08/11/2021 64
Unit -2
Conflicts in LR(0) and SLR(1) Parser
Conflicts in LR(0) and SLR(1) Parser
 
•There are two reduce actions scenario in the same item ("reduce-reduce",
r/r)
•Let I5 contains:
•A -> a •
•B -> b • then if
•Follow(A) ∩ Follow(B) ≠ ø then no conflict.
• 
•Difference between LR(0) and SLR(1):
•The only difference between LR(0) and SLR(1) is this extra ability to help
decide what action to take when there are conflicts. Because of this, any
grammar that can be parsed by an LR(0) parser can be parsed by
an SLR(1) parser. However, SLR(1) parsers can parse a larger number of
grammars than LR(0).

Nishant Kumar Hind RCS-602 CD


08/11/2021 65
Unit -2
LEX and YACC

Nishant Kumar Hind RCS-602 CD Unit -2


08/11/2021 66
Faculty Video Links, Youtube & NPTEL Video Links and Online
Courses Details

• Youtube /other Video Links


• https://www.youtube.com/watch?v=TkRy17gnq3E
• https://www.youtube.com/watch?v=N9UuAPU6DAg
• https://www.youtube.com/watch?v=APJ_Eh60Qwo
• https://www.youtube.com/watch?v=QGz6tapRckQ
• https://www.youtube.com/watch?v=wd6P3-4nMoc
• https://www.youtube.com/watch?v=f6lw7x33i50
• NPTEL
• https://youtu.be/1qOMlqE6LhU
• https://youtu.be/g6ZCpS7Farc
• https://youtu.be/UvOQiSiHAG4

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 67


Daily Quiz

• Top-down parsing is a technique to find _________ .


(a) Leftmost derivation
(b) Rightmost derivation
(c) Leftmost derivation in reverse
(d) Rightmost derivation in reverse
• Predictive parsing is possible only for __________ .
(a) LR(k) grammar
(b) LALR(1) grammar
(c) LL(k) grammar
(d) CLR(1) grammar
• Which two functions are required to construct a parsing table in predictive
parsing technique?
(a) CLOSURE() and GOTO ()
(b) FIRST() and FOLLOW()
(c) ACTI0N() and GOTO()
(d) None of these 
08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 68
Daily Quiz

• Non-recursive predictive parser contains ___________ .


(a) An input buffer
(b) A parsing table
(c) An output stream
(d) All of these
• Which of these parsing techniques is a kind of bottom-up
parsing?
(a) Shift-reduce parsing
(b) Reduce-reduce parsing
(c) Predictive parsing
(d) Recursive-decent parsing

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 69


Weekly Assignment

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 70


Weekly Assignment

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 71


Weekly Assignment

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 72


MCQ s

• Which of the following methods is used by the bottom-up parser to generate a parse
tree?
(a) Leftmost derivation
(b) Rightmost derivation
(c) Leftmost derivation in reverse
(d) Rightmost derivation in reverse 
• Handle pruning forms the basis of ____________ .
(a) Bottom-up parsing
(b) Top-down parsing
(c) Both (a) and (b)
(d) None of these 
• In shift-reduce parsing, accept action occurs __________ .
(a) When we have the right end of the handle at the top of the stack
(b) When we have the left end of the handle at the top of the stack
(c) When parser declares the successful completion of parsing
(d) When the parser finds a syntax error in the input and calls an error recovery routine
 

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 73


MCQ s

• If A  →   αβ is a production and a is terminal symbol or right end marker $, then LR(1) items
will be defined by the production ________.
(a) [A  →   α.β, a]
(b) [A  →   .αβ, a]
(c) [A  →   α.βa]
(d) [A  →   a.β, α]
• Given a grammar G:
    T → BCTd | Bcd
    CB → BC
    Cc → cc
    Bc → bc
    Bb → b
Which of the following sentences can be derived by G?
(a) bcd
(b) bbc
(c) bcdd
(d) bccd

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 74


Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 75
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 76
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 77
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 78
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 79
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 80
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 81
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 82
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 83
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 84
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 85
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 86
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 87
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 88
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 89
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 90
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 91
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 92
Unit -2
Old Question Papers

Nishant Kumar Hind RCS-602 CD


08/11/2021 93
Unit -2
Expected Questions for University Exam

• Construct LL (1) parsing table for the following grammar and parse
string (a, (a, a)) by using the table.
S (L) | a
L L, S | S
• Explain LR (k) parsers and write an algorithm for implementing it.
• Construct LR(1) parsing table for grammar. Where start symbol is S.
S xAy / xBy / xAz
A aS / q
B q.

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 94


Summary

• Syntax analysis is a second phase of the compiler design process that


comes after lexical analysis.
• The syntactical analyser helps you to apply rules to the code.
• Parser checks that the input string is well-formed, and if not, reject it.
• Parsing techniques are divided into two different groups: Top-Down
Parsing, Bottom-Up Parsing.
• A grammar is a set of structural rules which describe a language.
• Notational conventions symbol may be indicated by enclosing the element
in square brackets.
• A CFG is a left-recursive grammar that has at least one production of the
type.
• Grammar derivation is a sequence of grammar rule which transforms the
start symbol into the string.

Nishant Kumar Hind RCS-602 CD


08/11/2021 95
Unit -2
References

• Principles of Compiler Design Textbook by Alfred Aho and Jeffrey


Ullman
• Principle of Compiler Design, A.V.Aho, Rabi Sethi, J.D.Ullman.
• Compilers: Principles, Techniques and Tools  A.V.Aho, Monica S.
Lam, Rabi Sethi, J.D.Ullman
• https://www.geeksforgeeks.org/compiler-design-tutorials/
• https://www.javatpoint.com/compiler-tutorial
• https://www.tutorialspoint.com/compiler_design/index.htm
• https://nptel.ac.in/courses/106105190/

Nishant Kumar Hind RCS-602 CD


08/11/2021 96
Unit -2
Noida Institute of Engineering and Technology, Greater Noida

Thank You

08/11/2021 Nishant Kumar Hind RCS-602 CD Unit -2 97

You might also like