
Unit-III Bottom-up parsers

Shift-reduce parsing (Bottom-up parsing):


Handles: A handle of a right-sentential form γ is a production A→β and a position of γ where the
string β may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ. For ex., if

    S  =>*  αAw  =>  αβw    (all steps rightmost),

then A→β in the position following α is a handle of αβw. The string w to the right of the handle
consists of only terminals. If a grammar is unambiguous, then every right-sentential form of the
grammar has exactly one handle.
Ex. Consider the CFG: E→E+E   E→E*E   E→(E)   E→id and the input string
id1+id2*id3. The following sequence of reductions reduces the input string to the start symbol E:

Right-sentential form    Handle    Reducing production
id1+id2*id3              id1       E→id
E+id2*id3                id2       E→id
E+E*id3                  id3       E→id
E+E*E                    E*E       E→E*E
E+E                      E+E       E→E+E
E
As shown above, a shift-reduce parser produces a rightmost derivation in reverse.

The shift-reduce parsing actions above can be implemented using a stack as follows:


     Stack        Input           Action
1.   $            id1+id2*id3$    shift
2.   $id1         +id2*id3$       reduce by E→id
3.   $E           +id2*id3$       shift
4.   $E+          id2*id3$        shift
5.   $E+id2       *id3$           reduce by E→id
6.   $E+E         *id3$           shift
7.   $E+E*        id3$            shift
8.   $E+E*id3     $               reduce by E→id
9.   $E+E*E       $               reduce by E→E*E
10.  $E+E         $               reduce by E→E+E
11.  $E           $               accept
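
The stack mechanics in the table above can be sketched in Python. This is only an illustration: the action script is copied by hand from the table rather than derived from a parsing table, and the names `PRODUCTIONS`, `SCRIPT`, and `replay` are hypothetical, not part of the original notes.

```python
# Replay the shift/reduce script from the table on an explicit stack.
# Assumption: subscripts on id are dropped, so every identifier is "id".
PRODUCTIONS = {
    "E->id": ("E", ["id"]),
    "E->E*E": ("E", ["E", "*", "E"]),
    "E->E+E": ("E", ["E", "+", "E"]),
}

def replay(tokens, script):
    """Apply each scripted action; return final stack, input, and trace."""
    stack, rest, trace = ["$"], list(tokens) + ["$"], []
    for action in script:
        trace.append((" ".join(stack), " ".join(rest), action))
        if action == "shift":
            stack.append(rest.pop(0))
        elif action == "accept":
            break
        else:
            head, body = PRODUCTIONS[action]
            # The handle must be on top of the stack before it is reduced.
            assert stack[-len(body):] == body, "handle not on top of stack"
            del stack[-len(body):]
            stack.append(head)
    return stack, rest, trace

SCRIPT = ["shift", "E->id", "shift", "shift", "E->id", "shift", "shift",
          "E->id", "E->E*E", "E->E+E", "accept"]

final_stack, final_input, trace = replay(["id", "+", "id", "*", "id"], SCRIPT)
for stack_text, input_text, action in trace:
    print(f"{stack_text:<14}{input_text:<18}{action}")
```

Each reduction checks that the right side of the production is on top of the stack, which is the property that makes a stack sufficient for shift-reduce parsing.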

A parse tree can be constructed during the shift-reduce parsing steps above.

Viable prefix: Viable prefixes are the prefixes of right-sentential forms that can appear on the stack
of a shift-reduce parser.

Conflicts during shift-reduce parsing: Shift-reduce parsers cannot be used for some CFGs. A
shift-reduce parser for such a CFG may reach a configuration in which it cannot decide whether to
shift or to reduce (a shift-reduce conflict), or which of several reductions to make (a reduce-reduce
conflict). Ambiguous grammars cannot be LR.
LR parsers: In LR(k), L stands for left-to-right scanning of the input, R stands for constructing a
rightmost derivation in reverse, and k stands for the number of input symbols of lookahead used in
making parsing decisions. When k is omitted, it is assumed to be 1.
Advantages of LR parser:
1. LR parsers can be constructed for almost all programming language constructs for which a CFG
can be written.
2. An LR parser can be implemented as an efficient shift-reduce parser.
3. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
scan of the input.
4. Since every LL(1) grammar is an LR(1) grammar, LR parsers can be constructed for all LL(1)
grammars in addition to other grammars which are not LL(1).
Disadvantage of LR parser: It is very time-consuming and tedious to construct an LR parser for a
programming language grammar without the help of a tool like Yacc.
LR parsers are of three types:
1. SLR (Simple LR): This is the easiest to implement but is the least powerful.
2. LR (Canonical LR): This is the most powerful but takes a lot of space.
3. LALR (Lookahead LR): This is less powerful than LR but more powerful than SLR. It takes less
space than LR.

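Model of an LR parser:
[Figure: the LR parsing program reads the input a1 … ai … an $, keeps a stack of states and grammar
symbols (S0 at the bottom, Xm and Sm at the top), consults a parsing table with action and goto
parts, and produces the output.]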

Ex.1 (SLR / SLR(1) / Simple LR parser, built from LR(0) items): Construct an SLR parser for the
following grammar and show the moves of this parser on input: id*id+id
E→E+T|T   T→T*F|F   F→(E)|id
Ans:
Let us augment the grammar and sequence the productions:
E’→E   1. E→E+T   2. E→T   3. T→T*F   4. T→F   5. F→(E)   6. F→id
FIRST and FOLLOW for the given grammar are as follows:
FIRST(E)=FIRST(T)=FIRST(F)={(,id}
FOLLOW(E)={$,+,)}   FOLLOW(T)={*} U FOLLOW(E)={*,$,+,)}
FOLLOW(F)=FOLLOW(T)={*,$,+,)}
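
The FIRST and FOLLOW sets above can be checked with a small fixed-point computation. The sketch below is in Python and assumes a grammar without ε-productions, which holds for this grammar; `GRAMMAR`, `first_sets`, and `follow_sets` are illustrative names.

```python
# FIRST and FOLLOW by iterating to a fixed point.
# Assumption: no production has an empty right side.
GRAMMAR = [
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

def follow_sets(first, start="E"):
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    # Something follows x: add FIRST of the next symbol.
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # x ends the body: add FOLLOW of the production head.
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

FIRST = first_sets()
FOLLOW = follow_sets(FIRST)
print(FIRST)
print(FOLLOW)
```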
LR(0) sets of items (c denotes closure, g denotes goto):
I0: c(E’→.E)       E’→.E  E→.E+T  E→.T  T→.T*F  T→.F  F→.(E)  F→.id
I1: c(g(I0,E))     E’→E.  E→E.+T
I2: c(g(I0,T))     E→T.  T→T.*F
I3: c(g(I0,F))     T→F.
I4: c(g(I0,())     F→(.E)  E→.E+T  E→.T  T→.T*F  T→.F  F→.(E)  F→.id
I5: c(g(I0,id))    F→id.
I6: c(g(I1,+))     E→E+.T  T→.T*F  T→.F  F→.(E)  F→.id
I7: c(g(I2,*))     T→T*.F  F→.(E)  F→.id
I8: c(g(I4,E))     F→(E.)  E→E.+T
I9: c(g(I6,T))     E→E+T.  T→T.*F
I10: c(g(I7,F))    T→T*F.
I11: c(g(I8,)))    F→(E).
Transitions that lead to sets already constructed:
I2: c(g(I4,T))   I3: c(g(I4,F))   I4: c(g(I4,())   I5: c(g(I4,id))
I3: c(g(I6,F))   I4: c(g(I6,())   I5: c(g(I6,id))
I4: c(g(I7,())   I5: c(g(I7,id))
I6: c(g(I8,+))   I7: c(g(I9,*))
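
The closure and goto operations used to build these sets can be sketched in Python as follows. An item is represented here as a pair (production index, dot position); this representation and the function names are assumptions made for the illustration.

```python
# Canonical collection of LR(0) item sets for the augmented grammar.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """Add B -> .gamma for every nonterminal B that follows a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:  # the list grows while it is being scanned
        symbols = []
        for p, d in sorted(state):
            body = GRAMMAR[p][1]
            if d < len(body) and body[d] not in symbols:
                symbols.append(body[d])
        for sym in symbols:
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "item sets,", len(transitions), "transitions")
```

In general the state numbers produced by such a construction depend on the order in which symbols are explored, so they need not match a hand-built numbering.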

Parsing table:
State    Action                               goto
         +     *     (     )     id    $      E   T   F
0                    s4          s5           1   2   3
1        s6                            acc
2        r2    s7          r2          r2
3        r4    r4          r4          r4
4                    s4          s5           8   2   3
5        r6    r6          r6          r6
6                    s4          s5               9   3
7                    s4          s5                   10
8        s6                s11
9        r1    s7          r1          r1
10       r3    r3          r3          r3
11       r5    r5          r5          r5

Since the above parsing table has no multiply-defined entry, the given grammar is SLR(1).

Moves of SLR parser on id*id+id:

     Stack                Input          Action
1.   0                    id*id+id$      shift
2.   0 id 5               *id+id$        reduce by F→id
3.   0 F 3                *id+id$        reduce by T→F
4.   0 T 2                *id+id$        shift
5.   0 T 2 * 7            id+id$         shift
6.   0 T 2 * 7 id 5       +id$           reduce by F→id
7.   0 T 2 * 7 F 10       +id$           reduce by T→T*F
8.   0 T 2                +id$           reduce by E→T
9.   0 E 1                +id$           shift
10.  0 E 1 + 6            id$            shift
11.  0 E 1 + 6 id 5       $              reduce by F→id
12.  0 E 1 + 6 F 3        $              reduce by T→F
13.  0 E 1 + 6 T 9        $              reduce by E→E+T
14.  0 E 1                $              accept
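
The moves above follow a simple table-driven loop. The Python sketch below hard-codes the Ex.1 parsing table and records the reductions made. It keeps only states on the stack, since the grammar symbols are implied by them. The names `ACTION`, `GOTO`, `PRODUCTIONS`, and `parse` are illustrative.

```python
# Table-driven LR driver using the SLR table of Ex.1.
# PRODUCTIONS maps a production number to (head, length of body).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

def reduce_row(n):
    # Reduce by production n on each of the terminals + * ) $.
    return {t: f"r{n}" for t in "+*)$"}

SHIFTS = {"(": "s4", "id": "s5"}
ACTION = {
    0: SHIFTS, 4: SHIFTS, 6: SHIFTS, 7: SHIFTS,
    1: {"+": "s6", "$": "acc"},
    2: {**reduce_row(2), "*": "s7"},
    3: reduce_row(4),
    5: reduce_row(6),
    8: {"+": "s6", ")": "s11"},
    9: {**reduce_row(1), "*": "s7"},
    10: reduce_row(3),
    11: reduce_row(5),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def parse(tokens):
    """Return the production numbers used, in order of reduction."""
    stack, pos, reductions = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return reductions
        if act[0] == "s":
            stack.append(int(act[1:]))   # shift: push state, consume token
            pos += 1
        else:
            n = int(act[1:])
            head, length = PRODUCTIONS[n]
            del stack[-length:]          # pop one state per body symbol
            stack.append(GOTO[stack[-1]][head])
            reductions.append(n)

print(parse(["id", "*", "id", "+", "id"]))
```

Read backwards, the list of reductions is the rightmost derivation of the input.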

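Ex.2: (LR / LR(1) / Canonical LR) Construct an LR parser for the following grammar:
S→CC   C→cC|d
Ans: Let us augment the grammar and sequence the productions:
S’→S   1. S→CC   2. C→cC   3. C→d
LR(1) sets of items (c denotes closure, g denotes goto):
I0: c(S’→.S,$)     S’→.S,$  S→.CC,$  C→.cC,c/d  C→.d,c/d
I1: c(g(I0,S))     S’→S.,$
I2: c(g(I0,C))     S→C.C,$  C→.cC,$  C→.d,$
I3: c(g(I0,c))     C→c.C,c/d  C→.cC,c/d  C→.d,c/d
I4: c(g(I0,d))     C→d.,c/d
I5: c(g(I2,C))     S→CC.,$
I6: c(g(I2,c))     C→c.C,$  C→.cC,$  C→.d,$
I7: c(g(I2,d))     C→d.,$
I8: c(g(I3,C))     C→cC.,c/d
I9: c(g(I6,C))     C→cC.,$
Transitions that lead to sets already constructed:
I3: c(g(I3,c))   I4: c(g(I3,d))   I6: c(g(I6,c))   I7: c(g(I6,d))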

Parsing table:
State    Action             goto
         c     d     $      S   C
0        s3    s4           1   2
1                    acc
2        s6    s7               5
3        s3    s4               8
4        r3    r3
5                    r1
6        s6    s7               9
7                    r3
8        r2    r2
9                    r2

Since the above parsing table has no multiply-defined entry, the given grammar is LR(1).
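
The LR(1) sets of items for this grammar can be generated mechanically. The following Python sketch represents an item as (production index, dot position, lookahead) and hard-codes FIRST for the two nonterminals. It assumes the grammar has no ε-productions, and all names are illustrative.

```python
# Canonical collection of LR(1) item sets for S' -> S, S -> CC, C -> cC | d.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {head for head, _ in GRAMMAR}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}  # valid because nothing derives empty

def first_of(rest, lookahead):
    """FIRST of the string `rest` followed by `lookahead`."""
    if not rest:
        return {lookahead}
    return FIRST[rest[0]] if rest[0] in NONTERMINALS else {rest[0]}

def closure(items):
    result, work = set(items), list(items)
    while work:
        prod, dot, look = work.pop()
        body = GRAMMAR[prod][1]
        if dot == len(body) or body[dot] not in NONTERMINALS:
            continue
        # For [A -> alpha . B beta, a], add [B -> . gamma, b], b in FIRST(beta a).
        for b in first_of(body[dot + 1:], look):
            for i, (head, _) in enumerate(GRAMMAR):
                item = (i, 0, b)
                if head == body[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    return closure({(p, d + 1, a) for p, d, a in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})

def lr1_collection():
    states = [closure({(0, 0, "$")})]
    for state in states:  # the list grows while it is being scanned
        symbols = {GRAMMAR[p][1][d] for p, d, _ in state
                   if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
    return states

states = lr1_collection()
print(len(states), "LR(1) item sets")
```

The order in which the sets are discovered here may differ from the numbering I0 to I9 used above. Each (item, lookahead) pair is stored separately, so C→.cC,c/d counts as two items.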

Ex.3: (LALR / LALR(1) / LookAhead LR) Construct an LALR parser for the following grammar:
S→CC   C→cC|d
Ans: Starting from the LR(1) sets of items constructed in Ex.2, we combine the sets with common
cores as follows:
I36: C→c.C,c/d/$   C→.cC,c/d/$   C→.d,c/d/$
I47: C→d.,c/d/$
I89: C→cC.,c/d/$
LALR parsing table:
State    Action             goto
         c     d     $      S   C
0        s36   s47          1   2
1                    acc
2        s36   s47              5
36       s36   s47              89
47       r3    r3    r3
5                    r1
89       r2    r2    r2

Since the above parsing table has no multiply-defined entry, the given grammar is LALR(1).
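
Merging sets with common cores can be sketched as follows in Python. The LR(1) sets of Ex.2 are written out by hand as (item, lookaheads) pairs; the string encoding of items and the function name `merge_by_core` are assumptions made for the illustration.

```python
# Merge LR(1) item sets that share a core (same items, lookaheads ignored).
LR1_STATES = {
    0: [("S'->.S", "$"), ("S->.CC", "$"), ("C->.cC", "c/d"), ("C->.d", "c/d")],
    1: [("S'->S.", "$")],
    2: [("S->C.C", "$"), ("C->.cC", "$"), ("C->.d", "$")],
    3: [("C->c.C", "c/d"), ("C->.cC", "c/d"), ("C->.d", "c/d")],
    4: [("C->d.", "c/d")],
    5: [("S->CC.", "$")],
    6: [("C->c.C", "$"), ("C->.cC", "$"), ("C->.d", "$")],
    7: [("C->d.", "$")],
    8: [("C->cC.", "c/d")],
    9: [("C->cC.", "$")],
}

def merge_by_core(states):
    """Return {merged name: {item: set of lookaheads}}."""
    groups = {}
    for number, items in states.items():
        core = frozenset(item for item, _ in items)
        groups.setdefault(core, []).append(number)
    merged = {}
    for core, numbers in groups.items():
        name = "".join(str(n) for n in sorted(numbers))
        lookaheads = {item: set() for item in core}
        for n in numbers:
            for item, looks in states[n]:
                lookaheads[item] |= set(looks.split("/"))  # union lookaheads
        merged[name] = lookaheads
    return merged

merged = merge_by_core(LR1_STATES)
for name in sorted(merged, key=int):
    print(name, merged[name])
```

The ten LR(1) sets collapse to seven, named 0, 1, 2, 36, 47, 5, and 89 as in the table above.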

LALR Parsers: SLR and LALR parsers for a grammar have the same number of states. LALR parsers
are more powerful than SLR parsers but less powerful than canonical LR parsers. LALR parsing tables
are considerably smaller than canonical LR parsing tables. Therefore, LALR parsers are commonly
used in practice.
Shift actions depend only on the core and not on the lookahead. Therefore, merging states with
common cores cannot create a new shift-reduce conflict. However, merging states may create a
reduce-reduce conflict.
When an erroneous input is given to an LALR parser, it may do some reductions after the LR parser
has detected the error, but the LALR parser never shifts another symbol after the point at which the
LR parser detects the error.

Using ambiguous grammars: No ambiguous grammar can be LR. But some ambiguous
grammars are easier to understand than unambiguous grammars. For ex., the ambiguous grammar
E→E+E|E*E|(E)|id specifies arithmetic expression syntax in a more natural form than the
unambiguous grammar E→E+T|T   T→T*F|F   F→(E)|id
The productions are numbered 1. E→E+E   2. E→E*E   3. E→(E)   4. E→id. The LR(0) sets of items
for the above ambiguous grammar after augmentation are as follows:
I0: c(E’→.E)       E’→.E  E→.E+E  E→.E*E  E→.(E)  E→.id
I1: c(g(I0,E))     E’→E.  E→E.+E  E→E.*E
I2: c(g(I0,())     E→(.E)  E→.E+E  E→.E*E  E→.(E)  E→.id
I3: c(g(I0,id))    E→id.
I4: c(g(I1,+))     E→E+.E  E→.E+E  E→.E*E  E→.(E)  E→.id
I5: c(g(I1,*))     E→E*.E  E→.E+E  E→.E*E  E→.(E)  E→.id
I6: c(g(I2,E))     E→(E.)  E→E.+E  E→E.*E
I7: c(g(I4,E))     E→E+E.  E→E.+E  E→E.*E
I8: c(g(I5,E))     E→E*E.  E→E.+E  E→E.*E
I9: c(g(I6,)))     E→(E).
Transitions that lead to sets already constructed:
I2: c(g(I2,())   I3: c(g(I2,id))
I2: c(g(I4,())   I3: c(g(I4,id))
I2: c(g(I5,())   I3: c(g(I5,id))
I4: c(g(I6,+))   I5: c(g(I6,*))
I4: c(g(I7,+))   I5: c(g(I7,*))
I4: c(g(I8,+))   I5: c(g(I8,*))
Parsing table for ambiguous grammar:
State    Action                               goto
         +     *     (     )     id    $      E
0                    s2          s3           1
1        s4    s5                      acc
2                    s2          s3           6
3        r4    r4          r4          r4
4                    s2          s3           7
5                    s2          s3           8
6        s4    s5          s9
7        s4/r1 s5/r1       r1          r1
8        s4/r2 s5/r2       r2          r2
9        r3    r3          r3          r3

There are multiply defined entries in the above parsing table. These conflicts can be resolved by
considering associativity and precedence rules of arithmetic operators.
Action of state 7 on input + should be r1 since + is left-associative. Action of state 7 on input *
should be s5 since * has higher precedence than +.
Action of state 8 on + should be r2 since * has higher precedence than +. Action of state 8 on *
should be r2 since * is left-associative.
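
The four decisions above follow one rule, which can be written as a small Python function. This is a sketch: the precedence numbers and the names `PRECEDENCE`, `ASSOCIATIVITY`, and `resolve` are assumptions chosen for illustration.

```python
# Resolve a shift-reduce conflict between reducing by E -> E op E
# and shifting the lookahead operator.
PRECEDENCE = {"+": 1, "*": 2}            # larger number binds tighter
ASSOCIATIVITY = {"+": "left", "*": "left"}

def resolve(production_operator, lookahead):
    """Return 'reduce' or 'shift' for the conflicting table entry."""
    if PRECEDENCE[production_operator] > PRECEDENCE[lookahead]:
        return "reduce"
    if PRECEDENCE[production_operator] < PRECEDENCE[lookahead]:
        return "shift"
    # Equal precedence: left-associative operators reduce first.
    return "reduce" if ASSOCIATIVITY[lookahead] == "left" else "shift"

# State 7 holds E -> E+E. and state 8 holds E -> E*E.
for state, op in ((7, "+"), (8, "*")):
    for lookahead in "+*":
        print(f"state {state} on {lookahead}: {resolve(op, lookahead)}")
```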

Implementation of LR parsing tables: A parsing table can be stored as a two-dimensional array, but
this takes a lot of space. The following more compact encodings can be used instead.
Action field encoding: Usually, many rows of the action table are identical. So we can create a
pointer for each state into a one-dimensional array, with states that have identical rows pointing to
the same place. We assign an integer value to each terminal, and this value is used as an offset from
the pointer to access the entry for a given state and terminal.
We can save more space, at the cost of a slower parser, by using a list for the actions of each
state. The list consists of pairs of a terminal symbol and an action. The most frequent action for a
state is placed at the end of the list, and in place of a terminal we use the notation “any”. This
means that if the current input symbol has not been found so far on the list, we should do the action
specified in the pair for “any”. Also, error entries can be safely replaced by reduce actions, for
uniformity along a row. The errors will still be detected later, before a shift move.
Goto field encoding: The goto table can also be encoded by a list. But since there are usually very
few entries per state in the goto table, we make a list of pairs for each nonterminal A. Each pair
has the form (current_state, next_state), meaning that
goto[current_state, A] = next_state.
We can shorten these lists further by taking advantage of the fact that error entries in the goto
table are never consulted. We can replace each error entry by the most common non-error entry in its
column; this entry then becomes the default, written “any”.
Ex. Consider the parsing table in Ex.1.
The actions for states 0, 4, 6, and 7 are the same, so all of them can be represented by the
following list:
    Symbol   Action
    id       s5
    (        s4
    any      error
List for state 1:
    +        s6
    $        acc
    any      error
In state 2, error entries can be replaced by r2. List for state 2:
    *        s7
    any      r2
In state 3, error entries can be replaced by r4. List for state 3:
    any      r4
States 5, 10, and 11 have entries (any,r6), (any,r3), and (any,r5) respectively.
List for state 8:
    +        s6
    )        s11
    any      error
List for state 9:
    *        s7
    any      r1

Goto list for column E:
    Current_state   Next_state
    4               8
    any             1
Goto list for column T:
    6               9
    any             2
Goto list for column F:
    7               10
    any             3
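
A lookup over these lists can be sketched in Python as below. Plain Python lists of pairs stand in for linked lists, and the names `ACTION_LISTS`, `GOTO_LISTS`, and `lookup` are illustrative.

```python
# List encoding of the Ex.1 table with "any" as the default entry.
SHIFT_LIST = [("id", "s5"), ("(", "s4"), ("any", "error")]
ACTION_LISTS = {
    0: SHIFT_LIST, 4: SHIFT_LIST, 6: SHIFT_LIST, 7: SHIFT_LIST,  # shared list
    1: [("+", "s6"), ("$", "acc"), ("any", "error")],
    2: [("*", "s7"), ("any", "r2")],
    3: [("any", "r4")],
    5: [("any", "r6")],
    8: [("+", "s6"), (")", "s11"), ("any", "error")],
    9: [("*", "s7"), ("any", "r1")],
    10: [("any", "r3")],
    11: [("any", "r5")],
}
GOTO_LISTS = {
    "E": [(4, 8), ("any", 1)],
    "T": [(6, 9), ("any", 2)],
    "F": [(7, 10), ("any", 3)],
}

def lookup(pairs, key):
    """Scan the list; the final ("any", value) pair matches every key."""
    for candidate, value in pairs:
        if candidate == key or candidate == "any":
            return value
    raise KeyError(key)

def action(state, terminal):
    return lookup(ACTION_LISTS[state], terminal)

def goto(state, nonterminal):
    return lookup(GOTO_LISTS[nonterminal], state)

print(action(2, "*"), action(2, "id"), goto(4, "E"), goto(0, "E"))
```

Note that `action(2, "id")` returns r2 although the full table has an error entry there. This is the substitution described above, and the error surfaces at a later state that has an explicit error default.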

Error recovery in LR parsing: An LR parser detects an error when it comes across an error entry in the
action part of the parsing table. Errors are never detected by consulting the goto table. A canonical LR
parser never makes even a single reduction before announcing an error. SLR and LALR parsers may make
several reductions before announcing an error, but they never shift an erroneous input symbol onto the
stack.
Panic-mode error recovery: After detecting an error, we scan down the stack until a state s with a goto
on a particular nonterminal A is found. Zero or more input symbols are then discarded until a symbol a
is found which is in FOLLOW(A). Then parser stacks the state goto[s,A] and continues normal parsing.
Nonterminal A is chosen in such a way that it represents major program pieces, such as expression,
statement, or block. For ex., if A is the nonterminal stmt, a might be semicolon or end.
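
The three steps of panic-mode recovery can be sketched in Python using the goto table of Ex.1. Choosing E as the nonterminal A, and the names `GOTO`, `FOLLOW`, and `panic_recover`, are assumptions made for this illustration.

```python
# Panic-mode recovery for the Ex.1 parser, with A = E.
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}
FOLLOW = {"E": {"+", ")", "$"}}

def panic_recover(stack, tokens, pos, nonterminal="E"):
    """Return (new state stack, new input position).

    `tokens` must end with "$", which is in FOLLOW(E), so the scan stops.
    """
    stack = list(stack)
    # 1. Pop states until one has a goto on the chosen nonterminal.
    while stack and nonterminal not in GOTO.get(stack[-1], {}):
        stack.pop()
    if not stack:
        raise SyntaxError("no state with a goto on " + nonterminal)
    # 2. Discard input symbols until one in FOLLOW(nonterminal) is found.
    while tokens[pos] not in FOLLOW[nonterminal]:
        pos += 1
    # 3. Push goto[s, A]; normal parsing resumes from here.
    stack.append(GOTO[stack[-1]][nonterminal])
    return stack, pos

# Input id + * id: the parser is in state 6 when it sees the stray "*".
print(panic_recover([0, 1, 6], ["id", "+", "*", "id", "$"], 2))
```

In this run the parser pops back to state 0, skips `* id`, and resumes in state 1 with `$` as the next input symbol.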
Phrase-level recovery: In this method, each error entry in the LR parsing table is examined to guess most
likely programming error which gives rise to that error entry. Then a proper recovery procedure can be
written for that error entry. In this method top of the stack and/or first input symbols may be modified if
required by the error recovery procedure.

Error detection and recovery in Syntax analysis phase: A source program may contain the following
types of errors:
1. Lexical error. For ex. misspelling of an identifier, keyword, or operator.
2. Syntactic error. For ex. an arithmetic expression with unbalanced parentheses.
3. Semantic error. For ex. an operator applied to incompatible operands.
4. Logical error. For ex. an infinitely recursive call.
Much of the error detection and recovery in a compiler is centered on the syntax analysis phase.
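Error recovery strategies: Some syntax error recovery strategies are as follows:
1. Panic mode recovery: On detecting an error, the parser discards input symbols until a
synchronizing symbol is found, as described above for LR parsers.
2. Phrase level recovery: On detecting an error, the parser makes a local correction to the stack
top and/or the remaining input, as described above for LR parsers.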
3. Error productions: If we can guess commonly occurring errors in the program, then grammar for
the programming language can be augmented with the error productions. Now this augmented
grammar can be used to construct a parser. If parser uses an error production, then appropriate
error recovery can be used.
4. Global correction: These methods use algorithms, which find a minimal sequence of changes in
the program. Using these algorithms we get a globally least-cost correction. These methods are
used rarely because their implementation is costly in terms of time and space.

Parser generator-Yacc: Yacc (Yet another compiler-compiler) is an LALR parser generator.

Yacc program  →  Yacc compiler  →  parser in C
Parser in C   →  C compiler     →  parser
Input         →  parser         →  output

A Yacc program has three parts:


Declarations
%%
translation rules
%%
supporting C-routines

The declarations part contains two sections: the first section contains C declarations and the second
section contains declarations of grammar tokens.
The translation rules part contains grammar productions and their associated semantic actions.
The supporting C-routines part contains the necessary supporting C routines. A lexical analyzer named
yylex() must be provided. Other procedures, such as error recovery routines, are added as necessary.
Using Yacc with ambiguous grammars: By default, Yacc resolves all parsing action conflicts using the
following two rules:
1. A reduce-reduce conflict is resolved by choosing the conflicting production listed first in the
Yacc program.
2. A shift-reduce conflict is resolved in favor of shift.
Since these default rules may not always be what the compiler writer wants, a general mechanism to
resolve shift-reduce conflicts is also provided: the precedence and associativity of terminals can be
specified in the declarations part.
