

Syntax Analysis
Part II
Chapter 4

COP5621 Compiler Construction


Copyright Robert van Engelen, Florida State University, 2007-2013

Bottom-Up Parsing
• LR(k) methods (Left-to-right scan, Rightmost
derivation in reverse, k symbols of lookahead;
k defaults to 1)
– SLR, Canonical LR, LALR
• Other special cases:
– Shift-reduce parsing
– Operator-precedence parsing

Shift-Reduce Parsing
Grammar:
S → a A B e
A → A b c | b
B → d

Reducing a sentence:
a b b c d e
a A b c d e
a A d e
a A B e
S
The reduced substrings match productions’ right-hand sides.

Shift-reduce corresponds to a rightmost derivation:
S ⇒rm a A B e
  ⇒rm a A d e
  ⇒rm a A b c d e
  ⇒rm a b b c d e

[Figure: the parse tree for a b b c d e, built bottom-up in four steps, one reduction per step]

Handles
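A handle is a substring of grammar symbols in a
right-sentential form that matches a right-hand side
of a production and whose reduction is one step in
the reverse of a rightmost derivation

Grammar:
S → a A B e
A → A b c | b
B → d

Reducing at handles:
a b b c d e      handle: b
a A b c d e      handle: A b c
a A d e          handle: d
a A B e          handle: a A B e
S

Reducing at a substring that is not a handle:
a b b c d e
a A b c d e      the second b is NOT a handle, because
a A A c d e      further reductions will fail
…?               (result is not a sentential form)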

Stack Implementation of
Shift-Reduce Parsing
Grammar:
E → E + E
E → E * E
E → ( E )
E → id

Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce E → id
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce E → id
$E+E       *id$         shift (or reduce?)
$E+E*      id$          shift
$E+E*id    $            reduce E → id
$E+E*E     $            reduce E → E * E
$E+E       $            reduce E → E + E
$E         $            accept

The right-hand sides on top of the stack are the handles found to reduce.
How to resolve conflicts?
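
One common way to settle the "shift (or reduce?)" choice for this ambiguous grammar is to compare operator precedences. The following is a minimal Python sketch, not the method prescribed by the slides: it assumes * binds tighter than + and that both are left-associative, and the names PREC and shift_reduce are hypothetical.

```python
# Assumed precedences (not stated on the slide): * binds tighter than +,
# and both operators are left-associative.
PREC = {"+": 1, "*": 2}

def shift_reduce(tokens):
    """Parse with E -> E+E | E*E | (E) | id and return the list of actions."""
    stack, actions = [], []
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        look = tokens[i]
        if stack and stack[-1] == "id":
            stack[-1] = "E"
            actions.append("reduce E -> id")
        elif stack[-3:] == ["(", "E", ")"]:
            stack[-3:] = ["E"]
            actions.append("reduce E -> ( E )")
        elif (len(stack) >= 3 and stack[-3] == "E" and stack[-1] == "E"
              and stack[-2] in PREC
              and (look not in PREC or PREC[look] <= PREC[stack[-2]])):
            # Reduce unless the lookahead operator binds tighter than the
            # operator on the stack (in that case we shift instead).
            op = stack[-2]
            stack[-3:] = ["E"]
            actions.append(f"reduce E -> E {op} E")
        elif look == "$":
            if stack == ["E"]:
                actions.append("accept")
                return actions
            raise SyntaxError(f"cannot reduce stack {stack}")
        else:
            stack.append(look)
            actions.append("shift")
            i += 1

for step in shift_reduce(["id", "+", "id", "*", "id"]):
    print(step)
```

With these assumptions the sketch shifts on * when E+E is on the stack, which reproduces the action column of the trace above.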

Conflicts
• Shift-reduce and reduce-reduce conflicts are
caused by
– The limitations of the LR parsing method (even
when the grammar is unambiguous)
– Ambiguity of the grammar

Model of an LR Parser
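[Figure: model of an LR parser. The LR parsing program (driver) reads the input a1 a2 … ai … an $, keeps a stack s0 … Xm-1 sm-1 Xm sm with sm on top, consults the action and goto tables, and produces the output.]
• action table entries: shift, reduce, accept, error
• goto table: the transitions of the DFA
• Tables constructed with the LR(0) method, SLR method, LR(1) method, or LALR(1) method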

LR(0) Parser
LR(0) parsers do not use any lookahead (that is, they look ahead zero
symbols) before deciding which reduction to perform.

Building LR(0) Parsing Table


For each edge labeled X from state I to state J
• if X is a terminal, put shift J at (I, X)
• if X is a nonterminal, put goto J at (I, X)
For each state I
• if I contains S’ → S.$ , put accept at (I, $)
• if I contains A → α . where A → α has grammar rule
number n, put reduce n at (I, x) for each
terminal x
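
The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the DFA is supplied as plain data, the function name build_lr0_table and its parameters are hypothetical, and the sample DFA is the six-state automaton for the grammar 1. C’ → C, 2. C → A B, 3. A → a, 4. B → a.

```python
def build_lr0_table(edges, reduce_states, accept_state, terminals):
    """Fill LR(0) action/goto tables from a DFA.

    edges:         {(I, X): J} for each edge labeled X from state I to J
    reduce_states: {I: n} for each state I holding a complete item of rule n
    accept_state:  the state holding the completed start production
    terminals:     set of terminal symbols, including the end marker "$"
    """
    action, goto = {}, {}

    def put(state, symbol, entry):
        # Two entries in one cell mean the grammar is not LR(0).
        if (state, symbol) in action:
            raise ValueError(f"conflict in state {state} on {symbol!r}")
        action[(state, symbol)] = entry

    for (i, x), j in edges.items():
        if x in terminals:
            put(i, x, f"s{j}")      # shift J at (I, X)
        else:
            goto[(i, x)] = j        # goto J at (I, X)
    put(accept_state, "$", "acc")
    for i, n in reduce_states.items():
        for x in sorted(terminals):
            put(i, x, f"r{n}")      # reduce n fills the entire row
    return action, goto

# Sample DFA (assumed data): edges and the rule completed in each final state.
EDGES = {(0, "C"): 1, (0, "A"): 2, (0, "a"): 3, (2, "B"): 4, (2, "a"): 5}
ACTION, GOTO = build_lr0_table(EDGES, {3: 3, 4: 2, 5: 4}, 1, {"a", "$"})
print(ACTION)
print(GOTO)
```

Raising an error on a doubly filled cell is a design choice of this sketch; it makes shift-reduce and reduce-reduce conflicts visible as soon as the table is built.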

LR(0) Parsing Table



Example ((a),b)

LR(0) Limitations

Shift-Reduce Parsing:
Shift-Reduce Conflicts
Ambiguous grammar:
S → if E then S
  | if E then S else S
  | other

Stack                 Input      Action
$…                    …$         …
$… if E then S        else …$    shift or reduce?

Resolve in favor of shift, so else matches the closest if

Shift-Reduce Parsing:
Reduce-Reduce Conflicts
Grammar:
C → A B
A → a
B → a

Stack    Input    Action
$        aa$      shift
$a       a$       reduce A → a or B → a ?

Resolve in favor of reducing A → a, otherwise we’re stuck!

LR(k) Parsers: Use a DFA for
Shift/Reduce Decisions

Grammar:
S → C
C → A B
A → a
B → a

State I0:
S → •C
C → •A B
A → •a
State I1 = goto(I0,C):
S → C•
State I2 = goto(I0,A):
C → A•B
B → •a
State I3 = goto(I0,a):
A → a•
State I4 = goto(I2,B):
C → A B•
State I5 = goto(I2,a):
B → a•

DFA (start state 0): 0 on C to 1, 0 on A to 2, 0 on a to 3, 2 on B to 4, 2 on a to 5

In state I3 the parser can only reduce A → a (not B → a)

DFA for Shift/Reduce Decisions

The states of the DFA are used to determine
if a handle is on top of the stack

Grammar:
S → C
C → A B
A → a
B → a

Stack      Input    Action
$0         aa$      start in state 0
$0         aa$      shift (and goto state 3)
$0a3       a$       reduce A → a (goto 2)
$0A2       a$       shift (goto 5)
$0A2a5     $        reduce B → a (goto 4)
$0A2B4     $        reduce C → A B (goto 1)
$0C1       $        accept (S → C)

goto(I0,a):
State I0:
S → •C
C → •A B
A → •a
State I3:
A → a•
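
The trace above can be reproduced with a small table-driven driver. This is a minimal Python sketch under assumptions: the ACTION and GOTO entries are the ones for the six-state DFA of this grammar (rule numbers 2. C → A B, 3. A → a, 4. B → a), the stack holds states only rather than interleaved symbols and states, and the names lr_parse, RULES, ACTION and GOTO are hypothetical.

```python
# Rule number -> (left-hand side, length of right-hand side).
RULES = {2: ("C", 2), 3: ("A", 1), 4: ("B", 1)}

# LR(0) tables for the grammar C' -> C, C -> A B, A -> a, B -> a.
ACTION = {
    (0, "a"): "s3", (1, "$"): "acc", (2, "a"): "s5",
    (3, "a"): "r3", (3, "$"): "r3",
    (4, "a"): "r2", (4, "$"): "r2",
    (5, "a"): "r4", (5, "$"): "r4",
}
GOTO = {(0, "C"): 1, (0, "A"): 2, (2, "B"): 4}

def lr_parse(tokens):
    """Run the LR driver and return the sequence of actions taken."""
    stack = [0]                      # stack of DFA states, start in state 0
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"no action in state {stack[-1]} on {tokens[pos]!r}")
        if act == "acc":
            trace.append("accept")
            return trace
        n = int(act[1:])
        if act[0] == "s":            # shift: push the target state
            stack.append(n)
            pos += 1
            trace.append(f"shift {n}")
        else:                        # reduce: pop |rhs| states, then goto
            lhs, length = RULES[n]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f"reduce {n} goto {stack[-1]}")

print(lr_parse("aa"))
```

The state on top of the stack after each step matches the last number in the Stack column of the trace.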

DFA for Shift/Reduce Decisions

Step that uses this transition:
$0a3       a$       reduce A → a (goto 2)

goto(I0,A):
State I0:
S → •C
C → •A B
A → •a
State I2:
C → A•B
B → •a

DFA for Shift/Reduce Decisions

Step that uses this transition:
$0A2       a$       shift (goto 5)

goto(I2,a):
State I2:
C → A•B
B → •a
State I5:
B → a•

DFA for Shift/Reduce Decisions

Step that uses this transition:
$0A2a5     $        reduce B → a (goto 4)

goto(I2,B):
State I2:
C → A•B
B → •a
State I4:
C → A B•

DFA for Shift/Reduce Decisions

Step that uses this transition:
$0A2B4     $        reduce C → A B (goto 1)

goto(I0,C):
State I0:
S → •C
C → •A B
A → •a
State I1:
S → C•

Example LR(0) Parsing Table


Grammar:
1. C’ → C
2. C → A B
3. A → a
4. B → a

State I0: C’ → •C ; C → •A B ; A → •a
State I1: C’ → C•
State I2: C → A•B ; B → •a
State I3: A → a•
State I4: C → A B•
State I5: B → a•

DFA (start state 0): 0 on C to 1, 0 on A to 2, 0 on a to 3, 2 on B to 4, 2 on a to 5

         action        goto
state    a      $      C    A    B
0        s3            1    2
1               acc
2        s5                      4
3        r3     r3
4        r2     r2
5        r4     r4

s3 in state 0: shift & goto 3
r2 in state 4: reduce by production #2
SLR Grammars

• A Simple LR parser or SLR parser is an LR
parser for which the parsing tables are generated
as for an LR(0) parser except that it only performs
a reduction with a grammar rule A → α if the next
symbol on the input stream is in the follow set of
A
• SLR eliminates some conflicts by populating the
parsing table with reductions A → α on symbols in
FOLLOW(A)

Grammar:
S → E
E → id + E
E → id

State I0:
S → •E
E → •id + E
E → •id
State I2 = goto(I0,id):
E → id•+ E
E → id•

In state I2: shift on + (goto(I2,+)).
FOLLOW(E)={$}, thus reduce E → id on $

SLR Parsing Table


• Reductions do not fill entire rows
• Otherwise the same as LR(0)

Grammar:
1. S → E
2. E → id + E
3. E → id

state    id     +      $      E
0        s2                   1
1                      acc
2               s3     r3
3        s2                   4
4                      r2

In state 2: shift on +
FOLLOW(E)={$}, thus reduce on $

SLR Parsing

SLR parsing is LR(0) parsing, but with a
different reduce rule:
For each edge labeled X from state I to state J
  if X is a terminal, put shift J at (I, X)
  if X is a nonterminal, put goto J at (I, X)
For each state I
  if I contains A → α . where A → α has rule
  number n
    for each terminal x in Follow(A), put
    reduce n at (I, x)

Shift-Reduce Conflict
• The simple improvement that SLR(1)
makes on the basic LR(0) parser is to
reduce only if the next input token is a
member of the follow set of the nonterminal
being reduced.
• Some shift-reduce conflicts are avoided by
SLR parsing, because a reduce action is entered
only for tokens in the follow set.

Limitation of SLR Parsing


• A grammar like

S → L = R | R
L → * R | id
R → L

has a conflict when the state containing

S → L• = R
R → L•

• occurs: either shift “=” or reduce by R → L. SLR is
not powerful enough to remember enough left
context to decide what action the parser should
take on input =, having seen a string reducible to L.
• The LR(1) and LALR methods are more powerful and
handle a larger collection of grammars.
SLR parsing
• Before we can build the parsing table, we need to compute the
FOLLOW sets:

S' → S         FOLLOW(S') = {$}
S → L = R      FOLLOW(S) = {$}
S → R          FOLLOW(L) = {$, =}
L → * R        FOLLOW(R) = {$, =}
L → id
R → L
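
The FOLLOW sets listed above can be checked with a short fixed-point computation. This is a minimal Python sketch; the function name compute_follow and the list-of-pairs grammar encoding are assumptions made for illustration.

```python
def compute_follow(grammar, start):
    """Return FOLLOW sets for a grammar given as (lhs, rhs_list) pairs."""
    nts = {lhs for lhs, _ in grammar}
    first = {n: set() for n in nts}
    nullable = set()
    follow = {n: set() for n in nts}
    follow[start].add("$")

    def first_of(symbols):
        """FIRST of a symbol string; also report whether it can vanish."""
        out = set()
        for s in symbols:
            if s not in nts:
                out.add(s)
                return out, False
            out |= first[s]
            if s not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                   # iterate until nothing grows any more
        changed = False
        for lhs, rhs in grammar:
            f, vanishes = first_of(rhs)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if vanishes and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nts:
                    f, vanishes = first_of(rhs[i + 1:])
                    new = f | (follow[lhs] if vanishes else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

GRAMMAR = [
    ("S'", ["S"]), ("S", ["L", "=", "R"]), ("S", ["R"]),
    ("L", ["*", "R"]), ("L", ["id"]), ("R", ["L"]),
]
FOLLOW = compute_follow(GRAMMAR, "S'")
for nt in ["S'", "S", "L", "R"]:
    print(nt, sorted(FOLLOW[nt]))
```

Because = is in FOLLOW(R), an SLR table places a reduction by R → L in the = column of every state that contains R → L•.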
SLR parsing
         action                                       goto
state    id     =             *      $                S    L    R
0        s3                   s5                      1    2    4
1                                    accept
2               s6/r(R→L)            r(R→L)
3               r(L→id)              r(L→id)
4                                    r(S→R)
5        s3                   s5                           7    8
6        s3                   s5                           7    9
7               r(R→L)               r(R→L)
8               r(L→*R)              r(L→*R)
9                                    r(S→L=R)

Note the shift/reduce conflict in state 2 when the lookahead is an =


LR(1) or Canonical LR Parser

LR(1) item sets are more discriminating:


A look-ahead set is kept with each separate
item, to be used to resolve conflicts when a
reduce item has been reached. This greatly
increases the strength of the parser, but also the
size of its tables.
• The method for building the collection of sets
of valid LR(1) items is essentially the same as
the one for building the canonical collection of
sets of LR(0) items. We need only to modify
the two procedures CLOSURE and GOTO.
Canonical LR(1) parsing
• In the beginning, all we know is that we have not
read any input (S' → •S), we hope to parse an S and
after that we should expect to see a $ as
lookahead. We write this as: S' → •S, $
• Now, consider a general item A → α•Bβ, x. It
means that we have parsed an α, we hope to parse
Bβ and after those we should expect an x. Recall
that if there is a production B → γ, we should add
B → •γ to the state. What kind of lookahead should
we expect to see after we have parsed γ?
– We should expect to see whatever starts a β. If β is
empty or can vanish, then we should expect to see an x
after we have parsed γ (and reduced it to B)
LR(1) Items
• An LR(1) item
[A → α•β, a]
contains a lookahead terminal a, meaning α
already on top of the stack, expect to see βa
• For items of the form
[A → α•, a]
the lookahead a is used to reduce A → α only if
the next input is a
• For items of the form
[A → α•β, a]
with β ≠ ε the lookahead has no effect
The Closure Operation for LR(1)
Items
1. Start with closure(I) = I
2. If [A → α•Bβ, a] ∈ closure(I) then
for each production B → γ in the grammar
and each terminal b ∈ FIRST(βa)
add the item [B → •γ, b] to closure(I)
if not already in closure(I)
3. Repeat 2 until no new items can be added
The Goto Operation for LR(1)
Items
1. For each item [A → α•Xβ, a] ∈ I, add the set
of items closure({[A → αX•β, a]}) to goto(I,X)
if not already there
2. Repeat step 1 until no more items can be
added to goto(I,X)
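
The closure and goto operations for LR(1) items can be sketched in Python as follows. This is a minimal illustration under assumptions: the grammar has no ε-productions (so FIRST(βa) is the FIRST set of its first symbol), the sample grammar is S’ → S, S → C C, C → c C | d, an item is encoded as a tuple (lhs, rhs, dot position, lookahead), and the names first, closure and goto are hypothetical.

```python
# Sample grammar (assumed input): S' -> S, S -> C C, C -> c C | d.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first(symbol, seen=frozenset()):
    """FIRST of a single symbol, assuming the grammar has no ε-productions."""
    if symbol not in GRAMMAR:        # terminals and the end marker "$"
        return {symbol}
    if symbol in seen:               # guard against left recursion
        return set()
    out = set()
    for rhs in GRAMMAR[symbol]:
        out |= first(rhs[0], seen | {symbol})
    return out

def closure(items):
    """Closure of a set of LR(1) items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            b_sym = rhs[dot]
            beta_a = rhs[dot + 1:] + (a,)    # the string βa
            for b in first(beta_a[0]):
                for gamma in GRAMMAR[b_sym]:
                    item = (b_sym, gamma, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, x):
    """Move the dot over x in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1, a) for lhs, rhs, dot, a in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)

I0 = closure({("S'", ("S",), 0, "$")})
I1 = goto(I0, "S")
for item in sorted(I0):
    print(item)
```

In this encoding an item written with lookahead c/d is stored as two separate tuples, one per lookahead terminal.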
Canonical LR(1) parsing
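LR(1) item sets for S' → S, S → L = R | R, L → * R | id, R → L:
I0:  S' → •S, $ ; S → •L = R, $ ; S → •R, $ ; L → •* R, =/$ ; L → •id, =/$ ; R → •L, $
I1:  S' → S•, $
I2:  S → L• = R, $ ; R → L•, $
I3:  L → id•, =/$
I4:  S → R•, $
I5:  L → *•R, =/$ ; R → •L, =/$ ; L → •* R, =/$ ; L → •id, =/$
I6:  S → L =•R, $ ; R → •L, $ ; L → •* R, $ ; L → •id, $
I7:  R → L•, =/$
I8:  L → * R•, =/$
I9:  S → L = R•, $
I3': L → id•, $
I5': L → *•R, $ ; R → •L, $ ; L → •* R, $ ; L → •id, $
I7': R → L•, $
I8': L → * R•, $

Transitions:
goto(I0,S) = I1, goto(I0,L) = I2, goto(I0,R) = I4, goto(I0,id) = I3, goto(I0,*) = I5
goto(I2,=) = I6
goto(I5,L) = I7, goto(I5,R) = I8, goto(I5,id) = I3, goto(I5,*) = I5
goto(I6,L) = I7', goto(I6,R) = I9, goto(I6,id) = I3', goto(I6,*) = I5'
goto(I5',L) = I7', goto(I5',R) = I8', goto(I5',id) = I3', goto(I5',*) = I5'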
Example LR(1) Parsing Table
Grammar:
1. S’ → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L

state    id     *      =      $      S    L    R
0        s3     s5                   1    2    4
1                             acc
2                      s6     r6
3                      r5     r5
4                             r3
5        s3     s5                        7    8
6        s3'    s5'                       7'   9
7                      r6     r6
8                      r4     r4
9                             r2
3'                            r5
5'       s3'    s5'                       7'   8'
7'                            r6
8'                            r4

LR(1) Limitations
• An LR(1) grammar is one where the construction
of an LR(1) parse table does not require two
actions (shift-reduce or reduce-reduce) in any
one cell.
• Many conflicts in SLR(1) parse tables are
avoided if the LR(1) parse approach is used,
because the latter approach is more restrictive
on where it allows reduce operations. An
SLR(1) parse table may allow a reduce on an
input token that cannot actually follow the
handle in the current context.
LALR(1) parsing
• This is the result of an effort to reduce the
number of states in an LR(1) parser.
• We notice that some states in our LR(1)
automaton have the same core items and differ
only in the possible lookahead information.
Furthermore, their transitions are similar.
– States I3 and I3', I5 and I5', I7 and I7', I8 and I8'
• We shrink our parser by merging such states.
• SLR : 10 states, LR(1): 14 states, LALR(1) : 10 states
LALR(1) parsing
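LALR(1) item sets after merging I3/I3', I5/I5', I7/I7', I8/I8':
I0: S' → •S, $ ; S → •L = R, $ ; S → •R, $ ; L → •* R, =/$ ; L → •id, =/$ ; R → •L, $
I1: S' → S•, $
I2: S → L• = R, $ ; R → L•, $
I3: L → id•, =/$
I4: S → R•, $
I5: L → *•R, =/$ ; R → •L, =/$ ; L → •* R, =/$ ; L → •id, =/$
I6: S → L =•R, $ ; R → •L, $ ; L → •* R, $ ; L → •id, $
I7: R → L•, =/$
I8: L → * R•, =/$
I9: S → L = R•, $

Transitions:
goto(I0,S) = I1, goto(I0,L) = I2, goto(I0,R) = I4, goto(I0,id) = I3, goto(I0,*) = I5
goto(I2,=) = I6
goto(I5,L) = I7, goto(I5,R) = I8, goto(I5,id) = I3, goto(I5,*) = I5
goto(I6,L) = I7, goto(I6,R) = I9, goto(I6,id) = I3, goto(I6,*) = I5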
Example LALR(1) Parsing Table

Grammar:
1. S’ → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L

state    id     *      =      $      S    L    R
0        s3     s5                   1    2    4
1                             acc
2                      s6     r6
3                      r5     r5
4                             r3
5        s3     s5                        7    8
6        s3     s5                        7    9
7                      r6     r6
8                      r4     r4
9                             r2
Conflicts in LALR(1) parsing
• Note that the conflict that had vanished
when we created the LR(1) parser has not
reappeared.
• Can LALR(1) parsers introduce conflicts
that did not exist in the LR(1) parser?
• Unfortunately YES.
• BUT, only reduce/reduce conflicts.
Conflicts in LALR(1) parsing
• LALR(1) parsers cannot introduce shift/reduce conflicts.
– Such conflicts are caused when a lookahead is the
same as a token on which we can shift. They depend
on the core of the item. But we only merge states that
had the same core to begin with. The only way for an
LALR(1) parser to have a shift/reduce conflict is if
one existed already in the LR(1) parser.
• LALR(1) parsers can introduce reduce/reduce conflicts.
– Here's a situation when this might happen:
  the state      A → B•, x
                 A → C•, y
  merges with    A → B•, y
                 A → C•, x
  to give:       A → B•, x/y
                 A → C•, x/y
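
The merge above can be replayed in a few lines of Python. This is a minimal sketch under assumptions: a state is a set of (item core, lookahead) pairs, a core is written as a plain string ending in "." when the item is complete, and the function names merge_states and reduce_reduce_conflicts are hypothetical.

```python
def merge_states(state1, state2):
    """Merge two LR(1) states that have the same core (LALR merging)."""
    cores1 = {core for core, _ in state1}
    cores2 = {core for core, _ in state2}
    if cores1 != cores2:
        raise ValueError("states have different cores and cannot be merged")
    return state1 | state2

def reduce_reduce_conflicts(state):
    """Map each lookahead to the complete items competing to reduce on it."""
    by_lookahead = {}
    for core, lookahead in state:
        if core.endswith("."):       # dot at the end: a reduce item
            by_lookahead.setdefault(lookahead, set()).add(core)
    return {la: cores for la, cores in by_lookahead.items() if len(cores) > 1}

# The two LR(1) states from the situation above.
STATE_1 = {("A -> B .", "x"), ("A -> C .", "y")}
STATE_2 = {("A -> B .", "y"), ("A -> C .", "x")}
MERGED = merge_states(STATE_1, STATE_2)

print(reduce_reduce_conflicts(STATE_1))   # no conflict before merging
print(reduce_reduce_conflicts(MERGED))    # both reductions compete on x and y
```

Neither original state has a conflict, but the merged state wants to reduce by both productions on x and on y.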

Example
Example
The grammar G
S’ → S
S → C C
C → c C | d

• Let I = { (S’ → •S, $) }
• I0 = closure(I) = {
S’ → •S, $
S → •C C, $
C → •c C, c/d
C → •d, c/d
}
• goto(I0, S) = closure( {S’ → S•, $ } )
= {S’ → S•, $ } = I1
Exercise
The grammar G
S’ → S
S → C C
C → c C | d

• Let I = { (S → C•C, $) }
• I2 = closure(I) = ?
• I3 = goto(I2, c) = ?
LR(1) Automaton
Example
The grammar G
S’ → S
1. S → C C
2. C → c C
3. C → d

         ACTION               GOTO
state    c      d      $      S    C
0        s3     s4            1    2
1                      acc
2        s6     s7                 5
3        s3     s4                 8
4        r3     r3
5                      r1
6        s6     s7                 9
7                      r3
8        r2     r2
9                      r2
LL, SLR, LR, LALR Summary
• LL parse tables computed using FIRST/FOLLOW
– Nonterminals × terminals → productions
• LR parsing tables computed using closure/goto
– LR states × terminals → shift/reduce actions
– LR states × nonterminals → goto state transitions
• A grammar is
– LL(1) if its LL(1) parse table has no conflicts
– SLR if its SLR parse table has no conflicts
– LR(1) if its LR(1) parse table has no conflicts
– LALR(1) if its LALR(1) parse table has no conflicts

Classification of Grammars
YACC
yacc specification (yacc.y) → Yacc or Bison compiler → y.tab.c
y.tab.c → C compiler → a.out
input stream → a.out → output stream