
Kashif Sharif

School of Computer Science & Technology


 Parser: The big picture
 Grammar introduction
 Top-down parsing
 Bottom-up parsing
 Yacc/Bison/ANTLR

Slides adapted from Perkins@UW, Krolic@McGill, Weixing@BIT


 Idea: Apply productions in reverse to convert the program into the start symbol.

 Directional and predictive bottom-up parsing
 Directional: Scan the input from left to right.
 Predictive: Guess which production should be inverted.
 Need to understand the process of reduction
 Derivation in reverse

 The key decisions are
 When to reduce?
 What production to apply?

We first look at the process of reduction, and then learn the decision-making rules.
One view of bottom-up parsing
A second view of bottom-up parsing
 The second view of bottom-up parsing
 Note the right most derivation
A third view of bottom-up parsing
 A left-to-right, bottom-up parse is a rightmost derivation in reverse.

 Works by iteratively searching for a handle, then reducing the handle.

 Handle: A leftmost complete cluster of leaf nodes of a parse tree
A leftmost reduction isn't always the handle
 All of the reductions we applied were at the far right end of the left area.
 Consequently, shift/reduce parsing means
 Shift: Move a terminal from the right area to the left area.
 Reduce: Replace some number of symbols at the right end of the left area with a nonterminal.

 All activity in a shift/reduce parser is at the far right end of the left area.
 Idea: Represent the left area as a stack.
 Shift: Push the next terminal onto the stack.
 Reduce: Pop some number of symbols from the stack, then push the appropriate nonterminal.
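The stack view above can be tried out with a deliberately naive sketch. It assumes the small grammar A → aA | b used later in these slides, and it reduces greedily whenever the top of the stack matches a right-hand side. That greedy rule happens to work for this grammar only; deciding when to reduce in general is exactly what the LR tables are for. The names `PRODUCTIONS` and `shift_reduce` are made up for this illustration.

```python
# Assumed toy grammar: A -> a A | b
PRODUCTIONS = [("A", ("a", "A")), ("A", ("b",))]


def shift_reduce(tokens, start="A"):
    """Return the list of actions taken, or None if the input is rejected."""
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        # Reduce: look for a right-hand side at the top (right end) of the stack.
        for lhs, rhs in PRODUCTIONS:
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]      # pop the handle
                stack.append(lhs)          # push the nonterminal
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            # No reduction possible: shift the next terminal, or stop.
            if not remaining:
                break
            stack.append(remaining.pop(0))
            actions.append(f"shift {stack[-1]}")
    return actions if stack == [start] else None


print(shift_reduce(["a", "b"]))
```

Every push and pop touches only the end of the list, which mirrors the observation that all activity happens at the far right end of the left area.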
 Handles
Basic Example of Items

G(S): S → A | B    A → aA | b | ε    B → c

S → ·A    S → A·
S → ·B    S → B·
A → ·aA   A → a·A   A → aA·
A → ·b    A → b·
A → ·
B → ·c    B → c·
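The items listed above can be generated mechanically: each production contributes one item per dot position. The following is a small sketch under an assumed encoding, where an item is a tuple `(lhs, rhs, dot)` and the ε-production is an empty tuple. The helper names `lr0_items` and `show` are illustrative only.

```python
# G(S): S -> A | B, A -> aA | b | epsilon, B -> c
GRAMMAR = {
    "S": [("A",), ("B",)],
    "A": [("a", "A"), ("b",), ()],   # () stands for the epsilon production
    "B": [("c",)],
}


def lr0_items(grammar):
    """Enumerate every LR(0) item as (lhs, rhs, dot position)."""
    items = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            for dot in range(len(rhs) + 1):
                items.append((lhs, rhs, dot))
    return items


def show(item):
    """Render an item with the dot in place, e.g. 'A → a·A'."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["·"] + list(rhs[dot:])
    return f"{lhs} → {''.join(symbols)}"


for item in lr0_items(GRAMMAR):
    print(show(item))
```

A production with n symbols on its right-hand side yields n + 1 items, so the ε-production yields exactly one item, A → ·.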
Simplified Example:

S’ → A    A → aA | b

NFA (one state per item):
1: S’ → ·A    6: S’ → A·
2: A → ·aA    3: A → a·A    7: A → aA·
4: A → ·b     5: A → b·

Labeled transitions: 1 --A--> 6, 2 --a--> 3, 3 --A--> 7, 4 --b--> 5
NFA to DFA
[Figure: DFA obtained from the NFA, with states 1 to 5 and transitions labeled a, b, and A.]
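Example

Input: a b $

Each stack entry is a state followed by its symbol; the top of the stack is listed first.

Step  Stack (top first)    Step taken to reach it
1     1 $                  initial stack
2     2 a, 1 $             shift a
3     3 b, 2 a, 1 $        shift b
4     A, 2 a, 1 $          reduce A → b (pop b, push A)
5     4 A, 2 a, 1 $        GOTO on A from state 2 gives state 4
6     A, 1 $               reduce A → aA (pop a and A, push A)
7     5 A, 1 $             GOTO on A from state 1 gives state 5
8     1 $                  reduce S’ → A and accept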
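The stack snapshots in this example can be reproduced with a small table-driven driver. The sketch below assumes the state numbering visible in the snapshots (2 after a, 3 after b, 4 after A on top of state 2, 5 after A on top of state 1). The `ACTION` and `GOTO` tables are written by hand for this illustration, with reductions placed only under `$`, and are not taken from the slides.

```python
# Productions: 1 is A -> a A, 2 is A -> b. Stored as (lhs, length of rhs).
PRODUCTIONS = {1: ("A", 2), 2: ("A", 1)}

# Hand-written tables (an assumption for this sketch).
ACTION = {
    (1, "a"): "s2", (1, "b"): "s3",
    (2, "a"): "s2", (2, "b"): "s3",
    (3, "$"): "r2",
    (4, "$"): "r1",
    (5, "$"): "acc",
}
GOTO = {(1, "A"): 5, (2, "A"): 4}


def lr_parse(tokens):
    """Return the state stack after each shift/reduce, or None on error."""
    stack = [1]                       # start state at the bottom
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:             # blank table entry means error
            return None
        if entry == "acc":
            return trace
        if entry[0] == "s":           # shift: push the new state, consume token
            stack.append(int(entry[1:]))
            pos += 1
        else:                         # reduce: pop |rhs| states, then GOTO
            lhs, length = PRODUCTIONS[int(entry[1:])]
            del stack[-length:]
            stack.append(GOTO[stack[-1], lhs])
        trace.append(list(stack))


print(lr_parse(["a", "b"]))
```

Only states are kept on the stack here; the grammar symbols shown next to them in the slides are implied by the states.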
Directly build the DFA (using Closure & GOTO)
Things to remember, so far

 LR(0): No lookahead, i.e. k = 0

 Simple LR (SLR) uses the same LR(0) items, plus Follow sets to decide when to reduce

 The augmented grammar of G is G’, which adds a new start symbol S’ with the production S’ → S.

 Better LR parsers can be built


Closure of Item Sets

Kernel and Non-Kernel Items


GOTO Function – GOTO(I, X)
 I is a set of items
 X is a grammar symbol

 Transition function for the state I under input X.
 Rule:
 If A → α·Bβ is in the item set and B → γ is a production, then B → ·γ is added as well (closure).
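As a rough illustration of the closure rule and of GOTO(I, X), here is a minimal sketch for the simplified grammar S’ → A, A → aA | b. Items are encoded as `(lhs, rhs, dot)` tuples, which is an assumption of this sketch, as are the function names.

```python
GRAMMAR = {"S'": [("A",)], "A": [("a", "A"), ("b",)]}


def closure(items):
    """Add B -> .gamma for every item A -> alpha . B beta already in the set."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot is before a nonterminal
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


I0 = closure({("S'", ("A",), 0)})
print(sorted(I0))
print(sorted(goto(I0, "a")))
```

The items passed in to `closure` play the role of kernel items; everything the loop adds is a non-kernel item with the dot at the far left.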

 Canonical Collection
 C = { Closure({[E’ → ·E]}) }

 DFA
 States are sets of items from the canonical collection
 Transitions are GOTO(I, X)
 Note: Each state is associated with a specific symbol, the one on which it is entered.
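To make the canonical collection concrete, the sketch below builds it for the usual expression grammar E’ → E, E → E+T | T, T → T*F | F, F → (E) | id. The grammar is assumed from the items shown in these slides, and the tuple encoding of items and the function names are choices made for this illustration. It starts from Closure({[E’ → ·E]}) and keeps applying GOTO until no new item set appears.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in items
                    if dot < len(rhs) and rhs[dot] == symbol})


def canonical_collection():
    """Return (states, transitions) of the LR(0) DFA."""
    symbols = sorted({s for alts in GRAMMAR.values() for rhs in alts for s in rhs})
    states = [closure({("E'", ("E",), 0)})]
    transitions = {}
    for state in states:              # the list grows while we iterate over it
        for symbol in symbols:
            target = goto(state, symbol)
            if target:
                if target not in states:
                    states.append(target)
                transitions[states.index(state), symbol] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states,", len(transitions), "transitions")
```

The state numbers produced here depend on the order in which symbols are tried, so they will not necessarily match the I0, I1, ... numbering used in the slides.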
id * id
Table construction
 Productions are numbered
 si means shift and push state i onto the stack.
 rj means reduce using production number j.
 acc means accept.
 Blank means error.
 Given a grammar G, augment G to produce G’, i.e. add S’ → E.
 For G’, construct the canonical collection C and the GOTO function.
 Find Follow(A) for each nonterminal A in G’.
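The construction steps above can be sketched end to end for the simplified grammar S’ → A, A → aA | b. This is only an illustration under stated assumptions: the Follow sets are supplied by hand rather than computed, items are encoded as `(production number, dot)` pairs, states are numbered from 0 in discovery order, and all names are invented for the sketch.

```python
# Numbered productions: 0 is the augmented production S' -> A.
PRODUCTIONS = [("S'", ("A",)), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {"S'", "A"}
FOLLOW = {"S'": {"$"}, "A": {"$"}}    # worked out by hand for this grammar


def closure(items):
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = PRODUCTIONS[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(PRODUCTIONS):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol})


def build_slr_table():
    symbols = sorted({s for _, rhs in PRODUCTIONS for s in rhs})
    states = [closure({(0, 0)})]
    action, goto_table = {}, {}

    def set_action(key, value):
        # Two different entries for one cell would be a conflict.
        if action.get(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
        action[key] = value

    i = 0
    while i < len(states):
        for symbol in symbols:
            target = goto(states[i], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if symbol in NONTERMINALS:
                goto_table[i, symbol] = j
            else:
                set_action((i, symbol), f"s{j}")          # shift, push state j
        for prod, dot in states[i]:
            lhs, rhs = PRODUCTIONS[prod]
            if dot == len(rhs):                           # complete item
                if prod == 0:
                    set_action((i, "$"), "acc")
                else:
                    for lookahead in FOLLOW[lhs]:
                        set_action((i, lookahead), f"r{prod}")
        i += 1
    return action, goto_table


action, goto_table = build_slr_table()
for key in sorted(action):
    print(key, action[key])
```

Cells that never receive an entry stay absent from the dictionaries, which corresponds to the blank (error) entries of the table.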
Items in I0
F → ·(E)

F → ·id

Rest of the items don’t have actions.


Items in I1
E’ → E·

E → E·+T
Items in I2
E → T·    Follow(E)={ }

T → T·*F
 Remember:
 LR(0)
 The simplest LR parser
 Covers more grammars than LL, but still has issues.

 Shift/Reduce Conflict
 Reduce/Reduce Conflict
 Shift/Reduce Conflict
 A shift-reduce conflict occurs in a state that requests both a shift action and a reduce action.
 Shift/Reduce Conflict - Solutions
 Fix the grammar

 Use a parse tool with a “longest match” rule, i.e., if there is a conflict, choose to shift instead of reduce

 Guideline: a few shift-reduce conflicts are fine, but be sure they do what you want (and that this behavior is guaranteed by the tool specification)
 Reduce/Reduce Conflict
 A reduce-reduce conflict occurs in a state that requests two or more different reduce actions.
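Both kinds of conflict can be spotted by inspecting a single item set. The sketch below takes the plain LR(0) view, in which any complete item requests a reduce regardless of the next token; an SLR builder would additionally consult Follow sets before reporting a conflict. The function name and the `(lhs, rhs, dot)` item encoding are assumptions of this illustration.

```python
def lr0_conflicts(items, terminals):
    """Report which kinds of LR(0) conflict an item set contains."""
    # Complete items (dot at the end) request a reduce.
    complete = [(lhs, rhs) for lhs, rhs, dot in items if dot == len(rhs)]
    # Items with the dot before a terminal request a shift.
    shifts = [rhs[dot] for lhs, rhs, dot in items
              if dot < len(rhs) and rhs[dot] in terminals]
    found = []
    if complete and shifts:
        found.append("shift/reduce")
    if len(complete) > 1:
        found.append("reduce/reduce")
    return found


TERMINALS = {"+", "*", "(", ")", "id"}

# The set I2 from the expression grammar: E -> T.  and  T -> T.*F
I2 = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
print(lr0_conflicts(I2, TERMINALS))
```

For I2 the LR(0) view reports a shift/reduce conflict, since the set contains both a complete item and an item that wants to shift `*`.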
 Reduce/Reduce Conflict - Solutions
 These normally indicate a serious problem with the grammar.

 Use a different kind of parser generator that takes lookahead information into account when constructing the states, i.e. LR(k).
 Most practical tools (Yacc, Bison, CUP, etc.) do this

 Fix the grammar


 A more powerful way of building LR parsers is with lookahead.
 Similar concept to that of the LL(1) parser

 Canonical LR: Makes full use of the lookahead symbol(s). Usually LR(1), and has a larger set of items.
 Lookahead LR (LALR): Based on LR(0) items, but with lookahead integrated into them.
