Compiler Construction 1 1 Compiler Construction 1 2
Recall

For a grammar G, with start symbol S, any string α such that S ⇒∗ α is called a sentential form

• If α ∈ Vt∗, then α is called a sentence in L(G)
• Otherwise it is just a sentential form (not a sentence in L(G))

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A right-sentential form is a sentential form that occurs in the rightmost derivation of some sentence.

Goal:

Given an input string w and a grammar G, construct a parse tree by starting at the leaves and working to the root.

The parser repeatedly matches a right-sentential form from the language against the tree’s upper frontier.

At each match, it applies a reduction to build on the frontier:

• each reduction matches an upper frontier of the partially built tree to the RHS of some production
• each reduction adds a node on top of the frontier
Handles

[Figure: parse tree rooted at S, with handle β and remaining input w]

Theorem:
4. ⇒ a unique handle A → β
Example

The left-recursive expression grammar (original form)

1 <goal> ::= <expr>
2 <expr> ::= <expr> + <term>
3          | <expr> - <term>
4          | <term>
5 <term> ::= <term> * <factor>
6          | <term> / <factor>
7          | <factor>
8 <factor> ::= num
9          | id

Prod’n.  Sentential Form
–        <goal>
1        <expr>
3        <expr> - <term>
5        <expr> - <term> * <factor>
9        <expr> - <term> * id
7        <expr> - <factor> * id
8        <expr> - num * id
4        <term> - num * id
7        <factor> - num * id
9        id - num * id

Handle-pruning

The process to construct a bottom-up parse is called handle-pruning.

To construct a rightmost derivation

S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn

we set i to n and apply the following simple algorithm

for i = n downto 1
  1. find the handle Ai → βi in γi
  2. replace βi with Ai to generate γi−1

This takes 2n steps, where n is the length of the derivation
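The relationship between the rightmost derivation and handle-pruning can be sketched in Python. This is a minimal illustration, not a parser: it assumes the handle positions are already known (they are recorded while building the derivation), whereas a real shift-reduce parser must discover them. The names `PRODUCTIONS`, `rightmost_derivation` and `handle_prune` are hypothetical.

```python
# Example grammar: production number -> (LHS, RHS)
PRODUCTIONS = {
    1: ("<goal>", ["<expr>"]),
    2: ("<expr>", ["<expr>", "+", "<term>"]),
    3: ("<expr>", ["<expr>", "-", "<term>"]),
    4: ("<expr>", ["<term>"]),
    5: ("<term>", ["<term>", "*", "<factor>"]),
    6: ("<term>", ["<term>", "/", "<factor>"]),
    7: ("<term>", ["<factor>"]),
    8: ("<factor>", ["num"]),
    9: ("<factor>", ["id"]),
}

def is_nonterminal(sym):
    return sym.startswith("<")

def rightmost_derivation(prods):
    """Expand the rightmost nonterminal with each production in turn.

    Returns the sentential forms and, per step, (production, position),
    which is exactly the handle that pruning must later find.
    """
    form = ["<goal>"]
    forms, handles = [form], []
    for p in prods:
        lhs, rhs = PRODUCTIONS[p]
        pos = max(i for i, s in enumerate(form) if is_nonterminal(s))
        assert form[pos] == lhs, "production does not match rightmost nonterminal"
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
        handles.append((p, pos))
    return forms, handles

def handle_prune(sentence, handles):
    """for i = n downto 1: replace the handle beta_i with A_i."""
    form = list(sentence)
    trace = [form]
    for p, pos in reversed(handles):
        lhs, rhs = PRODUCTIONS[p]
        assert form[pos:pos + len(rhs)] == rhs, "handle not found at recorded position"
        form = form[:pos] + [lhs] + form[pos + len(rhs):]
        trace.append(form)
    return trace

forms, handles = rightmost_derivation([1, 3, 5, 9, 7, 8, 4, 7, 9])
trace = handle_prune(forms[-1], handles)
for form in trace:
    print(" ".join(form))
```

The printed trace is the derivation table read from bottom to top, starting at `id - num * id` and ending at `<goal>`.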
LL(k): recognize use of a production A → β after seeing only the first k symbols derived from β

LR(k): recognize occurrence of β (the handle) having seen all of what is derived from β plus k symbols of lookahead

An LR(1) parser for either Algol or Pascal has several thousand states, while an SLR(1) or LALR(1) parser for the same language may have several hundred states.
The table construction algorithms use sets of LR(k) items or configurations to represent the possible states in a parse.

An LR(k) item is a pair [α, β], where

α is a production from G with a • at some position in the RHS, marking how much of the RHS of a production has already been seen

β is a lookahead string of k symbols (terminals or $)

The • indicates how much of an item we have seen at a given state in the parse:

[A → •XYZ] indicates that the parser is looking for a string that can be derived from XYZ

[A → XY • Z] indicates that the parser has seen a string derived from XY and is looking for one derivable from Z
Let I be a set of LR(0) items and X be a grammar symbol.

Then, GOTO(I, X) is the closure of the set of all items

[A → αX • β] such that [A → α • Xβ] ∈ I

If I is the set of valid items for some viable prefix γ, then GOTO(I, X) is the set of valid items for the viable prefix γX.

GOTO(I, X) represents state after recognizing X in state I.

function goto0(I, X)
  let J be the set of items [A → α X • β]
    such that [A → α • X β] ∈ I
  return closure0(J)

We start the construction with the item [S′ → •S$], where

S′ is the start symbol of the augmented grammar G′
S is the start symbol of G
$ represents EOF

To compute the collection of sets of LR(0) items

function items(G′)
  s0 = closure0({[S′ → • S$]})
  S = {s0}
  repeat
    for each set of items s ∈ S
      for each grammar symbol X
        if goto0(s, X) ≠ ∅ and goto0(s, X) ∉ S
          add goto0(s, X) to S
  until no more item sets can be added to S
  return S
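The closure0, goto0 and items functions above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions: an item is encoded as a pair (production index, dot position), the grammar is the small expression grammar S → E$, E → E + T | T, T → id | (E), and the names `PRODS`, `NONTERMINALS` and `SYMBOLS` are hypothetical.

```python
# Augmented example grammar; production 0 plays the role of S' -> S$.
PRODS = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in PRODS}
SYMBOLS = sorted({s for _, rhs in PRODS for s in rhs})

def closure0(items):
    """Add [B -> .gamma] for every nonterminal B that follows a dot."""
    result = set(items)
    work = list(result)
    while work:
        p, dot = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto0(I, X):
    """Move the dot over X in every item of I that allows it, then close."""
    J = {(p, dot + 1) for p, dot in I
         if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == X}
    return closure0(J) if J else frozenset()

def items():
    """Collect all reachable LR(0) item sets, starting from closure0 of item 0."""
    states = [closure0({(0, 0)})]
    for s in states:              # the list grows while we iterate over it
        for X in SYMBOLS:
            t = goto0(s, X)
            if t and t not in states:
                states.append(t)
    return states

states = items()
print(len(states))
```

Using frozensets makes item sets hashable and comparable, so the "not already in S" test is a plain membership check; a production implementation would more likely index states in a dictionary.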
[Figure: LR(0) CFSM for the example grammar, with numbered states and transitions labeled E, T, id, (, ), + and $]

shift-reduce: both shift and reduce possible in same item set

The example grammar:

1 S → E$
2 E → E + T
3   | T
4 T → id
5   | (E)
1. construct the collection of sets of LR(0) items for G′
2. state i of the CFSM is constructed from Ii
   (a) [A → α • aβ] ∈ Ii and goto0(Ii, a) = Ij
       ⇒ ACTION[i, a] = “shift j”, ∀a ≠ $
   (b) [A → α•] ∈ Ii, A ≠ S′
       ⇒ ACTION[i, a] = “reduce A → α”, ∀a ∈ FOLLOW(A)
   (c) [S′ → S • $] ∈ Ii
       ⇒ ACTION[i, $] = “accept”
3. goto0(Ii, A) = Ij
   ⇒ GOTO[i, A] = j
4. set undefined entries in ACTION and GOTO to “error”
5. initial state of parser s0 is closure0([S′ → •S$])

FOLLOW(E) = FOLLOW(T) = {$,+,)}

state |          ACTION          |   GOTO
      | id    (    )    +    $   | S   E   T
  0   | s5   s6    –    –    –   | –   1   9
  1   |  –    –    –   s3   acc  | –   –   –
  2   |  –    –    –    –    –   | –   –   –
  3   | s5   s6    –    –    –   | –   –   4
  4   |  –    –   r2   r2   r2   | –   –   –
  5   |  –    –   r4   r4   r4   | –   –   –
  6   | s5   s6    –    –    –   | –   7   9
  7   |  –    –   s8   s3    –   | –   –   –
  8   |  –    –   r5   r5   r5   | –   –   –
  9   |  –    –   r3   r3   r3   | –   –   –
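To see how the ACTION and GOTO tables above drive a parse, here is a minimal table-driven shift-reduce loop in Python. It is a sketch: the table entries are transcribed from the table above, the production lengths assume the grammar S → E$, E → E + T | T, T → id | (E), and the function name `parse` and the dictionary encoding are hypothetical choices.

```python
# Production number -> (LHS, length of RHS), for productions 2..5.
PRODS = {2: ("E", 3), 3: ("E", 1), 4: ("T", 1), 5: ("T", 3)}

# Missing entries mean "error".
ACTION = {
    0: {"id": "s5", "(": "s6"},
    1: {"+": "s3", "$": "acc"},
    3: {"id": "s5", "(": "s6"},
    4: {")": "r2", "+": "r2", "$": "r2"},
    5: {")": "r4", "+": "r4", "$": "r4"},
    6: {"id": "s5", "(": "s6"},
    7: {")": "s8", "+": "s3"},
    8: {")": "r5", "+": "r5", "$": "r5"},
    9: {")": "r3", "+": "r3", "$": "r3"},
}
GOTO = {0: {"E": 1, "T": 9}, 3: {"T": 4}, 6: {"E": 7, "T": 9}}

def parse(tokens):
    """Return the list of production numbers used, in reduction order."""
    stack = [0]                       # stack of states; s0 is state 0
    tokens = list(tokens) + ["$"]     # $ represents EOF
    pos = 0
    reductions = []
    while True:
        act = ACTION.get(stack[-1], {}).get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return reductions
        if act[0] == "s":             # shift: push the target state, consume token
            stack.append(int(act[1:]))
            pos += 1
        else:                         # reduce: pop |RHS| states, then follow GOTO
            prod = int(act[1:])
            lhs, n = PRODS[prod]
            del stack[len(stack) - n:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(prod)

print(parse(["id", "+", "(", "id", ")"]))   # [4, 3, 4, 3, 5, 2]
```

Read in reverse, the reduction sequence is a rightmost derivation of the input, which is the handle-pruning view of the same parse.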
Given an item [A → α • Bβ, a], its closure contains the item and any other items that can generate legal substrings to follow α.

Thus, if the parser has viable prefix α on its stack, the input should reduce to Bβ (or γ for some other item [B → •γ, b] in the closure).

function closure1(I)
  repeat
    if [A → α • Bβ, a] ∈ I
      add [B → •γ, b] to I, where b ∈ FIRST(βa)
  until no more items can be added to I
  return I

Let I be a set of LR(1) items and X be a grammar symbol.

Then, GOTO(I, X) is the closure of the set of all items

[A → αX • β, a] such that [A → α • Xβ, a] ∈ I

If I is the set of valid items for some viable prefix γ, then GOTO(I, X) is the set of valid items for the viable prefix γX.

GOTO(I, X) represents state after recognizing X in state I.

function goto1(I, X)
  let J be the set of items
    [A → α X • β, a] such that [A → α • Xβ, a] ∈ I
  return closure1(J)
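A possible Python rendering of closure1 and goto1 is sketched below. It assumes an item is a triple (production index, dot position, lookahead terminal) and, to keep FIRST simple, that the grammar has no ε-productions, so FIRST of a string is FIRST of its first symbol. The example grammar (S' → E, E → E + T | T, T → id | (E)) and the helper names `first_sets` and `first_of` are hypothetical.

```python
PRODS = [
    ("S'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in PRODS}

def first_sets():
    """Fixed-point FIRST computation; valid only without epsilon-productions."""
    first = {A: set() for A in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            X = rhs[0]
            new = first[X] if X in NONTERMINALS else {X}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def first_of(symbols):
    """FIRST of a non-empty symbol string (first symbol decides, no epsilon)."""
    X = symbols[0]
    return FIRST[X] if X in NONTERMINALS else {X}

def closure1(items):
    result = set(items)
    work = list(result)
    while work:
        p, dot, a = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            B = rhs[dot]
            # b ranges over FIRST(beta a), where beta is what follows B
            for b in first_of(rhs[dot + 1:] + (a,)):
                for q, (lhs, _) in enumerate(PRODS):
                    if lhs == B and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto1(I, X):
    J = {(p, dot + 1, a) for p, dot, a in I
         if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == X}
    return closure1(J) if J else frozenset()

s0 = closure1({(0, 0, "$")})
print(len(s0), sorted(goto1(s0, "id")))
```

Compared with the LR(0) version, the only new ingredient is the lookahead component, which closure1 derives from FIRST(βa) and goto1 simply carries along.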
Building the LR(1) item sets for grammar G

We start the construction with the item [S′ → •S, $], where

S′ is the start symbol of the augmented grammar G′
S is the start symbol of G
$ represents EOF

To compute the collection of sets of LR(1) items

function items(G′)
  s0 = closure1({[S′ → • S, $]})
  S = {s0}
  repeat
    for each set of items s ∈ S
      for each grammar symbol X
        if goto1(s, X) ≠ ∅ and goto1(s, X) ∉ S
          add goto1(s, X) to S
  until no more item sets can be added to S
  return S

Constructing the LR(1) parsing table

Build lookahead into the DFA to begin with

1. construct the collection of sets of LR(1) items for G′
2. state i of the LR(1) machine is constructed from Ii
   (a) [A → α • aβ, b] ∈ Ii and goto1(Ii, a) = Ij
       ⇒ ACTION[i, a] = “shift j”
   (b) [A → α•, a] ∈ Ii, A ≠ S′
       ⇒ ACTION[i, a] = “reduce A → α”
   (c) [S′ → S•, $] ∈ Ii
       ⇒ ACTION[i, $] = “accept”
3. goto1(Ii, A) = Ij
   ⇒ GOTO[i, A] = j
4. set undefined entries in ACTION and GOTO to “error”
5. initial state of parser s0 is closure1([S′ → •S, $])
To construct LALR(1) parsing tables, we can insert a single step into the LR(1) algorithm

(1.5) For each core present among the set of LR(1) items, find all sets having that core and replace these sets by their union.

The goto function must be updated to reflect the replacement sets.

The resulting algorithm has large space requirements.

The revised (and renumbered) algorithm

1. construct the collection of sets of LR(1) items for G′
2. for each core present among the set of LR(1) items, find all sets having that core and replace these sets by their union. (Update the goto function incrementally)
3. state i of the LALR(1) machine is constructed from Ii
   (a) [A → α • aβ, b] ∈ Ii and goto1(Ii, a) = Ij
       ⇒ ACTION[i, a] = “shift j”
   (b) [A → α•, a] ∈ Ii, A ≠ S′
       ⇒ ACTION[i, a] = “reduce A → α”
   (c) [S′ → S•, $] ∈ Ii
       ⇒ ACTION[i, $] = “accept”
4. goto1(Ii, A) = Ij
   ⇒ GOTO[i, A] = j
5. set undefined entries in ACTION and GOTO to “error”
6. initial state of parser s0 is closure1([S′ → •S, $])
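The core-merging step can be illustrated in isolation. The sketch below assumes LR(1) items are triples (production index, dot position, lookahead) and that the goto function is stored as a dictionary keyed by (state index, symbol); the function names and the tiny hand-written input are hypothetical, not taken from a real grammar's item sets.

```python
from collections import defaultdict

def core(item_set):
    """The core of an LR(1) item set: its items with lookaheads dropped."""
    return frozenset((p, dot) for p, dot, _ in item_set)

def merge_by_core(states):
    """Replace all sets sharing a core by their union.

    Returns the merged sets and a map from old to new state indices.
    """
    groups = defaultdict(set)
    order = []                         # cores in order of first appearance
    for s in states:
        c = core(s)
        if c not in groups:
            order.append(c)
        groups[c] |= s
    merged = [frozenset(groups[c]) for c in order]
    remap = {i: order.index(core(s)) for i, s in enumerate(states)}
    return merged, remap

def remap_goto(goto, remap):
    """Update the goto function to refer to the replacement sets."""
    return {(remap[i], X): remap[j] for (i, X), j in goto.items()}

# Hypothetical input: states 0 and 1 share the core {(3, 1)}.
states = [
    frozenset({(3, 1, "$"), (3, 1, "+")}),
    frozenset({(3, 1, ")"), (3, 1, "+")}),
    frozenset({(2, 1, "$")}),
]
goto = {(2, "id"): 1}

merged, remap = merge_by_core(states)
print(len(merged), remap, remap_goto(goto, remap))
```

Because goto depends only on the core, two sets with equal cores always have goto targets with equal cores, so remapping indices this way never produces conflicting entries.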
• bad pieces of tree hanging from stack

We want to parse the rest of the file

• print an informative message (including line number)

• requires more stack space

Left Recursion:

Rule of thumb:

• right recursion for top-down parsers
Parsing review
Recursive descent
LL(k)
LR(k)