Professional Documents
Culture Documents
Parsing 1
Parsing 1
412
FALL 2017
Introduc)on to Parsing
Context-free grammars
Comp 412
Finally, the box moved …
source IR … IR target
code code
Front End OpMmizer Back End
Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these
materials for their personal use.
Faculty from other educaMonal insMtuMons may use these materials for nonprofit educaMonal purposes,
provided this copyright noMce is preserved.
Chapter 3 in EaC2e
The Front End
source IR IR target
code code
Front End OpMmizer Back End
Each producMon has a single nonterminal symbol on its leB hand side,
which makes it impossible to encode either leh or right context.
⇒ The grammar is context free
Bo[om up
0 SheepNoise baa
• A boZom-up parse starts with the
0 SheepNoise baa baa words in the sentence and works
1 baa baa baa towards the start symbol
Three-word SheepNoise
In the general case1, discovering a deriva)on looks expensive
Start
Rule Senten'al Form
— Start
0 Brackets Brackets
1 ( Brackets )
2 ( [ Brackets ] )
3 ( [ ( ) ] ) ( Brackets )
Start ⇒* ( [ ( ) ] )
[ Brackets ]
The derivaMon gives us the grammaMcal
structure of the input sentence, which
was completely missing in RE / DFA ( )
recognizers.
Parse tree for this deriva'on
COMP 412, Fall 2017 11
A Simple Expression Grammar
CFGs are used to define many programming language constructs
Deriva'on of x – 2 * y
When a syntacMc category has just one
lexeme, as with plus and minus , we will
And, if you skipped class & are reading the
ohen write it as just the lexeme. slides, you should know that this grammar
COMP 412, Fall 2017 is a very bad way to define expressions 12
A Simple Expression Grammar
Construc)ng a deriva)on
• At each step, we select an NT in the current string to replace
• Different choices can lead to different derivaMons
Expr Expr
Any grammar that has mulMple lehmost (or mulMple rightmost) derivaMons
for a single sentence is an ambiguous grammar.
Classic exam quesMon
⇒ Ambiguity is bad in a programming language
COMP 412, Fall 2017 20
Ambiguity
What should you do with an ambiguous grammar?
You rewrite it to remove the ambiguity.
0 Expr ➝ Expr Op Value
0 Expr ➝ Expr Op Expr 1 | Value
1 | number 2 Value ➝ number
2 | idenMfier 3 | idenMfier
3 Op ➝ plus 4 Op ➝ plus
4 | minus 5 | minus
5 | Mmes 6 | Mmes
6 | divide 7 I divide
Ambiguous Grammar RewriNen Grammar
In this case, the ambiguity that we see arises from the fact that rule 0
generates Expr, its lhs, at both the right & leh ends of its rhs.
⇒ The ambiguity allows deriva@ons with the wrong evalua@on order
COMP 412, Fall 2017 21
Ambiguity
Le\most deriva)on of x – 2 * y
0 Expr ➝ Expr Op Value Rule Senten'al Form
1 | Value — Expr
2 Value ➝ number 0 Expr Op Value
3 | idenMfier 0 Expr Op Value Op Value
4 Op ➝ plus 1 Value Op Value Op Value
5 | minus 3 <id,x> Op Value Op Value
6 | Mmes 5 <id,x> — Value Op Value
7 I divide 2 <id,x> — <num,2> Op Value
6 <id,x> — <num,2> * Value
The unambiguous grammar requires
one more step in this derivaMon: the 3 <id,x> — <num,2> * <id,y>
rewrite through Value for x.
Seems like a small price to pay. (TANSTAAFL)
COMP 412, Fall 2017 22
Ambiguity
Defini)ons
• A context-free grammar G is ambiguous if there exists has more than
one lehmost derivaMon for some sentence in L(G)
• A context-free grammar G is ambiguous if there exists has more than
one rightmost derivaMon for some sentence in L(G)
• A context-free grammar G is ambiguous if the rightmost and lehmost
derivaMons produce different parse trees
– However, the rightmost and leBmost deriva@ons may differ
if if
if S2 if
S1 S1 S2
produc@on 2, then produc@on 1 produc@on 1, then produc@on 2
With this grammar, the example has only one rightmost derivaMon
The new grammar has only one rightmost derivaMon for the example
Resolving ambiguity
• To remove context-free ambiguity, rewrite the grammar
• To handle context-sensiMve ambiguity takes cooperaMon
– Knowledge of declaraMons, types, …
– Accept a superset of L(G) & check it by other means†
– This is a language design problem
†See Chapter 4
COMP 412, Fall 2017 29