
EDA180: Compiler Construction

Top-down parsing
Görel Hedin
Revised: 2013-01-30a

Compiler phases and program representations

source code
  -> Lexical analysis (scanning)     -> tokens
  -> Syntactic analysis (parsing)    -> AST
  -> Semantic analysis               -> attributed AST
  -> Intermediate code generation    -> intermediate code
  -> Optimization                    -> intermediate code
  -> Machine code generation         -> machine code

The phases are grouped into Analysis and Synthesis.
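
A closer look at the parser

text -> Scanner -> tokens -> Parser -> AST

The parser consists of pure parsing, which yields a concrete parse tree (implicit), followed by AST building.

Defined by:
Scanner: regular expressions
Pure parsing: context-free grammar
AST building: abstract grammar

This lecture: parsing.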
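
Different parsing algorithms

Figure: nested grammar classes. All context-free grammars are either ambiguous or unambiguous. The LR grammars are a subset of the unambiguous grammars, and the LL grammars are a subset of the LR grammars.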
LL:
Left-to-right scan
Leftmost derivation
Builds tree top-down
Simple to understand
LR:
Left-to-right scan
Rightmost derivation
Builds tree bottom-up
More powerful
This lecture: LL parsing.

LL and LR parsers, main idea

Input: if ID then ID = ID ; ID ...

LL(1): decides to build Assign after seeing the first token of its subtree.
The tree is built top down.
Nodes in the figure: CompoundStmt, IfStmt, Id, Assign

LR(1): decides to build Assign after seeing the first token following its subtree.
The tree is built bottom up.
Nodes in the figure: Id, Assign (with children Id, Id)

The token is called lookahead.
LL(k) and LR(k) use k lookahead tokens.

Recursive-descent parsing

A way of programming an LL(1) parser by recursive method calls.

Assume an EBNF grammar with exactly one production rule for each nonterminal symbol.
For each nonterminal, a method is constructed.
A nonterminal method matches tokens and calls other nonterminal methods, according to the grammar.
If the lookahead token does not match, an error is reported.

A -> B | C | D
B -> a C b D
C -> ...
D -> ...

Example Java implementation: overview

statement -> assignment | compoundStmt
assignment -> ID ASSIGN expr SEMICOLON
compoundStmt -> LBRACE statement* RBRACE
expr -> ...

class Parser {
    private int token;             // current lookahead token
    void accept(int t) {...}       // accept t and read in next token
    void error(String str) {...}   // generate error message
    void statement() {...}
    void assignment() {...}
    void compoundStmt() {...}
    ...
}

Example: recursive descent methods

statement -> assignment | compoundStmt
assignment -> ID ASSIGN expr SEMICOLON
compoundStmt -> LBRACE statement* RBRACE

class Parser {
    void statement() {
        switch (token) {
            case ID: assignment(); break;
            case LBRACE: compoundStmt(); break;
            default: error("Expecting statement, found: " + token);
        }
    }
    void assignment() {
        accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON);
    }
    void compoundStmt() {
        accept(LBRACE);
        while (token != RBRACE) { statement(); }
        accept(RBRACE);
    }
    ...
}

Example: Parser skeleton details

statement -> assignment | compoundStmt
assignment -> ID ASSIGN expr SEMICOLON
compoundStmt -> LBRACE statement* RBRACE
expr -> ...

class Parser {
    final static int ID=1, WHILE=2, DO=3, ASSIGN=4, ...;
    private int token;                 // current lookahead token
    void accept(int t) {               // accept t and read in next token
        if (token == t) {
            token = nextToken();
        } else {
            error("Expected " + t + ", but found " + token);
        }
    }
    void error(String str) {...}       // generate error message
    private int nextToken() {...}      // read next token from scanner
    void statement() ...
    ...
}
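
The skeleton can be turned into a small runnable program. The following is a minimal sketch, not the course's reference implementation. It assumes that the scanner output is already available as a list of token codes ending in EOF, that expr is simply an ID (the slides leave expr unspecified), and that error() throws an exception. The class name MiniParser and the helper parses() are made up for illustration.

```java
import java.util.List;

// Sketch of the slide's Parser, made runnable over a pre-scanned token list.
class MiniParser {
    static final int ID = 1, ASSIGN = 2, SEMICOLON = 3, LBRACE = 4, RBRACE = 5, EOF = 6;

    private final List<Integer> tokens; // assumed scanner output, must end with EOF
    private int pos = 0;
    private int token;                  // current lookahead token

    MiniParser(List<Integer> tokens) {
        this.tokens = tokens;
        this.token = tokens.get(0);
    }

    // Read the next token; stays on the final EOF token once it is reached.
    private int nextToken() {
        if (pos < tokens.size() - 1) pos++;
        return tokens.get(pos);
    }

    // Accept t and read in the next token.
    void accept(int t) {
        if (token == t) token = nextToken();
        else error("Expected " + t + ", but found " + token);
    }

    // Assumption: errors abort the parse by throwing.
    void error(String msg) { throw new IllegalStateException(msg); }

    void statement() {
        switch (token) {
            case ID: assignment(); break;
            case LBRACE: compoundStmt(); break;
            default: error("Expecting statement, found: " + token);
        }
    }

    void assignment() { accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON); }

    void compoundStmt() {
        accept(LBRACE);
        while (token != RBRACE) { statement(); } // statement() throws on EOF, so this terminates
        accept(RBRACE);
    }

    void expr() { accept(ID); } // assumption: expr -> ID

    // Hypothetical helper: true if the token list is one statement followed by EOF.
    static boolean parses(List<Integer> tokens) {
        try {
            MiniParser p = new MiniParser(tokens);
            p.statement();
            p.accept(EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, the token sequence for `{ ID = ID ; }` followed by EOF is accepted, while `ID = ;` is rejected because expr() does not find an ID.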

Are these grammars LL(1)?

Grammar with a common prefix:
expr -> name params | name

Grammar with left recursion:
expr -> expr "+" term | term
term -> ID

What would happen in a recursive-descent parser?
Could they be LL(2)? LL(k)?

Dealing with common prefix of limited length: Local lookahead

LL(2) grammar:
statement -> assignment | compoundStmt | callStmt
assignment -> ID ASSIGN expr SEMICOLON
compoundStmt -> LBRACE statement* RBRACE
callStmt -> ID LPAR expr RPAR SEMICOLON

void statement() {
    switch (token) {
        case ID:
            if (lookahead(2) == ASSIGN) {
                assignment();
            } else {
                callStmt();
            }
            break;
        case LBRACE: compoundStmt(); break;
        default: error("Expecting statement, found: " + token);
    }
}
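
The slides do not show how lookahead(2) is implemented. One possible approach, sketched here under assumptions, is to buffer tokens read from the scanner so that the parser can peek ahead without consuming them. The class name LookaheadBuffer, the consume() method, the use of an Iterator as the scanner, and the EOF code 0 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: a token buffer that supports peeking k tokens ahead.
class LookaheadBuffer {
    static final int EOF = 0;                    // assumed end-of-file token code
    private final Iterator<Integer> scanner;      // assumed token source
    private final List<Integer> buffer = new ArrayList<>();

    LookaheadBuffer(Iterator<Integer> scanner) { this.scanner = scanner; }

    // Returns the k-th upcoming token; lookahead(1) is the current token.
    // Tokens are read from the scanner only when the buffer is too short.
    int lookahead(int k) {
        while (buffer.size() < k) {
            buffer.add(scanner.hasNext() ? scanner.next() : EOF);
        }
        return buffer.get(k - 1);
    }

    // Removes and returns the current token.
    int consume() {
        int t = lookahead(1);
        buffer.remove(0);   // remove by index, not by value
        return t;
    }
}
```

With such a buffer, the parser's token field corresponds to lookahead(1), and accept(t) corresponds to checking lookahead(1) against t and then calling consume().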

Common prefix?

If two productions can derive a sentence starting in the same way, they share a common prefix.
Which nonterminals have common prefix productions?
How long is the common prefix?
Is the grammar LL(1), LL(2), ...?

Grammar 1:
A -> a B
A -> a C
B -> b
C -> c
A has two rules that can derive the prefix a.
The grammar is LL(2).

Grammar 2:
A -> B a
A -> B b
B -> c d
A has two rules that can derive the prefix c d.
The grammar is LL(3).

Grammar 3:
A -> a B
B -> a C
B -> b C
C -> c
No problem. The two rules that start the same cannot be derived from the same nonterminal.
The grammar is LL(1).

Common prefix?

Grammar 4:
A -> B
A -> C
A -> D
B -> b a
C -> b d
D -> e
A has two rules that can derive the prefix b.
The grammar is LL(2).

Grammar 5:
A -> B a
A -> B b
B -> B c
B -> d
A has two rules that can derive the prefix d c*.
So, the prefix can become arbitrarily long.
The grammar is not LL(k), no matter what k we use.
We need to rewrite the grammar, or use another parsing method.

Eliminating the common prefix
Rewrite to an equivalent grammar without the common prefix

With common prefix, not LL(1):
Exp -> Name Params
Exp -> Name

Without common prefix, LL(1):
Exp -> Name OptParams
OptParams -> Params
OptParams -> ε

Eliminating a common prefix this way is called "left factoring".
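
In a recursive-descent parser, the ε alternative of the left-factored grammar becomes a method that may return without consuming anything. The sketch below illustrates this under assumptions: Name is taken to be a single ID token and Params an empty parameter list LPAR RPAR, since the slides do not define them. The class name ExpParser and the helper parses() are hypothetical.

```java
// Sketch: recursive descent for  Exp -> Name OptParams,  OptParams -> Params | epsilon
class ExpParser {
    static final int ID = 1, LPAR = 2, RPAR = 3, EOF = 4;
    private final int[] tokens;
    private int pos = 0;

    ExpParser(int... tokens) { this.tokens = tokens; }

    // Current lookahead token; EOF once the input is exhausted.
    private int token() { return pos < tokens.length ? tokens[pos] : EOF; }

    private void accept(int t) {
        if (token() != t) throw new IllegalStateException("Expected " + t + ", found " + token());
        pos++;
    }

    void exp() { name(); optParams(); }

    void optParams() {
        if (token() == LPAR) {
            params();          // lookahead is in FIRST(Params)
        }
        // otherwise: the epsilon alternative, nothing is consumed
    }

    void name() { accept(ID); }                     // assumption: Name -> ID
    void params() { accept(LPAR); accept(RPAR); }   // assumption: Params -> LPAR RPAR

    // Hypothetical helper: true if the tokens form exactly one Exp.
    static boolean parses(int... tokens) {
        try {
            ExpParser p = new ExpParser(tokens);
            p.exp();
            p.accept(EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

A single lookahead token is enough here: after Name has been parsed, the parser only needs to check whether the next token can start Params.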

Eliminating the common prefix
Rewrite to an equivalent grammar without the common prefix

Indirect common prefix:
A -> B
A -> C
B -> b a
B -> e D
B -> f
C -> b d
D -> B C

First, make the common prefix directly visible:
Substitute all B right-hand sides into the A -> B rule.
We can't remove the B rules since B is used in other places.
Similarly for the A -> C rule.

Direct common prefix:
A -> b a
A -> e D
A -> f
A -> b d
B -> b a
B -> e D
B -> f
C -> b d
D -> B C

Then, eliminate the direct common prefix, as previously.

Dealing with left recursion in LL parsers
Method 1: Rewrite to an equivalent grammar without left recursion
(A bit cumbersome)

Left-recursive grammar, not LL(k):
E -> E "+" T
E -> T
T -> ID

Rewrite to right recursion! But there is now a common prefix, so the grammar is still not LL(1) (and not LL(k) in general, once T can derive arbitrarily long strings):
E -> T "+" E
E -> T
T -> ID

Eliminate the common prefix. The grammar is now LL(1):
E -> T E'
E' -> "+" E
E' -> ε
T -> ID

A left-recursive AST can be built during the right-recursive parse.
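
One way to build the left-recursive AST during a right-recursive parse is to pass the tree built so far down into the recursive call. The following is a sketch of that idea, not necessarily the technique used in the course. It assumes tokens are plain strings (identifiers and "+") and renders the AST as a parenthesized string to keep the example short. The class and method names are hypothetical.

```java
import java.util.List;

// Sketch: E -> T E',  E' -> "+" E | epsilon,  T -> ID
// The tree for everything parsed so far is passed down as a parameter.
class RightRecursiveParser {
    private final List<String> tokens;
    private int pos = 0;

    RightRecursiveParser(List<String> tokens) { this.tokens = tokens; }

    private String token() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    String parse() { return parseE(null); }

    // E -> T E'. `left` is the tree for the operands to the left, or null at the start.
    private String parseE(String left) {
        String operand = parseT();
        String tree = (left == null) ? operand : "(" + left + " + " + operand + ")";
        return parseEPrime(tree);
    }

    // E' -> "+" E | epsilon
    private String parseEPrime(String tree) {
        if (token().equals("+")) {
            pos++;                 // accept "+"
            return parseE(tree);   // the tree so far becomes the left operand
        }
        return tree;               // epsilon
    }

    // T -> ID
    private String parseT() {
        String t = token();
        if (t.equals("+") || t.equals("<EOF>")) {
            throw new IllegalStateException("Expected ID, found " + t);
        }
        pos++;
        return t;
    }
}
```

For the input a + b + c this yields ((a + b) + c): the grouping is to the left even though the grammar recurses to the right.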

Dealing with left recursion in LL parsers
Method 2: Rewrite to EBNF
(Easy!)

Left-recursive grammar, not LL(k):
E -> E "+" T
E -> T
T -> ID

Rewrite to EBNF!
E -> T ( "+" T )*
T -> ID

A left-recursive AST can be built during the iteration.
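
The iteration in the EBNF rule maps directly to a loop in which the result so far becomes the left child of each new node. This is a minimal sketch under assumptions: tokens are plain strings (identifiers and "+"), and the AST is rendered as a parenthesized string rather than as node objects. The class name ExprParser is hypothetical.

```java
import java.util.List;

// Sketch: E -> T ( "+" T )*,  T -> ID, building a left-recursive tree in the loop.
class ExprParser {
    private final List<String> tokens;
    private int pos = 0;

    ExprParser(List<String> tokens) { this.tokens = tokens; }

    private String token() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    // E -> T ( "+" T )*
    String parseE() {
        String left = parseT();
        while (token().equals("+")) {
            pos++;                                      // accept "+"
            String right = parseT();
            left = "(" + left + " + " + right + ")";    // previous result becomes the left child
        }
        return left;
    }

    // T -> ID
    String parseT() {
        String t = token();
        if (t.equals("+") || t.equals("<EOF>")) {
            throw new IllegalStateException("Expected ID, found " + t);
        }
        pos++;
        return t;
    }
}
```

Parsing a + b + c this way produces ((a + b) + c).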

JavaCC: An LL-based parser generator

CFG (in Java-like spec language) -> JavaCC -> Parser (in Java code)

JavaCC specification

CFG:
statement -> assignment | compoundStmt
assignment -> ID ASSIGN expr SEMICOLON
compoundStmt -> LBRACE statement* RBRACE

JavaCC:
void statement() : {} {
    assignment() | compoundStmt()
}
void assignment() : {} {
    id() <ASSIGN> expr() <SEMICOLON>
}
void compoundStmt() : {} {
    <LBRACE> (statement())* <RBRACE>
}
void id() : {} {
    <ID>
}

The first {} of each rule is a place where Java code can be added.
You can also add Java code inside the rules (for semantic actions, e.g. to build the AST).
Good idea to add a nonterminal id for ID tokens. This way you can avoid code duplication in the semantic actions.

Using local lookahead in JavaCC

Local lookahead can be used to discriminate between an assignment and a procedure call:

statement -> assignment | callStmt | whileStmt
assignment -> ID ASSIGN expr SEMICOLON
callStmt -> ID LPAR expr RPAR SEMICOLON

JavaCC:
void statement() : {} {
    LOOKAHEAD(2) assignment()
    | callStmt()
    | whileStmt()
}
...

A lookahead of 2 tokens will be used before selecting assignment. If that fails, ordinary single-token lookahead will be used in the following alternatives.

Using EBNF in JavaCC

Straightforward!

expr -> term (PLUS term)*
term -> factor (TIMES factor)*
factor -> ID | INT | LPAR expr RPAR

JavaCC:
void expr() : {} {
    term() (<PLUS> term())*
}
void term() : {} {
    factor() (<TIMES> factor())*
}
void factor() : {} {
    id() | intExpr() | <LPAR> expr() <RPAR>
}

Algorithm for constructing an LL(1) parser

Fairly simple.
The non-trivial part: how to select the correct production p for X, based on the lookahead token.

p1: X -> ...
p2: X -> ...

Figure: X derives the tokens t1 ... tn and is followed by t(n+1) ... The token t1 is in the FIRST position, and t(n+1) is in the FOLLOW position.

Which tokens can occur in the FIRST position?
Can one of the productions derive the empty string? I.e., is it "NULLABLE"?
If so, which tokens can occur in the FOLLOW position?
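
NULLABLE and FIRST can be computed by fixed-point iteration: start with empty sets and reapply the productions until nothing changes. The sketch below shows this standard approach for a small statement grammar. The array-based grammar encoding, the class name FirstSets, and the production expr -> ID (the slides leave expr unspecified) are assumptions made for illustration. FOLLOW can be computed by a similar iteration and is left out to keep the example short.

```java
import java.util.*;

// Sketch: fixed-point computation of NULLABLE and FIRST for nonterminals.
class FirstSets {
    // Each production is {lhs, rhs symbols...}; a production with no rhs symbols is epsilon.
    static final String[][] GRAMMAR = {
        {"statement", "assignment"},
        {"statement", "compoundStmt"},
        {"assignment", "ID", "=", "expr", ";"},
        {"compoundStmt", "{", "statements", "}"},
        {"statements", "statement", "statements"},
        {"statements"},
        {"expr", "ID"},   // assumption: expr is not defined in the slides
    };

    static Set<String> nonterminals() {
        Set<String> nts = new HashSet<>();
        for (String[] p : GRAMMAR) nts.add(p[0]);
        return nts;
    }

    // X is nullable if some production X -> Y1 ... Yn has all Yi nullable (n may be 0).
    static Set<String> nullable() {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : GRAMMAR) {
                boolean allNullable = true;
                for (int i = 1; i < p.length; i++) allNullable &= nullable.contains(p[i]);
                if (allNullable && nullable.add(p[0])) changed = true;
            }
        }
        return nullable;
    }

    // FIRST(X) collects the first tokens of each right-hand side, looking past nullable symbols.
    static Map<String, Set<String>> first() {
        Set<String> nts = nonterminals();
        Set<String> nullable = nullable();
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : nts) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : GRAMMAR) {
                for (int i = 1; i < p.length; i++) {
                    String sym = p[i];
                    if (nts.contains(sym)) {
                        changed |= first.get(p[0]).addAll(first.get(sym));
                    } else {
                        changed |= first.get(p[0]).add(sym);   // terminal
                    }
                    if (!nullable.contains(sym)) break;        // later symbols cannot come first
                }
            }
        }
        return first;
    }
}
```

For this grammar, only statements is nullable, and FIRST(statement) contains ID and "{".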

Steps in constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar to construct a table. The table shows what production to select, given the current lookahead token.
3. Conflicts in the table? The grammar is not LL(1).
4. No conflicts? Straightforward implementation using a table-driven parser or recursive descent.

Schematic table (rows are nonterminals, columns are lookahead tokens; the column positions of the entries are illustrative):

       t1    t2    t3    t4
X1     p1    p2
X2           p3    p3    p4

Example: Construct the LL(1) table for this grammar:

p1: statement -> assignment
p2: statement -> compoundStmt
p3: assignment -> ID "=" expr ";"
p4: compoundStmt -> "{" statements "}"
p5: statements -> statement statements
p6: statements -> ε

Table to fill in: rows statement, assignment, compoundStmt, statements; columns ID, "=", ";", "{", "}".

For each production p: X -> γ, we are interested in:
FIRST(γ): the tokens that occur first in a sentence derived from γ.
NULLABLE(γ): is it possible to derive ε from γ? And if so:
FOLLOW(X): the tokens that can occur immediately after an X-sentence.

Example: the completed LL(1) table for the same grammar (p1 to p6):

               ID    "="   ";"   "{"   "}"
statement      p1                p2
assignment     p3
compoundStmt                     p4
statements     p5                p5    p6

To construct the table, look at each production p: X -> γ.
Compute the token set FIRST(γ). Add p to each corresponding entry for X.
Then, check if γ is NULLABLE. If so, compute the token set FOLLOW(X), and add p to each corresponding entry for X.
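
The table-filling rule can be sketched in a few lines of Java. This is an illustration under assumptions, not a full parser generator: the FIRST and FOLLOW sets are supplied by hand rather than computed, and the class name LL1Table and its methods are hypothetical. A table entry holding more than one production signals that the grammar is not LL(1).

```java
import java.util.*;

// Sketch: filling an LL(1) table from hand-computed FIRST and FOLLOW sets.
class LL1Table {
    // table.get(X).get(token) = the productions selected for nonterminal X on that lookahead
    final Map<String, Map<String, Set<String>>> table = new TreeMap<>();

    // Adds production `name` for nonterminal x.
    // firstOfRhs is FIRST of the right-hand side. followOfX must be FOLLOW(x)
    // if the right-hand side is nullable, and empty otherwise.
    void add(String name, String x, List<String> firstOfRhs, List<String> followOfX) {
        Map<String, Set<String>> row = table.computeIfAbsent(x, k -> new TreeMap<>());
        for (String t : firstOfRhs) row.computeIfAbsent(t, k -> new TreeSet<>()).add(name);
        for (String t : followOfX) row.computeIfAbsent(t, k -> new TreeSet<>()).add(name);
    }

    // A conflict is a table entry that holds more than one production.
    boolean hasConflict() {
        for (Map<String, Set<String>> row : table.values())
            for (Set<String> entry : row.values())
                if (entry.size() > 1) return true;
        return false;
    }

    // The statement grammar p1 to p6, with its sets worked out by hand.
    static LL1Table statementGrammar() {
        LL1Table t = new LL1Table();
        List<String> none = Collections.emptyList();
        t.add("p1", "statement", Arrays.asList("ID"), none);
        t.add("p2", "statement", Arrays.asList("{"), none);
        t.add("p3", "assignment", Arrays.asList("ID"), none);
        t.add("p4", "compoundStmt", Arrays.asList("{"), none);
        t.add("p5", "statements", Arrays.asList("ID", "{"), none);
        t.add("p6", "statements", none, Arrays.asList("}")); // nullable: use FOLLOW(statements)
        return t;
    }

    // The ambiguous grammar E -> E "+" E | ID | INT.
    static LL1Table ambiguousGrammar() {
        LL1Table t = new LL1Table();
        List<String> none = Collections.emptyList();
        t.add("p1", "E", Arrays.asList("ID", "INT"), none);
        t.add("p2", "E", Arrays.asList("ID"), none);
        t.add("p3", "E", Arrays.asList("INT"), none);
        return t;
    }
}
```

Calling hasConflict() on the statement grammar's table reports no conflict, while the ambiguous expression grammar produces entries with two productions each.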

Example: Dealing with End of File

p1: varDecl -> type ID optInit
p2: type -> "integer"
p3: type -> "boolean" "=" expr ";"
p4: optInit -> "=" INT
p5: optInit -> ε

Table to fill in: rows varDecl, type, optInit; columns ID, integer, boolean, "=", ";", INT.

Example: Dealing with End of File, using an explicit EOF token

Add a start production that ends with EOF (p1 to p5 are unchanged):
p0: S -> varDecl EOF

           ID    integer   boolean   "="   ";"   INT   EOF
S                p0        p0
varDecl          p1        p1
type             p2        p3
optInit                              p4                p5

With p0, FOLLOW(optInit) contains EOF, so the nullable production p5 gets a table entry.

Example: Ambiguous grammar

p1: E -> E "+" E
p2: E -> ID
p3: E -> INT

       "+"    ID        INT
E             p1, p2    p1, p3

Collision in a table entry!
The grammar is not LL(1).
An ambiguous grammar is not even LL(k): adding more lookahead does not help.

Example: Unambiguous, but left-recursive grammar

p1: E -> E "*" F
p2: E -> F
p3: F -> ID
p4: F -> INT

       "*"    ID        INT
E             p1, p2    p1, p2
F             p3        p4

Collision in a table entry!
The grammar is not LL(1).
A grammar with left recursion is not even LL(k): adding more lookahead does not help.

Example: Grammar with common prefix

p1: E -> F "*" E
p2: E -> F
p3: F -> ID
p4: F -> INT
p5: F -> "(" E ")"

       "*"    ID        INT       "("       ")"
E             p1, p2    p1, p2    p1, p2
F             p3        p4        p5

Collision in a table entry!
The grammar is not LL(1).
A grammar with common prefix is not LL(1).
Some grammars with common prefix are LL(k), for some k, but not this one.

Example: Another grammar with common prefix

p1: Stmt -> ID "(" IdList ")"
p2: Stmt -> ID "=" Exp

         ID        "("    ")"    "="
Stmt     p1, p2
...

Collision in a table entry!
The grammar is not LL(1).
A grammar with common prefix is not LL(1).
But this grammar is LL(2).

We could create an LL(2) table!
(global lookahead 2)

p1: Stmt -> ID "(" IdList ")"
p2: Stmt -> ID "=" Exp

         ID ID    ID "("    ID "="    ID ")"    "(" ID    "(" "("    ...
Stmt              p1        p2
...

No conflicts! The grammar is LL(2)!
But k > 1 gives very large tables: inefficient!

A better alternative: Local lookahead!

p1: Stmt -> ID "(" IdList ")"
p2: Stmt -> ID "=" Exp

         ID                          "("    ")"    "="
Stmt     "(" -> p1, "=" -> p2
...

The ID entry for Stmt holds a small nested table indexed by the second token.

No collisions!
A slightly more complex table structure for the local lookahead. Will be efficient.
JavaCC can generate both LL(k) and local lookahead parsers.
Using k > 1 is not recommended: parsing becomes too slow.
Use local lookahead if needed.

Summary: constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar using FIRST, NULLABLE, and FOLLOW.
3. Use the analysis to construct a table. The table shows what production to select, given the current lookahead token.
4. Conflicts in the table? The grammar is not LL(1).
5. No conflicts? Straightforward implementation using a table-driven parser or recursive descent.
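
The table-driven alternative replaces the recursive methods with an explicit stack and a table lookup. The following is a minimal sketch under assumptions: the table is hard-coded for the statement grammar, a start rule S -> statement EOF and a production expr -> ID are added for illustration, tokens are plain strings, and table keys are formed as "nonterminal token". The class name TableDrivenParser is hypothetical.

```java
import java.util.*;

// Sketch: a table-driven LL(1) parser using an explicit stack.
class TableDrivenParser {
    // Key "nonterminal token" -> right-hand side to expand (empty array = epsilon).
    static final Map<String, String[]> TABLE = new HashMap<>();
    static {
        TABLE.put("S ID", new String[] {"statement", "EOF"});   // assumed start rule
        TABLE.put("S {", new String[] {"statement", "EOF"});
        TABLE.put("statement ID", new String[] {"assignment"});
        TABLE.put("statement {", new String[] {"compoundStmt"});
        TABLE.put("assignment ID", new String[] {"ID", "=", "expr", ";"});
        TABLE.put("compoundStmt {", new String[] {"{", "statements", "}"});
        TABLE.put("statements ID", new String[] {"statement", "statements"});
        TABLE.put("statements {", new String[] {"statement", "statements"});
        TABLE.put("statements }", new String[] {});
        TABLE.put("expr ID", new String[] {"ID"});              // assumption: expr -> ID
    }

    static final Set<String> NONTERMINALS = new HashSet<>(Arrays.asList(
        "S", "statement", "assignment", "compoundStmt", "statements", "expr"));

    // Returns true if the input (which must end with "EOF") is accepted.
    static boolean parse(List<String> input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String token = pos < input.size() ? input.get(pos) : "EOF";
            if (NONTERMINALS.contains(top)) {
                String[] rhs = TABLE.get(top + " " + token);
                if (rhs == null) return false;          // empty table entry: syntax error
                for (int i = rhs.length - 1; i >= 0; i--) {
                    stack.push(rhs[i]);                 // push reversed, leftmost symbol on top
                }
            } else {
                if (!top.equals(token)) return false;   // terminal mismatch
                pos++;
            }
        }
        return pos == input.size();
    }
}
```

Each loop iteration either expands the nonterminal on top of the stack using the table or matches a terminal against the current lookahead token, which mirrors what the recursive-descent methods do implicitly on the call stack.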

Summary questions

Construct a CFG for a simple part of a programming language.
Construct a recursive descent parser for a simple language.
Give typical examples of ambiguities in CFGs.
What is the difference between LL(1) and LL(k)?
What is a "common prefix", and how can it be eliminated?
What is meant by "left factoring"?
What is "left recursion" and how can it be eliminated?
In what way can an LL syntax tree differ from the desired AST?
Construct an EBNF grammar for conventional arithmetic expressions that respects standard precedence and associativity.
What is NULLABLE(x), FIRST(x), and FOLLOW(x)?
Construct an LL(1) table for a grammar.
What does it mean if there is a collision in an LL(1) table?
What is the difference between local lookahead and global lookahead?
Why can it be useful to add an end-of-file rule to some grammars?
How can we decide if a grammar is LL(1) or not?

Readings

F4: Predictive parsing. Recursive descent. LL grammars and parsing. Left recursion and factorization.
Appel, chapter 3.2
