Top Down Parsing CD

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 27

Top-Down Parsing 1 Compiler Design Muhammed Mudawwar

Top-Down Parsing
v A parser is top-down if it discovers a parse tree top to bottom
-A top-down parse corresponds to a preorder traversal of the parse tree
-A leftmost derivation is applied at each derivation step
v Top-down parsers come in two forms
-Predictive Parsers
Predict the production rule to be applied using lookahead tokens
-Backtracking Parsers
Will try different productions, backing up when a parse fails
v Predictive parsers are much faster than backtracking ones
-Predictive parsers operate in linear time will be our focus
-Backtracking parsers operate in exponential time will not be considered
v Two kinds of top-down parsing techniques will be studied
-Recursive-descent parsing
-LL parsing
Top-Down Parsing 2 Compiler Design Muhammed Mudawwar
Top-Down Parsing by Recursive-Descent
v We view a nonterminal A as a definition of a procedure A
-Procedure A will match the token sequence generated by nonterminal A
v The RHS of a production of A specifies the code for procedure A
-Terminals are matched against input tokens
-Nonterminals produce calls to corresponding procedures
v If multiple production rules exist for a nonterminal A
-One of them is predicted based on a lookahead token
The lookahead token is the next input token that should be matched
-The predicted rule is the only one that applies
Other rules will NOT be tried
This is a predictive parsing technique, not a backtracking one
v A syntax error is detected when
-Next token in the input sequence does NOT match the expected token
Top-Down Parsing 3 Compiler Design Muhammed Mudawwar
Example on Recursive-Descent Parsing
v Consider the following grammar for expressions in EBNF notation
expr term { addop term }
term factor { mulop factor }
factor ( expr ) | id | num
v Since three nonterminals exist, we need three parsing procedures
-The curly brackets expressing repetition is translated into a while loop
-The vertical bar expressing alternation is translated into a case statement
procedure expr( )
begin
term( );
while token = ADDOP do
match(ADDOP);
term( );
end while;
end expr;
procedure term( )
begin
factor( );
while token = MULOP do
match(MULOP);
factor( );
end while;
end term;
procedure factor( )
begin
case token of
(: match((); expr( ); match());
ID: match(ID);
NUM: match(NUM);
else syntax_error(token);
end case;
end factor;
Top-Down Parsing 4 Compiler Design Muhammed Mudawwar
Lookahead Token and Match Procedure
v The recursive-descent procedures use a token variable
v The token variable is the lookahead token
-Keeps track of the next token in the input sequence
-Is initialized to the first token before parsing begins
-Is updated after every call to the match procedure
v The match procedure matches lookahead token with its parameter
-Is called to match an expected token on the RHS of a production
-Match succeeds if expected token = lookahead token and fails otherwise
-Match calls scanner function to update the lookahead token
procedure match ( ExpectedToken )
begin
if token = ExpectedToken then token := scan( ) ;
else syntax_error( token , ExpectedToken ) ;
end if;
end match ;
Top-Down Parsing 5 Compiler Design Muhammed Mudawwar
Syntax Tree Construction for Expressions
v A recursive-descent parser can be used to construct a syntax tree
SyntaxTree := expr ( ) ; Calling parser function for start symbol
v Parsing functions allocate and return pointers to syntax tree nodes
v Construction of a syntax tree for simple expressions is given below
-Newnode allocates a tree node and returns a pointer to it
function expr( ) : TreePtr
begin
left := term( );
while token = ADDOP do
op := ADDOP.op ; match(ADDOP);
right := term( );
left := new node(op, left, right);
end while;
return left;
end expr;
function term( ) : TreePtr
begin
left := factor( );
while token = MULOP do
op := MULOP.op; match(MULOP);
right := factor( );
left := new node(op, left, right);
end while;
return left;
end term;
Top-Down Parsing 6 Compiler Design Muhammed Mudawwar
Syntax Tree Construction cont'd
v For a factor, we have the following parsing function
-symtable.lookup(ID.name) searches a symbol table for a given name
-lookup function returns a pointer to an identifier symbol in symtable
-Identifiers are inserted into symbol table when parsing a declaration
-The NUM.ptr is a pointer to a literal symbol in the literal table
-Literal constants are inserted into the literal table when scanned
function factor( ) : TreePtr
begin
case token of
(: match((); ptr := expr( ); match());
ID: ptr := symtable.lookup(ID.name); match(ID);
NUM: ptr := NUM.ptr; match(NUM);
else syntax_error(token, Expecting a number, an identifier, or ( );
end case;
return ptr;
end factor;
Top-Down Parsing 7 Compiler Design Muhammed Mudawwar
Node Structure for Expression Trees
v A syntax tree node for expressions should have at least:
-Node operaror: + , , * , / , etc. Different for each operator
For symbol table entries, the node operator is ID
For literal table entries, the node operator is NUM
Other node operators can be added to statements and various types of literals
-Left Pointer: pointer to left child expression tree
Can point to a tree node, to a symbol node, or to a literal node
-Right Pointer: pointer to right child expression tree
Can point to a tree node, to a symbol node, or to a literal node
v The following fields are also important:
-Line, Pos: keeps track of line and position of each tree node
-Type: associates a type with each tree node
Type information is used to check the type of expressions
Top-Down Parsing 8 Compiler Design Muhammed Mudawwar
Tracing the Construction of a Syntax Tree
v Although recursive-descent is a top-down parsing technique
-The construction of the syntax tree for expressions is bottom up
-Tracing verifies the precedence and associativity of operators
v The tree construction of a b + c * (b + d) is given below
-ptr
1
symtable.lookup(a)
-ptr
2
symtable.lookup(b)
-ptr
3
new node( , ptr
1
, ptr
2
)
-ptr
4
symtable.lookup(c)
-ptr
2
symtable.lookup(b)
-ptr
5
symtable.lookup(d)
-ptr
6
new node(+ , ptr
2
, ptr
5
)
-ptr
7
new node(* , ptr
4
, ptr
6
)
-ptr
8
new node(+ , ptr
3
, ptr
7
)
+
*
ID c
ID d
ID a ID b

ptr
1
ptr
2
+
ptr
2
ptr
5
ptr
4
ptr
6
ptr
3
ptr
7
ptr
8
Top-Down Parsing 9 Compiler Design Muhammed Mudawwar
Syntax Tree Construction for if Statements
v An Extended BNF grammar for if statements with optional else:
if-stmt if expr then stmt [ else stmt ]
v A parsing function can eliminate the ambiguity of else
-By matching an else token as soon as encountered
v Syntax tree of if stmt is constructed bottom-up
function ifstmt( ) : TreePtr
begin
match(IF); exprptr := expr( ); match(THEN); thenptr := stmt( );
if token = ELSE then
match(ELSE); elseptr := stmt( );
elseptr := new node(ELSE, thenptr, elseptr); // ELSE node
ifptr := new node(IF, exprptr, elseptr); // IF node points to ELSE node
else ifptr := new node(IF, exprptr, thenptr); end if; // No ELSE node
return ifptr;
end ifstmt;
Top-Down Parsing 10 Compiler Design Muhammed Mudawwar
LL Parsing
v Uses an explicit stack rather than recursive calls to perform a parse
v LL(k) parsing means that k tokens of lookahead are used
-The first L means that token sequence is read from left to right
-The second L means a leftmost derivation is applied at each step
v An LL parser consists of
-Parser stack that holds grammar symbols: non-terminals and tokens
-Parsing table that specifies the parser action
-Driver function that interacts with parser stack, parsing table and scanner
Parser
Driver
Scanner
next token
Parsing
Table
P
a
r
s
i
n
g

S
t
a
c
k
Output
Top-Down Parsing 11 Compiler Design Muhammed Mudawwar
LL Parsing Actions
v The LL parsing actions are:
-Match: to match top of parser stack with next input token
-Predict: to predict a production and apply it in a derivation step
-Accept: to accept and terminate the parsing of a sequence of tokens
-Error: to report an error message when matching or prediction fails
v Consider the following grammar: S ( S ) S |
Parser Stack Input Parser Action
S ( ( ) ) $ Predict S ( S ) S
( S ) S ( ( ) ) $ Match (
S ) S ( ) ) $ Predict S ( S ) S
( S ) S ) S ( ) ) $ Match (
S ) S ) S ) ) $ Predict S
) S ) S ) ) $ Match )
S ) S ) $ Predict S
) S ) $ Match )
S $ Predict S
Empty $ Accept
Parsing of ( ( ) )
Stack grows backward
from right to left
Top-Down Parsing 12 Compiler Design Muhammed Mudawwar
Grammar Analysis: Nonterminals that Derive
v Grammar analysis is necessary to
- Determine whether a grammar can be used in LL parsing
- Construct the LL parsing table that defines the actions of an LL parser
v A common analysis is to determine which nonterminals derive
- Nonterminals that derive are called nullable
v To determine which nonterminals derive
- We use an iterative marking algorithm
- First, nonterminals that derive directly in one step are marked
- Nonterminals that derive in two, three, steps are found and marked
- Continue until no more nonterminals can be marked as deriving
v Consider the following grammar
A B C D
B b C |
C c D |
D d |
A, B, C, and D are all nullable
B, C, and D derive directly
A derives indirectly: A B C D C D D
Top-Down Parsing 13 Compiler Design Muhammed Mudawwar
Grammar Analysis: The First Set
v Suppose we have the following grammar:
-The RHS of the productions of S do not begin with terminals
-Parser has no immediate guidance which production to apply to expand S
-We may follow all possible derivations of S as shown below
S A a | B b
A D c | C A
B d A | e
C f C | b
D h | i
v We predict S A a when
-First token is h, i, f, or b. First(Aa) = {h, i, f, b}
v We predict S B b when
-First token is d or e. First(Bb) = {d, e}
v Otherwise, we have an error
S
A a
B b
d A b
e b
C Aa
D c a
h c a
i c a
f C Aa
b A a
Top-Down Parsing 14 Compiler Design Muhammed Mudawwar
Grammar Analysis: Determining the First Set
v Formally, First() = {a T | * a}
-First() is the set of all terminals that can begin a sentential form of
v If * then First()
v To calculate First() we apply the following rules
-First() = {} =
-First(a) = {a} = a
-First(A) = First(
1
) First(
2
) = A and A
1
|
2
|
-First(A) = First(A) = A and A is NOT nullable
-First(A) = (First(A) {}) First() = A and A
+

v Consider the following grammar:
S A B C d
A e | f |
B g | h |
C p | q
First(A) = {e, f, }
First(B) = {g, h, }
First(C) = {p, q}
First(S) = First(ABCd) = (First(A){}) (First(B){}) First(Cd)
= {e, f} {g, h} {p, q} = {e, f, g, h, p, q}
Top-Down Parsing 15 Compiler Design Muhammed Mudawwar
Grammar Analysis: The Follow Set
v Suppose we have the following grammar
- We follow derivations of S as shown below
S A c B
A a A
A
B b B S
B
v We predict A a A when
- Next token is a because First(aA) = {a}
v We predict A when
- Next token is c because Follow(A) = {c}
v Similarly, we predict B b B S when
- Next token is b because First(b B S) = {b}
v We predict B when
- Next token is a, c, or $ (end-of-file token) because Follow(B) = {a, c, $}
S A c B
a A c B
c B
c b B S c b B A c B
c
c b B a A c B
c b B c B
Top-Down Parsing 16 Compiler Design Muhammed Mudawwar
Grammar Analysis: Determining the Follow Set
v Formally, Follow(A) = {a T | S
+
A a }
-Follow(A) is the set of terminals that may follow A in any sentential form
v If S
+
A then $ Follow(A)
-If A is not followed by any terminal then it is followed by the end of file
-The $ represents the end-of-file token
v We compute Follow(A) using the following rules:
-If A is the start symbol then $ Follow(A)
-Inspect RHS of productions for all occurrences of A
Let a typical production be B A
If is NOT nullable then add First() to Follow(A)
l Any token that can begin a sentential form of can follow A
If is or derives then add (First() {}) Follow(B) to Follow(A)
l If vanishes in a given derivation then what follows A is what follows B
l B A A what follows A is what follows B is First()
Top-Down Parsing 17 Compiler Design Muhammed Mudawwar
Examples on the First and Follow Sets
Example 1:
S A c B
A a A
A
B b B S
B
Example 2:
E T Q
Q + T Q | T Q |
T F R
R * F R | / F R |
F ( E ) | id
Nonterminals that derive are A and B
First(S) = First(AcB) = (First(A){}) First(cB) = {a, c}
First(A) = First(aA) First() = {a, }
First(B) = First(bBS) First() = {b, }
Follow(S) = {$} Follow(B)
Follow(A) = {c}
Follow(B) = Follow(S) First(S) = {$, a, c}
Follow(S) = {$, a, c}
Nonterminals that derive are Q and R
First(E) = First(TQ) = First(T) = First(FR) = First(F) = {( , id}
First(Q) = {+ , , }
First(R) = {* , / , }
Follow(E) = {$ , )}
Follow(Q) = Follow(E) = {$ , )}
Follow(T) = (First(Q){}) Follow(E) Follow(Q) = {+, , $, )}
Follow(R) = Follow(T) = {+, , $, )}
Follow(F) = (First(R){}) Follow(T) Follow(R) = {*, /, +, , $, )}
Top-Down Parsing 18 Compiler Design Muhammed Mudawwar
Grammar Analysis: Determining the Predict Set
v The predict set of a production A is defined as follows:
-If is NOT nullable then Predict(A ) = First()
-If is Nullable then Predict(A ) = (First() {}) Follow(A)
-This is the set of lookahead tokens that will cause the selection of A
v Example on determining the predict set:
E T Q
Q + T Q
Q T Q
Q
T F R
R * F R
R / F R
R
F ( E )
F id
Predict E T Q = First(TQ) = First(T) = {( , id}
Predict Q + T Q = First(+TQ) = { + }
Predict Q T Q = First(TQ) = { }
Predict Q = Follow(Q) = {$ , )}
Predict T F R = First(FR) = First(F) = {( , id}
Predict R * F R = First(*FR) = { * }
Predict R / F R = First(/FR) = { / }
Predict R = Follow(R) = {+ , , $ , )}
Predict F ( E ) = { ( }
Predict F id = { id }
Top-Down Parsing 19 Compiler Design Muhammed Mudawwar
LL(1) Grammars
v Not all context-free grammars are suitable for LL parsing
v CFGs suitable for LL(1) parsing are called LL(1) Grammars
v A grammar is LL(1) if for productions with the same LHS A
A
1
|
2
| |
n
Predict(A
i
) Predict(A
j
) = for all i j
The predict sets of productions with same LHS are pairwise disjoint
v The following grammar is LL(1)
S A c B
A a A Predict(A a A) = {a}
A Predict(A ) = Follow(A) = {c}
B b B S Predict(B b B S) = {b}
B Predict(B ) = Follow(B) = Follow(S) First(S) =
= {$} Follow(B) {a, c} = {$, a, c}
Disjoint
Disjoint
Top-Down Parsing 20 Compiler Design Muhammed Mudawwar
Constructing the LL(1) Parsing Table
v The predict sets can be represented in an LL(1) parse table
-The rows are indexed by the nonterminals
-The columns are indexed by the tokens
v If A is a nonterminal and tok is the lookahead token then
-Table[A][tok] indicates which production to predict
-If no production can be used Table[A][tok] gives an error value
v Table[A][tok] = A iff tok predict(A )
v Example on constructing the LL(1) parsing table:
1: S A c B Predict(1) = {a, c}
2: A a A Predict(2) = {a}
3: A Predict(3) = {c}
4: B b B S Predict(4) = {b}
5: B Predict(5) = {$, a, c}
Empty slots
indicate error
conditions
a b c $
S
A
B
1 1
2 3
4 5 5 5
Top-Down Parsing 21 Compiler Design Muhammed Mudawwar
Constructing the LL(1) Parsing Table cont'd
v Here is a second example on constructing the parsing table
v Because the above grammar is LL(1)
-A unique production number is stored in a table entry
v Blank entries correspond to error conditions
-In practice, special error numbers are used to indicate error situations
1: E T Q Predict(1) = { ( , id }
2: Q + T Q Predict(2) = { + }
3: Q T Q Predict(3) = { }
4: Q Predict(4) = { $ , ) }
5: T F R Predict(5) = { ( , id }
6: R * F R Predict(6) = { * }
7: R / F R Predict(7) = { / }
8: R Predict(8) = {+, , $, )}
9: F ( E ) Predict(9) = { ( }
10: F id Predict(10) = { id }
+ * /
E
Q
T
1 1
2 3
7
5
6
5
R
F
( ) id $
4 4
8 8 8 8
9 10
Top-Down Parsing 22 Compiler Design Muhammed Mudawwar
LL(1) Parser Driver Algorithm
v The LL(1) parser driver algorithm can be described as follows:
Token := scan( )
Stack.push(StartSymbol)
while not Stack.empty( ) do
X := Stack.pop( )
if terminal(X)
if X = Token then Token := scan( )
else process a syntax error at Token end if
else (* X is a nonterminal *)
Rule := Table[X][Token]
if Rule = X Y
1
Y
2
Y
n
then
for i fromn downto 1 do Stack.push(Y
i
) end for
else process a syntax error at Token end if
end if
end while
if Token = $ then accept parsing
else report a syntax error at Token end if
Parser
Driver
Scanner
next token
Parsing
Table
P
a
r
s
i
n
g

S
t
a
c
k
Output
Top-Down Parsing 23 Compiler Design Muhammed Mudawwar
Tracing an LL(1) Parser
Consider the parsing of id * (id + id) $
Parser Stack Input Parser Action
E id*(id+id)$ Predict E T Q
T Q id*(id+id)$ Predict T F R
F R Q id*(id+id)$ Predict F id
id R Q id*(id+id)$ Match id
R Q *(id+id)$ Predict R * F R
* F R Q *(id+id)$ Match *
F R Q (id+id)$ Predict F ( E )
( E ) R Q (id+id)$ Match (
E ) R Q id+id)$ Predict E T Q
T Q ) R Q id+id)$ Predict T F R
F R Q ) R Q id+id)$ Predict F id
id R Q ) R Q id+id)$ Match id
R Q ) R Q +id)$ Predict R
Q ) R Q +id)$ Predict Q + T Q
+ T Q) R Q +id)$ Match +
T Q ) R Q id)$ Predict T F R
F R Q ) R Q id)$ Predict F id
id R Q ) R Q id)$ Match id
R Q ) R Q )$ Predict R
Q ) R Q )$ Predict Q
) R Q )$ Match )
R Q $ Predict R
Q $ Predict Q
Empty $ Accept
S
t
a
c
k

g
r
o
w
s

b
a
c
k
w
a
r
d
s

f
r
o
m

r
i
g
h
t

t
o

l
e
f
t
+ * /
E
Q
T
1 1
2 3
7
5
6
5
R
F
( ) id $
4 4
8 8 8 8
9 10
1: E T Q
2: Q + T Q
3: Q T Q
4: Q
5: T F R
6: R * F R
7: R / F R
8: R
9: F ( E )
10: F id
Top-Down Parsing 24 Compiler Design Muhammed Mudawwar
The Problem of Left Recursion
v Left recursive grammars fail to be LL(1) or even LL(k)
-A left recursive production puts an LL parser into infinite loop
-If a left recursive production is predicted then
Nonterminal on LHS is replaced with RHS of production
The same nonterminal will appear again on top of parser stack
The same production is predicted again
Iteration goes forever
v Left recursion is commonly used to
-Make an operation left associative
Expr Expr addop Term | Term
-Specify a list of identifiers, statements, etc.
StmtList StmtList ; Statement | Statement
v We need to eliminate left recursion to make a grammar LL(1)
Top-Down Parsing 25 Compiler Design Muhammed Mudawwar
Eliminating Immediate Left Recursion
v The simplest case of left recursion is immediate left recursion
-General Form: A A |
-The above productions of A generate strings of the form
n
, n 0
-We introduce a new nonterminal and use right recursion as follows:
A Atail
Atail Atail |
v In general, if many immediate left recursive productions exist
-General Form: A A
1
| A
2
| | A
n
|
1
|
2
| |
m
-We introduce a new nonterminal and use right recursion
A
1
Atail |
2
Atail | |
m
Atail
Atail
1
Atail |
2
Atail | |
n
Atail |
v For example: Expr Expr + Term | Expr Term | Term becomes:
Expr Term Exprtail
Exprtail + Term Exprtail | Term Exprtail |
Top-Down Parsing 26 Compiler Design Muhammed Mudawwar
Eliminating Indirect Left Recursion
v In some cases, left recursion may be indirect
For example: A B | and B A |
v We can do substitutions to make left recursion immediate
v Consider the following grammar:
A B a | A a | c
B B b | A b | d
v First, we remove the immediate left recursion of A
A B a Atail | c Atail Atail a Atail |
v Second, we eliminate the indirect left recursion of B A b
B B b | B a Atail b | c Atail b | d
v Finally, we remove the immediate left recursion of B
B c Atail b Btail | d Btail
Btail b Btail | a Atail b Btail |
Top-Down Parsing 27 Compiler Design Muhammed Mudawwar
Left Factoring of Common Prefixes
v Another problem to LL parsers is to have a common prefix
v An if statement may have 2 production with a common prefix:
IfStmt if Expr then StmtList end if ;
IfStmt if Expr then StmtList else StmtList end if ;
v An LL(1) parser cannot predict which production to apply
v The solution is use left factoring of the common prefix
-General Form: A | | |
-Left Factoring solution:
A Atail
Atail | | |
v The left factoring of the two if statement productions:
IfStmt if Expr then StmtList IfStmtTail
IfStmtTail else StmtList end if ; | end if ;

You might also like