Top-Down Parsing-Prerequisites for Predictive Parsing

Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting
from the root and creating the nodes of the parse tree in preorder (depth-first). Equivalently, top-down
parsing can be viewed as an attempt to find a leftmost derivation for an input string starting from the start
symbol.
Since top-down parsing uses a leftmost derivation and the input is scanned from left to right, the grammars
for which top-down parsers can be constructed are sometimes called the LL(k) class, where k is the number
of lookahead symbols.
Example: The given grammar G=(V,T,P,S) is as follows:
The set of Variables (non-terminals) is {E,T,F}
The set of Terminals is {+,*,id}
The start symbol is E and the list of production rules is as follows:
E -> E + T | T
T -> T * F | F
F -> id
The given input is id + id * id
In top-down parsing, choosing the appropriate production for the leftmost nonterminal A is the key
problem at each derivation step. Once the appropriate production is chosen, the rest of the process
consists of matching the terminal symbols in the production body with the symbols of the input string,
as in the derivation and parse trees shown below.
E => E + T => T + T => F + T => id + T => id + T * F => id + F * F => id + id * F => id + id * id

Figure: Parse trees constructed step by step during the leftmost derivation of id + id * id

Recursive-Descent Parsing
We now look at a general form of top-down parsing, called recursive descent parsing. A recursive-
descent parsing program consists of a set of procedures, one for each nonterminal. Execution begins
with the procedure for the start symbol, which halts and announces success if its procedure body scans
the entire input string. An algorithm for recursive-descent parsing is given below.
void A() {
    Choose an A-production, A -> X1 X2 ... Xk;
    for ( i = 1 to k ) {
        if ( Xi is a nonterminal )
            call procedure Xi();
        else if ( Xi equals the current input symbol a )
            advance the input to the next symbol;
        else /* an error has occurred */ ;
    }
}
Figure: A typical procedure for a nonterminal in a top-down parser
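As an illustration, the schematic above can be transcribed almost literally into Python. The grammar representation (a dict from nonterminal to a single production body), the toy grammar S -> cAd, A -> a, and the policy of simply taking the only available production are assumptions made for this sketch alone; a real parser must choose among several productions, as discussed next.

GRAMMAR = {"S": ("c", "A", "d"), "A": ("a",)}   # one production per nonterminal, for simplicity
tokens = list("cad")                            # the input string, one terminal per token
pos = 0                                         # the input pointer

def parse(A):
    # Scan the body of the chosen A-production left to right: nonterminals are
    # parsed recursively, terminals are matched against the current input symbol.
    global pos
    for X in GRAMMAR[A]:
        if X in GRAMMAR:                        # X is a nonterminal
            parse(X)                            # call procedure X()
        elif pos < len(tokens) and tokens[pos] == X:
            pos += 1                            # advance the input to the next symbol
        else:
            raise SyntaxError("error at input position %d" % pos)

parse("S")
print(pos == len(tokens))                       # True: the whole input was scanned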
General recursive descent may require backtracking; that is, it may require repeated scans over the
input. To allow backtracking, the code of the procedure in the above figure needs to be modified as
follows:
1. A unique A-production cannot be chosen in a single step, so the alternative productions must be
tried in some order.
2. Failure while matching one production body is not ultimate failure, so we return to the first line
to try another production.
3. In order to try another production, we must reset the input pointer to where it was when we
first reached the first line, so a local variable is needed to store the input pointer.
4. Only if there are no more A-productions to try do we declare that an input error has been found.
Example: Consider the grammar
S -> cAd
A -> ab | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single
node labeled S, and the input pointer pointing to c, the first symbol of w. S has only one production, so
we use it to expand S and obtain the tree in Figure(a). The leftmost leaf, labeled c, matches the first
symbol of input w, so we advance the input pointer to a, the second symbol of w, and consider the next
leaf, labeled A.

Figure: Steps in top-down parsing


Now, we expand A using its first alternative, A -> ab, to obtain the tree of Figure(b). We have a match
for the second input symbol, a, so we advance the input pointer to d, the third input symbol, and
compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back
to A to see whether there is another alternative for A that has not been tried, but that might produce a
match.
In going back to A, we must reset the input pointer to position 2, the position it had when we first came
to A, which means that the procedure for A must store the input pointer in a local variable.
The second alternative for A produces the tree of Figure(c). The leaf a matches the second symbol of
w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and
announce successful completion of parsing.
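A minimal Python sketch of this backtracking behaviour for the grammar S -> cAd, A -> ab | a is shown below. The representation (the input as a string, a position index playing the role of the input pointer, and a generator that yields every position at which A could end) is an illustrative assumption, not part of the original procedure.

def parse_A(s, pos):
    # Try each A-production in order; every successful alternative yields the
    # position just past the substring matched for A. Asking the generator for
    # the next value is exactly the "reset the input pointer and try another
    # production" step described above.
    for body in ("ab", "a"):
        if s.startswith(body, pos):
            yield pos + len(body)

def parse_S(s, pos=0):
    # S -> c A d : match c, then try every way A can match, then match d.
    if not s.startswith("c", pos):
        return None
    for after_A in parse_A(s, pos + 1):
        if s.startswith("d", after_A):
            return after_A + 1               # position just past the matched S
    return None

w = "cad"
print(parse_S(w) == len(w))                  # True: S derives the whole input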
Backtracking is rarely needed to parse programming-language constructs, so backtracking parsers are
not seen frequently. We now move on to predictive parsers: recursive-descent parsers that need no
backtracking.

Predictive Parsing
Predictive parsing chooses the correct A-production by looking ahead at the input a fixed number of
symbols; here we consider the case where that number is one. A grammar usually needs a few changes
before a predictive parser can be constructed for it. The prerequisites for predictive parsing are:
1. Elimination of left recursion
2. Left factoring the grammar

Elimination of Left Recursion


A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some
string α. Top-down parsing methods cannot handle left-recursive grammars, so a transformation is
needed to eliminate left recursion. We shall now see how the left-recursive pair of productions
A -> Aα | β can be replaced by the non-left-recursive productions

A -> βA'
A' -> αA' | ϵ

without changing the strings derivable from A. This rule by itself suffices for many grammars.
Example: Let us eliminate left recursion from the following grammar:
E -> E + T | T
T -> T * F | F
F -> ( E ) | id
In the above grammar there are two productions, E -> E + T | T and T -> T * F | F, which are of the
form A -> Aα | β. Eliminating left recursion from this grammar results in the grammar below:
E -> T E'
E' -> + T E' | ϵ
T -> F T'
T' -> * F T' | ϵ
F -> ( E ) | id
The left-recursive pair of productions E -> E + T | T is replaced by E -> T E' and E' -> + T E' | ϵ. The
new productions for T and T' are obtained similarly by eliminating the immediate left recursion in
T -> T * F | F.
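To see why this transformation matters, below is a sketch of a recursive-descent parser for the transformed grammar: one procedure per nonterminal, one symbol of lookahead, and no backtracking. The tokenized input list and the helper names Parser, peek and match are assumptions made for this illustration only.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, t):
        if self.peek() == t:
            self.pos += 1
        else:
            raise SyntaxError("expected %s, found %s" % (t, self.peek()))

    def E(self):                     # E  -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):               # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                     # T  -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):               # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                     # F  -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id"])
p.E()
print(p.pos == len(p.tokens))        # True: id + id * id parsed without backtracking

With the original left-recursive grammar, the procedure E would begin by calling itself and never terminate; the transformed grammar avoids this.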
Immediate left recursion can be eliminated by the following technique, which works for any number of
A-productions. First, group the productions as

A -> Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

where no βi begins with an A. Then, replace the A-productions by

A -> β1A' | β2A' | ... | βnA'
A' -> α1A' | α2A' | ... | αmA' | ϵ
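A small Python sketch of this grouping step is given below. The grammar representation (production bodies as tuples of symbols, with the empty tuple standing for ϵ) and the function name eliminate_immediate are assumptions made for illustration.

def eliminate_immediate(A, bodies):
    # Split the A-bodies into the left-recursive ones (A -> A alpha) and the
    # rest (A -> beta), then build A -> beta A' and A' -> alpha A' | epsilon.
    A_prime = A + "'"
    alphas = [b[1:] for b in bodies if b and b[0] == A]
    betas = [b for b in bodies if not b or b[0] != A]
    if not alphas:                                   # no immediate left recursion
        return {A: list(bodies)}
    new_A = [b + (A_prime,) for b in betas]
    new_A_prime = [a + (A_prime,) for a in alphas] + [()]   # () is epsilon
    return {A: new_A, A_prime: new_A_prime}

# E -> E + T | T becomes E -> T E' and E' -> + T E' | epsilon:
print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))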

This procedure eliminates all left recursion from the A- and A'-productions (provided no αi is ϵ), but it
does not eliminate left recursion involving derivations of two or more steps. For example, consider the
grammar

S -> Aa | b
A -> Ac | Sd | ϵ

The nonterminal S is left recursive because S => Aa => Sda, but it is not immediately left recursive.

The algorithm below systematically eliminates left recursion from a grammar. It is guaranteed to work if
the grammar has no cycles (derivations of the form A =>+ A) or ϵ-productions (productions of the
form A -> ϵ).

Algorithm: Eliminating left recursion.

Input: Grammar G with no cycles or ϵ-productions.
Output: An equivalent grammar with no left recursion.
Method: Apply the algorithm in the figure below to G. Note that the resulting non-left-recursive
grammar may have ϵ-productions.

1) arrange the nonterminals in some order A1, A2, ..., An
2) for ( each i from 1 to n ) {
3)     for ( each j from 1 to i-1 ) {
4)         replace each production of the form Ai -> Aj γ by the productions
           Ai -> δ1 γ | δ2 γ | ... | δk γ, where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions
5)     }
6)     eliminate the immediate left recursion among the Ai-productions
7) }

Figure: Algorithm to eliminate left recursion from a grammar

The procedure in the above figure works as follows:

1. The first iteration of the outer for-loop (lines 2 to 7), for i = 1, eliminates any immediate left
recursion among the A1-productions.
2. Any remaining A1-productions of the form A1 -> Al α must therefore have l > 1.
3. After the (i-1)st iteration of the outer for-loop, all nonterminals Ak with k < i are "cleaned"; that is,
any remaining production Ak -> Al α must have l > k.
4. On the ith iteration, the inner for-loop (lines 3 to 5) progressively raises the lower limit m in any
production Ai -> Am α until we have m >= i.
5. Finally, eliminating the immediate left recursion among the Ai-productions (line 6) forces m to be
greater than i, completing the removal of left recursion.
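The figure's algorithm can be sketched in Python by combining the substitution step with the eliminate_immediate sketch shown earlier; the dictionary-based grammar representation and the function name are again assumptions made for illustration.

def eliminate_left_recursion(grammar, order):
    # grammar: dict mapping each nonterminal to a list of bodies (tuples);
    # order:   the chosen ordering A1, A2, ..., An of the nonterminals.
    g = {A: list(bodies) for A, bodies in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            # Replace every Ai -> Aj gamma by Ai -> delta gamma for each
            # current Aj-production Aj -> delta (lines 3 to 5 of the figure).
            new_bodies = []
            for body in g[Ai]:
                if body and body[0] == Aj:
                    new_bodies.extend(d + body[1:] for d in g[Aj])
                else:
                    new_bodies.append(body)
            g[Ai] = new_bodies
        # Eliminate immediate left recursion among the Ai-productions (line 6).
        g.update(eliminate_immediate(Ai, g.pop(Ai)))
    return g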

Example: Let us apply the above algorithm to the following grammar:

S -> Aa | b
A -> Ac | Sd | ϵ

Technically, the algorithm is not guaranteed to work because of the ϵ-production, but in this case the
production A -> ϵ turns out to be harmless.
We order the nonterminals S, A. There is no immediate left recursion among the S-productions, so
nothing happens during the outer loop for i = 1. For i = 2, we substitute for S in A -> Sd to obtain the
following A-productions:
A -> Ac | Aad | bd | ϵ

Eliminating the immediate left recursion among these A-productions yields the following grammar.
S -> Aa | b
A -> bdA' | A'
A' -> cA' | adA' | ϵ
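For reference, running the eliminate_left_recursion sketch above on this grammar (with ϵ represented as the empty tuple) reproduces exactly these productions.

g = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
print(eliminate_left_recursion(g, ["S", "A"]))
# expected result:
# {'S': [('A', 'a'), ('b',)],
#  'A': [('b', 'd', "A'"), ("A'",)],
#  "A'": [('c', "A'"), ('a', 'd', "A'"), ()]}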

Left Factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive, or top-down, parsing. When the choice between two alternative A-productions is not clear,
we may be able to rewrite the productions to defer the decision until enough of the input has been seen
that we can make the right choice. For example, if we have the two productions

stmt -> if expr then stmt else stmt
      | if expr then stmt

on seeing the input if, we cannot immediately tell which production to choose to expand stmt. In general,
if A -> αβ1 | αβ2 are two A-productions and the input begins with a nonempty string derived from α, we
do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A
to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored,
the original productions become

A -> αA'
A' -> β1 | β2
Algorithm: Left factoring a grammar.


Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each nonterminal A, find the longest prefix α common to two or more of its alternatives.
If α ≠ ϵ, replace all of the A-productions A -> αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives
that do not begin with α, by
A -> αA' | γ
A' -> β1 | β2 | ... | βn
Here A' is a new nonterminal. Repeat this transformation until no two alternatives for a nonterminal
have a common prefix.
Example: The following grammar abstracts the dangling-else problem, where i, t and e stand for if, then
and else, while E and S stand for a conditional expression and a statement:
S -> iEtS | iEtSeS | a
E -> b
Left-factored, this grammar becomes:
S -> iEtSS' | a
S' -> eS | ϵ
E -> b
Thus, we may expand S to iEtSS' on input i, and wait until iEtS has been seen to decide whether to
expand S' to eS or to ϵ.
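A Python sketch of this transformation is shown below; the helper name left_factor, the tuple-of-symbols representation of production bodies, and the restriction to a single factoring pass are assumptions made for illustration.

from itertools import combinations

def common_prefix(x, y):
    # Longest common prefix of two production bodies.
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]

def left_factor(A, bodies):
    # Find the longest prefix alpha common to two or more alternatives.
    alpha = max((common_prefix(x, y) for x, y in combinations(bodies, 2)),
                key=len, default=())
    if not alpha:
        return {A: list(bodies)}                 # nothing to factor
    A_prime = A + "'"
    factored = [b[len(alpha):] for b in bodies if b[:len(alpha)] == alpha]
    others = [b for b in bodies if b[:len(alpha)] != alpha]
    # The empty tuple () stands for epsilon in the A'-bodies.
    return {A: [alpha + (A_prime,)] + others, A_prime: factored}

# The dangling-else grammar: S -> iEtS | iEtSeS | a
S_bodies = [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
print(left_factor("S", S_bodies))
# expected result: {'S': [('i', 'E', 't', 'S', "S'"), ('a',)], "S'": [(), ('e', 'S')]}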
