PCD - Unit II
Parsing Techniques: Need and Role of the Parser - Context-Free Grammars, Top-Down Parsing - General
Strategies, Recursive Descent Parser - Predictive Parser - LL(1) Parser, Bottom-Up Parsing - Shift-Reduce
Parser - LR Parser - LR(0) Item, Construction of SLR Parsing Table, Introduction to Canonical LR(1) and LALR
Parser - Operator Precedence Parsing - Error Handling and Recovery in Syntax Analyzer, YACC - Design of a
Syntax Analyzer for a Sample Language
The parser is the phase of the compiler that groups the tokens coming from the lexical analysis phase into grammatical structures.
In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are
grouped according to the syntactic rules of the language. This is done by a parser. The parser obtains a string of
tokens from the lexical analyzer and verifies that the string can be generated by the grammar of the source
language. It detects and reports any syntax errors and produces a parse tree from which intermediate code can be generated.
A context-free grammar (CFG) is a formal grammar which is used to generate all possible strings in a given formal language. It is defined by a four-tuple
G = (V, T, P, S)
where V is the set of non-terminals, T the set of terminals, P the set of productions, and S the start symbol.
In a CFG, the start symbol is used to derive the string. The string is derived by repeatedly replacing a non-terminal
by the right-hand side of one of its productions, until all non-terminals have been replaced by terminal symbols.
Example:
Production rules:
S → aSa
S → bSb
S→c
Now check that abbcbba string can be derived from the given CFG.
S ⇒ aSa
S ⇒ abSba
S ⇒ abbSbba
S ⇒ abbcbba
By applying the productions S → aSa and S → bSb recursively and finally applying the production S → c, the string
abbcbba is obtained.
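This derivation can be checked mechanically. The following is a small sketch (Python; the function name is illustrative) that tests membership in the language of the grammar S → aSa | bSb | c by undoing one production at a time:

```python
# Recognizer for the grammar S -> aSa | bSb | c, written as a
# recursive check that mirrors the derivation above.
def derives(s: str) -> bool:
    if s == "c":                      # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])       # S -> aSa or S -> bSb
    return False

print(derives("abbcbba"))  # True: S => aSa => abSba => abbSbba => abbcbba
print(derives("abcbba"))   # False: not derivable from S
```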
Parsing:
Parsing is classified into two categories: top-down parsing and bottom-up parsing. Top-down parsing is based
on the leftmost derivation, whereas bottom-up parsing traces a rightmost derivation in reverse.
Top-down parsing is a method of parsing the input string provided by the lexical analyzer. The top-down parser
parses the input string and then generates the parse tree for it.
Construction of the parse tree starts from the root node i.e. the start symbol of the grammar. Then using leftmost
derivation it derives a string that matches the input string.
In the top-down approach, construction of the parse tree starts from the root node and ends at the leaf
nodes. Here each leaf node represents a terminal that matches a terminal of the input string.
• To derive the input string, first, a production in grammar with a start symbol is applied.
• Now at each step, the parser has to identify which production rule of a non-terminal must be applied in order to
derive the input string.
• The next step is to match the terminals in the production with the terminals of the input string.
Consider the input string provided by the lexical analyzer is ‘abd’ for the following grammar.
S -> a A d
A -> b | b c
The top-down parser will parse the input string ‘abd’ and will start creating the parse tree with the starting
symbol ‘S‘.
Now the first input symbol ‘a‘ matches the first leaf node of the tree. So the parser will move ahead and find a
match for the second input symbol ‘b‘.
But the next leaf node of the tree is a non-terminal, i.e. A, which has two productions. Here, the parser has to choose
the A-production that can derive the remaining input 'bd'. So the parser identifies the A-production A -> b.
Now the next leaf node ‘b‘ matches the second input symbol ‘b‘. Further, the third input symbol ‘d‘ matches the
last leaf node ‘d‘ of the tree. Thereby successfully completing the top-down parsing
• Recursive-descent parsers: Recursive-descent parsers are a type of top-down parser that uses a set of recursive
procedures to parse the input. Each non-terminal symbol in the grammar corresponds to a procedure that parses
input for that symbol.
• Backtracking parsers: Backtracking parsers are a type of top-down parser that can handle non-deterministic
grammar. When a parsing decision leads to a dead end, the parser can backtrack and try another alternative.
Backtracking parsers are not as efficient as other top-down parsers because they can potentially explore many
parsing paths.
• Non-backtracking parsers: Non-backtracking is a technique used in top-down parsing to ensure that the parser
doesn’t revisit already-explored paths in the parse tree during the parsing process. This is achieved by using a
predictive parsing table that is constructed in advance and selecting the appropriate production rule based on the
top non-terminal symbol on the parser’s stack and the next input symbol. By not backtracking, predictive parsers
are more efficient than other types of top-down parsers, although they may not be able to handle all grammar.
• Predictive parsers: Predictive parsers are top-down parsers that use a parsing table to predict which production rule
to apply based on the next input symbol. Predictive parsers are also called LL parsers because they scan the input
left to right and construct a leftmost derivation of the input string.
• When using recursive descent parsing, the parser may need to backtrack when it encounters a symbol that does
not match the expected token. This can make the parsing process slower and less efficient.
A recursive descent parsing program has a set of procedures. There is one procedure for each of the non-terminal
present in the grammar. The parsing starts with the execution of the procedure meant for the starting symbol.
void A( ) {
    Choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
Top- down parsers start from the root node (start symbol) and match the input string against the production
rules to replace them (if matched). To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser, will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e. ‘r’. The
first production of S (S → rXd) matches it. So the top-down parser advances to the next input letter (i.e. ‘e’).
The parser tries to expand non-terminal ‘X’ and checks its production from the left (X → oa). It does not match with
the next input symbol. So the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
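The backtracking behaviour described above can be sketched as a small program (a Python sketch; the grammar is encoded as strings of single-character symbols, and the function name is illustrative). Each non-terminal's alternatives are tried in order, and a failed alternative causes the parser to back up and try the next one:

```python
# Backtracking top-down parser for:
#   S -> rXd | rZd     X -> oa | ea     Z -> ai
GRAMMAR = {
    "S": ["rXd", "rZd"],
    "X": ["oa", "ea"],
    "Z": ["ai"],
}

def parse(symbol: str, s: str, pos: int):
    """Try to derive a prefix of s[pos:] from symbol.
    Returns the new position, or None on failure."""
    if symbol not in GRAMMAR:              # terminal symbol
        if pos < len(s) and s[pos] == symbol:
            return pos + 1
        return None
    for rhs in GRAMMAR[symbol]:            # try alternatives in order,
        p = pos                            # restarting (backtracking) on failure
        for sym in rhs:
            p = parse(sym, s, p)
            if p is None:
                break
        else:
            return p
    return None

accepted = parse("S", "read", 0) == len("read")
print(accepted)  # True: X -> oa fails on 'e', so the parser backtracks to X -> ea
```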
Predictive Parsing
Predictive parsing is a simple form of recursive descent parsing. And it requires no backtracking. Instead, it can
determine which A-production must be chosen to derive the input string.
Predictive parsing chooses the correct A-production by looking ahead at the input string. It allows looking ahead
a fixed number of input symbols from the input string.
Input buffer and stack both contain the end marker ‘$’. It indicates the bottom of the stack and the end of the
input string in the input buffer.
1. The parser first considers the grammar symbol present on the top of the stack say ‘X’. And compares it with the
current input symbol say ‘a’ present in the input buffer.
o If X is a non-terminal, then the parser chooses a production of X from the parse table, consulting the entry M[X, a].
o In case X is a terminal, the parser checks it for a match with the current symbol ‘a’.
This is how predictive parsing identifies the correct production. So that it can successfully derive the input string.
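The stack-and-table algorithm above can be sketched as follows (a Python sketch; the table M is the LL(1) table of the expression grammar worked out later in this unit, and the helper names are illustrative):

```python
# Table-driven predictive (LL(1)) parsing loop.
M = {  # M[nonterminal, lookahead] -> production right-hand side
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                    # bottom marker and start symbol
    tokens = tokens + ["$"]               # end marker on the input
    i = 0
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a:                        # terminal (or $) matched
            i += 1
        elif X in NONTERMINALS:
            if (X, a) not in M:
                return False              # blank (error) entry in the table
            stack.extend(reversed(M[X, a]))   # push the production's RHS
        else:
            return False                  # terminal mismatch
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))  # True
```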
LL Parsing
The LL parser is a predictive parser that doesn’t need backtracking. LL (1) parser accepts only LL (1) grammar.
• The first L in LL(1) indicates that the parser scans the input string from left to right.
• The second L indicates that the parser produces a leftmost derivation for the input string.
• The ‘1’ in LL(1) indicates that the parser looks ahead only one input symbol of the input string.
LL (1) grammar does not include left recursion and there is no ambiguity in the LL (1) grammar.
Algorithm for construction of predictive parsing table:
Input : Grammar G
Method :
First()
FIRST() is a function that gives the set of terminals that can begin a string derived from a grammar symbol, i.e. the
first terminals that can appear when the right-hand side of a production is expanded.
Example
Let us consider grammar to show how to find the first and follow in compiler design.
E->TE’
E’->+TE’/ε
T->FT’
T’->*FT’/ε
F->(E)/id
Here,
Terminals are id, *, +, (, ) and ε denotes the empty string
Non-terminals are E, E’, T, T’, F
Now let’s try to find the first of ‘E’. On the right-hand side of the production E->TE’ the first symbol is T, which is a
non-terminal, but we have to find terminals, so we move to the production T->FT’, in which the first element is
again a non-terminal. So we move to the third production F->(E)/id, in which the first elements ‘(’ and id are
terminals, and these form the first of E.
So, First(E)={(, id}
Now let’s try to find the follow of ‘E’. To find this we look for productions in which ‘E’ is on the right-hand side, and
we get the production F->(E)/id, so the terminal that can follow ‘E’ is ‘)’; also, ‘$’ is always added to the follow of
the start symbol. So the follow(E)={$,)}
On repeating the above steps for the remaining non-terminals, we get the complete first and follow sets.
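The repeated steps can be carried out mechanically as a fixed-point computation. The following Python sketch (representation and helper names are illustrative) computes FIRST and FOLLOW for the grammar above:

```python
# Fixed-point computation of FIRST and FOLLOW for:
#   E -> TE'   E' -> +TE' | ε   T -> FT'   T' -> *FT' | ε   F -> (E) | id
eps = "ε"
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [eps]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [eps]],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(G)

def first_of(seq, FIRST):
    """FIRST of a string of grammar symbols, given the current FIRST sets."""
    out = set()
    for X in seq:
        f = FIRST[X] if X in NT else {X}
        out |= f - {eps}
        if eps not in f:
            return out
    return out | {eps}              # every symbol in seq can derive ε

FIRST = {A: set() for A in NT}
FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                # $ goes into FOLLOW of the start symbol
changed = True
while changed:                      # iterate until nothing changes
    changed = False
    for A, prods in G.items():
        for rhs in prods:
            body = [s for s in rhs if s != eps]
            new = first_of(body, FIRST)
            if not new <= FIRST[A]:
                FIRST[A] |= new; changed = True
            for i, B in enumerate(body):
                if B in NT:         # FOLLOW rules for B in A -> ...B tail
                    tail = first_of(body[i + 1:], FIRST)
                    add = tail - {eps}
                    if eps in tail:
                        add |= FOLLOW[A]
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add; changed = True

print(FIRST["E"], FOLLOW["F"])  # FIRST(E) = {(, id}; FOLLOW(F) = {+, *, $, )}
```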
Left Recursion
A → A α |β.
The above grammar is left recursive because the non-terminal on the left of the production also occurs at the first
position on the right side of the production. Left recursion can be eliminated by replacing the pair of productions with
A → βA′
A′ → αA′ | ε
Elimination of Left Recursion
In a left-recursive grammar, expansion of A generates Aα, Aαα, Aααα, … at each step, causing the parser to enter an
infinite loop. The general form
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
can be replaced by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
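The substitution above can be sketched in code (a Python sketch; the string-based representation is illustrative and assumes the non-terminal name is distinguishable as a prefix of each alternative):

```python
# Eliminating immediate left recursion A -> Aα1|...|Aαm | β1|...|βn.
def eliminate_left_recursion(A, productions):
    """productions: list of right-hand-side strings for A.
    Returns the new rules for A and A'."""
    alphas = [p[len(A):] for p in productions if p.startswith(A)]  # the αi
    betas  = [p for p in productions if not p.startswith(A)]       # the βj
    A1 = A + "'"
    return {
        A:  [b + A1 for b in betas],           # A  -> βj A'
        A1: [a + A1 for a in alphas] + ["ε"],  # A' -> αi A' | ε
    }

print(eliminate_left_recursion("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", 'ε']}
```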
E → E + T | T
T → T * F | F
F → (E) | id
For E → E + T | T, comparing with A → Aα | β:
∴ A = E, α = +T, β = T
For T → T * F | F, comparing with A → Aα | β:
∴ A = T, α = *F, β = F
∴ A → βA′ means T → FT′, and A′ → αA′ | ε means T′ → *FT′ | ε
After eliminating left recursion, the grammar becomes:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Example:
E→E+T|T
T→T*F|F
F→(E)|id
After eliminating left recursion:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E)|id
First( ) :
FIRST(E) = { ( ,id}
FIRST(E’) ={+ , ε}
FIRST(T) = { ( ,id}
FIRST(T’) = {*, ε}
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
Each parsing table entry is a single entry, i.e. no location has more than one entry. A grammar whose table has this
property is called an LL(1) grammar.
S→iEtS | iEtSeS| a
E→b
After left factoring, we have
S→iEtSS’|a
S’→ eS | ε
E→b
FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Since the entry M[S', e] contains more than one production (both S' → eS and S' → ε), the grammar is not an LL(1) grammar.
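The conflict can be seen by building the table mechanically. The following Python sketch (FIRST and FOLLOW sets hard-coded from the computation above; names illustrative) applies the two table-construction rules and reports the multiply defined entry:

```python
# LL(1) table construction for S -> iEtSS' | a ; S' -> eS | ε ; E -> b,
# showing that M[S', e] receives two productions.
G = {"S": ["iEtSS'", "a"], "S'": ["eS", "ε"], "E": ["b"]}
FIRST_RHS = {"iEtSS'": {"i"}, "a": {"a"}, "eS": {"e"}, "ε": {"ε"}, "b": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}

M = {}
for A, prods in G.items():
    for rhs in prods:
        # rule 1: for each terminal a in FIRST(rhs), add A -> rhs to M[A, a]
        for a in FIRST_RHS[rhs] - {"ε"}:
            M.setdefault((A, a), []).append(rhs)
        # rule 2: if ε in FIRST(rhs), add A -> rhs to M[A, b] for b in FOLLOW(A)
        if "ε" in FIRST_RHS[rhs]:
            for b in FOLLOW[A]:
                M.setdefault((A, b), []).append(rhs)

conflicts = {k: v for k, v in M.items() if len(v) > 1}
print(conflicts)  # {("S'", 'e'): ['eS', 'ε']} -> not LL(1)
```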
A shift-reduce parser attempts to construct the parse tree in the bottom-up manner,
i.e. the parse tree is constructed from the leaves (bottom) to the root (up). A more general form of the shift-reduce
parser is the LR parser.
This parser requires two data structures: a stack and an input buffer.
Basic Operations –
• Shift: This involves moving symbols from the input buffer onto the stack.
• Reduce: If the handle appears on top of the stack then, its reduction by using appropriate production rule is
done i.e. RHS of a production rule is popped out of a stack and LHS of a production rule is pushed onto the
stack.
• Accept: If only the start symbol is present in the stack and the input buffer is empty, then the parsing action is
called accept. When the accept action is reached, parsing has completed successfully.
• Error: This is the situation in which the parser can perform neither a shift action nor a reduce action, and not
even an accept action.
• HANDLES:
Always making progress by replacing a substring with the LHS of a matching production will not necessarily lead to the
goal/start symbol.
For example (with A → b among the productions):
abbcde
aAbcde (reducing b by A → b)
aAAcde (reducing b by A → b)
stuck
Informally, a handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the reverse of a
rightmost derivation.
If the grammar is unambiguous, every right-sentential form has exactly one handle.
More formally, a handle is a production A → β and a position in the current right-sentential form αβw
such that:
S ⇒* αAw ⇒ αβw
HANDLE PRUNING:
Keep removing handles, replacing them with the corresponding LHS of the production, until we reach S.
Example:
E → E+E | E*E | (E) | id
a+b*c   handle a    reduce by E → id
E+b*c   handle b    reduce by E → id
E+E*c   handle c    reduce by E → id
E+E*E   handle E*E  reduce by E → E*E
E+E     handle E+E  reduce by E → E+E
E
The grammar is ambiguous, so there are actually two handles at the next-to-last step. We can use parser
generators that compute the handles for us.
Example: shift-reduce parsing of the input (a,(a,a)) with the grammar S → (L) | a, L → L,S | S:
$          (a,(a,a))$   Shift
$(         a,(a,a))$    Shift
$(a        ,(a,a))$     Reduce S → a
$(S        ,(a,a))$     Reduce L → S
$(L        ,(a,a))$     Shift
$(L,       (a,a))$      Shift
$(L,(      a,a))$       Shift
$(L,(a     ,a))$        Reduce S → a
$(L,(S     ,a))$        Reduce L → S
$(L,(L     ,a))$        Shift
$(L,(L,    a))$         Shift
$(L,(L,a   ))$          Reduce S → a
$(L,(L,S   ))$          Reduce L → L,S
$(L,(L     ))$          Shift
$(L,(L)    )$           Reduce S → (L)
$(L,S      )$           Reduce L → L,S
$(L        )$           Shift
$(L)       $            Reduce S → (L)
$S         $            Accept
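The trace above can be reproduced with a naive shift-reduce loop (a Python sketch; choosing the handle as the longest right-hand side matching the top of the stack happens to work for this grammar, whereas real parsers use an LR table to pick the handle):

```python
# Naive shift-reduce parser for S -> (L) | a ; L -> L,S | S.
# Reductions are tried eagerly, longest right-hand side first.
PRODS = [("S", "(L)"), ("L", "L,S"), ("L", "S"), ("S", "a")]

def shift_reduce(s: str) -> bool:
    stack, rest = "", s
    while True:
        if stack == "S" and not rest:      # only the start symbol left: accept
            return True
        for lhs, rhs in PRODS:             # reduce if a handle is on top
            if stack.endswith(rhs):
                stack = stack[:-len(rhs)] + lhs
                break
        else:
            if not rest:
                return False               # error: no handle and no input left
            stack, rest = stack + rest[0], rest[1:]   # shift one symbol

print(shift_reduce("(a,(a,a))"))  # True
```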
Possible Conflicts:
Ambiguous grammars lead to parsing conflicts.
1. Shift-reduce: Both a shift action and a reduce action are possible in the same state (should we shift or
reduce?)
Example: the dangling-else problem
2. Reduce-reduce: Two or more distinct reduce actions are possible in the same state (which production
should we reduce with?)
Operator Grammar
A grammar is said to be an operator grammar if it follows these two properties:
1. No production has ε on its right-hand side.
2. No production has two adjacent non-terminals on its right-hand side.
Example: operator-precedence parsing of id+idxid with the grammar T → T + T | T x T | id:
$ ⋖ id+idxid$ Shift
$id ⋗ +idxid$ Reduce by T → id
$T ⋖ +idxid$ Shift
$T+ ⋖ idxid$ Shift
$T+id ⋗ xid$ Reduce by T → id
$T+T ⋖ xid$ Shift
$T+Tx ⋖ id$ Shift
$T+Txid ⋗ $ Reduce by T → id
$T+TxT ⋗ $ Reduce by T → T x T
$T+T ⋗ $ Reduce by T → T + T
$T $ Accept
Example 2
E → E+E | E*E | id
The precedence relations can be represented as a directed graph. Since there is no cycle in the graph, precedence
functions can be read off from it and tabulated. (The graph and the resulting function table are omitted here.)
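Operator-precedence parsing can be sketched directly from the relation table (a Python sketch; the dictionary entries encode the ⋖/⋗ relations for id, +, * and $, and 'N' stands for the single non-terminal pushed after each reduction):

```python
# Operator-precedence parser for E -> E+E | E*E | id.
PREC = {  # (topmost stack terminal, lookahead) -> '<' (shift) or '>' (reduce)
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    stack, i = ["$"], 0
    tokens = tokens + ["$"]
    while True:
        top = next(s for s in reversed(stack) if s != "N")  # topmost terminal
        a = tokens[i]
        if top == "$" and a == "$":
            return True                 # everything between the $'s reduced
        rel = PREC.get((top, a))
        if rel == "<":                  # top ⋖ a : shift
            stack.append(a); i += 1
        elif rel == ">":                # top ⋗ a : reduce the handle
            if stack[-1] == "id":
                stack.pop()             # E -> id
            else:
                del stack[-3:]          # E -> E op E (pops N, op, N)
            stack.append("N")
        else:
            return False                # no relation defined: syntax error

print(op_precedence_parse(["id", "+", "id", "*", "id"]))  # True
```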
SLR(1) refers to simple LR parsing. It is the same as LR(0) parsing; the only difference is in the parsing table. To
construct the SLR(1) parsing table, we use the canonical collection of LR(0) items.
In SLR(1) parsing, we place a reduce move only in the columns of the FOLLOW of the left-hand side.
SLR ( 1 ) Grammar
This construction requires FOLLOW of each non-terminal present in the grammar to be computed.
A grammar that has an SLR parsing table is known as an SLR(1) grammar. Generally, the 1 is omitted.
Example: consider the augmented expression grammar
E’ → E
E → E + T | T
T → T * F | F
F → (E) | id
The canonical collection of LR(0) items is:
I0:
E’ → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
I1:
E’ → E.
E → E.+ T
I2:
E → T.
T → T .* F
I3:
T → F.
I4:
F → (.E)
E→.E+T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
I5:
F → id.
I6:
E → E + .T
T → .T * F
T → .F
F → .( E )
F → .id
I7:
T → T * .F
F → .( E)
F → .id
I8:
F → ( E .)
E → E. + T
I9:
E → E + T.
T → T. * F
I10:
T → T * F.
I11:
F → ( E ).
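The closure and goto operations used to build these sets can be sketched as follows (a Python sketch; items are represented as (left side, right side, dot position) triples, a representation of my own choosing):

```python
# CLOSURE and GOTO over LR(0) items for the augmented expression grammar.
G = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NT = {lhs for lhs, _ in G}

def closure(items):
    """Add B -> .γ for every B right after a dot, until stable."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NT:
                for (l, r) in G:
                    if l == rhs[dot] and (l, r, 0) not in items:
                        items.add((l, r, 0)); changed = True
    return items

def goto(items, X):
    """Move the dot over X in every item where X follows the dot."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
I4 = goto(I0, "(")            # the set labelled I4 above
print(len(I0), len(I4))       # 7 7
```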
Syntax Error Handling:
If a compiler had to process only correct programs, its design & implementation would be greatly simplified.
But programmers frequently write incorrect programs, and a good compiler should assist the programmer
in identifying and locating errors. Programs contain errors at many different levels.
For example, errors can be lexical, syntactic, semantic, or logical.
Much of error detection and recovery in a compiler is centered around the syntax analysis phase. The
goals of error handler in a parser are:
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
• It should not significantly slow down the processing of correct programs.
YACC
A Yacc specification has three parts:
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Tokens are declared in the definitions part, e.g.:
%token ID
Yacc also recognizes single characters as tokens. Therefore, assigned token numbers should not overlap
ASCII codes.
The definition part can include C code external to the definition of the parser and variable declarations,
enclosed within %{ and %}, which must begin in the first column.
It can also include the specification of the starting symbol in the grammar:
%start nonterminal
Output Files:
• The output of YACC is a file named y.tab.c
If it contains the main() definition, it must be compiled to be executable.
Otherwise, the code can be an external function definition for the function int yyparse()
If called with the -d option in the command line, Yacc produces as output a header file y.tab.h with all its
specific definitions (particularly important are the token definitions, to be included, for example, in a Lex input
file).
If called with the –v option, Yacc produces as output a file y.output containing a textual description of the
LALR(1) parsing table used by the parser. This is useful for tracking down how the parser solves conflicts.
Example: Yacc File (.y)
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double /* double type for yacc stack */
%}
%%
Lines : Lines S '\n' { printf("OK \n"); }
      | S '\n'
      | error '\n' { yyerror("Error: reenter last line:");
                     yyerrok; } ;
S     : '(' S ')'
      | '[' S ']'
      | /* empty */ ;
%%
#include "lex.yy.c"
void yyerror(char * s)
/* yacc error handler */
{
fprintf (stderr, "%s\n", s);
}
int main(void)
{
return yyparse();
}
CANONICAL LR PARSING:
Example:
S → CC
C → cC | d
1. Number the grammar productions:
1. S → CC
2. C → cC
3. C → d
2. Augment the grammar with S’ → S and build the initial set of LR(1) items from the kernel item
[S’ → .S, $]
Matching [S’ → .S, $] against the general item [A → α.Bβ, a] gives
A = S’, α = ε, B = S, β = ε, a = $
The function closure tells us to add [B → .γ, b] for each production B → γ and terminal b in FIRST(βa).
Here B → γ must be S → CC, and since β is ε and a is $, b may only be $. Thus we add
[S → .CC, $]
We continue to compute the closure by adding all items [C → .γ, b] for b in FIRST(C$): matching
[S → .CC, $] against [A → α.Bβ, a] we have A = S, α = ε, B = C, β = C and a = $.
FIRST(C$) = FIRST(C) = {c, d}. We add the items:
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
None of the new items has a non-terminal immediately to the right of the dot, so we have completed
our first set of LR(1) items. The initial set I0 is:
I0:
[S’ → .S, $]
[S → .CC, $]
[C → .cC, c/d]
[C → .d, c/d]
Now we start computing goto(I0, X) for the various grammar symbols X:
Goto (I0, S) = I1:
S’ → S., $ → reduced item (accept).
Goto (I0, C) = I2:
S → C.C, $
C → .cC, $
C → .d, $
Goto (I0, c) = I3:
C → c.C, c/d
C → .cC, c/d
C → .d, c/d
Goto (I0, d) = I4:
C → d., c/d → reduced item.
Goto (I2, C) = I5:
S → CC., $ → reduced item.
Goto (I2, c) = I6:
C → c.C, $
C → .cC, $
C → .d, $
Goto (I2, d) = I7:
C → d., $ → reduced item.
Goto (I3, C) = I8:
C → cC., c/d → reduced item.
Goto (I3, c) = I3 and Goto (I3, d) = I4 (already constructed).
Goto (I6, C) = I9:
C → cC., $ → reduced item.
Goto (I6, c) = I6 and Goto (I6, d) = I7 (already constructed).
All states are now complete, so we construct the canonical LR(1) parsing table.
Here there is no need to find the FOLLOW( ) sets, as we have already attached a look-ahead to each
item while constructing the states.
State   Action                Goto
        c      d      $       S   C
0       S3     S4             1   2
1                     Accept
2       S6     S7                 5
3       S3     S4                 8
4       R3     R3
5                     R1
6       S6     S7                 9
7                     R3
8       R2     R2
9                     R2
1. Consider I0 items:
The items C → .cC, c/d and C → .d, c/d give action [0, c] = shift 3 and action [0, d] = shift 4;
goto [0, S] = 1 and goto [0, C] = 2.
2. Consider I1 items:
The item S’ → S., $ gives action [1, $] = accept.
3. Consider I2 items:
The item C → .cC, $ gives rise to goto [I2, c] = I6, so action [2, c] = shift 6. The item C → .d, $ gives rise
to goto [I2, d] = I7, so action [2, d] = shift 7. goto [2, C] = 5.
4. Consider I3 items:
The item C → c.C, c/d gives rise to goto [I3, C] = I8, so goto [3, C] = 8.
The item C → .cC, c/d gives rise to goto [I3, c] = I3, so action [3, c] = shift 3. The item C → .d, c/d
gives rise to goto [I3, d] = I4, so action [3, d] = shift 4.
5. Consider I4 items:
The item C → d., c/d is the reduced item; it is in I4, so set action [4, c/d] to reduce C → d (production
rule no. 3)
6. Consider I5 items:
The item S → CC., $ is the reduced item; it is in I5, so set action [5, $] to reduce S → CC (production rule no. 1)
7. Consider I6 items:
The item C → c.C, $ gives rise to goto [I6, C] = I9, so goto [6, C] = 9
The item C → .cC, $ gives rise to goto [I6, c] = I6, so action [6, c] = shift 6
The item C → .d, $ gives rise to goto [I6, d] = I7, so action [6, d] = shift 7
8. Consider I7 items:
The item C → d., $ is the reduced item; it is in I7, so set action [7, $] to reduce C → d (production rule no. 3).
9. Consider I8 items:
The item C → cC., c/d is the reduced item; it is in I8, so set action [8, c/d] to reduce C → cC
(production rule no. 2)
10. Consider I9 items:
The item C → cC., $ is the reduced item; it is in I9, so set action [9, $] to reduce C → cC
(production rule no. 2)
If the parsing action table has no multiply-defined entries, then the given grammar is called an
LR(1) grammar.
LALR PARSING:
Example:
1. Construct the canonical LR(1) sets of items, as above.
2. For each core present among the sets of LR(1) items, find all sets having that core, and
replace these sets by their union (i.e. merge them into a single state).
I0 → same as previous
I1 → same as previous
I2 → same as previous
I36 (union of I3 and I6):
C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$
I47 (union of I4 and I7):
C → d., c/d/$
I5 → same as previous
I89 (union of I8 and I9):
C → cC., c/d/$
State   Action                Goto
        c      d      $       S   C
0       S36    S47            1   2
1                     Accept
2       S36    S47                5
36      S36    S47                89
47      r3     r3     r3
5                     r1
89      r2     r2     r2