PCD - Unit II
Parsing Techniques: Need and Role of the Parser - Context-Free Grammars, Top-Down Parsing - General
Strategies, Recursive Descent Parser - Predictive Parser - LL(1) Parser, Bottom-Up Parsing - Shift-Reduce
Parser - LR Parser - LR(0) Item, Construction of SLR Parsing Table, Introduction to Canonical LR(1) and LALR
Parser - Operator Precedence Parsing - Error Handling and Recovery in Syntax Analyzer, YACC - Design of a
Syntax Analyzer for a Sample Language
The parser is the phase of the compiler that groups the tokens coming from the lexical analysis phase into grammatical structures.
In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are
grouped according to the syntactic rules of the language. This is done by a parser. The parser obtains a string of
tokens from the lexical analyzer and verifies that the string can be generated by the grammar of the source
language. It detects and reports any syntax errors and produces a parse tree from which intermediate code can be generated.
A context-free grammar (CFG) is a formal grammar which is used to generate all possible strings in a given formal language. It is defined by a four-tuple
G = (V, T, P, S)
where V is the set of non-terminals, T the set of terminals, P the set of productions, and S the start symbol.
In a CFG, the start symbol is used to derive the string. The string is derived by repeatedly replacing a non-terminal
by the right-hand side of one of its productions, until all non-terminals have been replaced by terminal symbols.
Example:
Production rules:
S → aSa
S → bSb
S→c
Now check that abbcbba string can be derived from the given CFG.
S ⇒ aSa
S ⇒ abSba
S ⇒ abbSbba
S ⇒ abbcbba
By applying the productions S → aSa and S → bSb recursively and finally applying the production S → c, the string
abbcbba is obtained.
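This derivation can be checked mechanically. The following is a small sketch (Python; the function name is illustrative) that tests membership in the language of the grammar S → aSa | bSb | c by undoing one production at a time:

```python
# Recognizer for the grammar S -> aSa | bSb | c, written as a
# recursive check that mirrors the derivation above.
def derives(s: str) -> bool:
    if s == "c":                      # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])       # S -> aSa or S -> bSb
    return False

print(derives("abbcbba"))  # True: S => aSa => abSba => abbSbba => abbcbba
print(derives("abcbba"))   # False: not derivable from S
```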
Parsing:
Parsing is classified into two categories: top-down parsing and bottom-up parsing. Top-down parsing is based
on the leftmost derivation, whereas bottom-up parsing traces a rightmost derivation in reverse.
Top-down parsing is a method of parsing the input string provided by the lexical analyzer. The top-down parser
parses the input string and then generates the parse tree for it.
Construction of the parse tree starts from the root node i.e. the start symbol of the grammar. Then using leftmost
derivation it derives a string that matches the input string.
In the top-down approach, construction of the parse tree starts from the root node and ends at the leaf
nodes. Here each leaf node represents a terminal that matches a terminal of the input string.
• To derive the input string, first, a production in grammar with a start symbol is applied.
• Now at each step, the parser has to identify which production rule of a non-terminal must be applied in order to
derive the input string.
• The next step is to match the terminals in the production with the terminals of the input string.
Consider the input string provided by the lexical analyzer is ‘abd’ for the following grammar.
S -> a A d
A -> b | b c
The top-down parser will parse the input string ‘abd’ and will start creating the parse tree with the starting
symbol ‘S‘.
Now the first input symbol ‘a‘ matches the first leaf node of the tree. So the parser will move ahead and find a
match for the second input symbol ‘b‘.
But the next leaf node of the tree is a non-terminal, i.e. A, which has two productions. Here, the parser has to choose
the A-production that can derive the remaining input 'bd'. So the parser identifies the A-production A -> b.
Now the next leaf node ‘b‘ matches the second input symbol ‘b‘. Further, the third input symbol ‘d‘ matches the
last leaf node ‘d‘ of the tree. Thereby successfully completing the top-down parsing
• Recursive-descent parsers: Recursive-descent parsers are a type of top-down parser that uses a set of recursive
procedures to parse the input. Each non-terminal symbol in the grammar corresponds to a procedure that parses
input for that symbol.
• Backtracking parsers: Backtracking parsers are a type of top-down parser that can handle non-deterministic
grammar. When a parsing decision leads to a dead end, the parser can backtrack and try another alternative.
Backtracking parsers are not as efficient as other top-down parsers because they can potentially explore many
parsing paths.
• Non-backtracking parsers: Non-backtracking is a technique used in top-down parsing to ensure that the parser
doesn’t revisit already-explored paths in the parse tree during the parsing process. This is achieved by using a
predictive parsing table that is constructed in advance and selecting the appropriate production rule based on the
top non-terminal symbol on the parser’s stack and the next input symbol. By not backtracking, predictive parsers
are more efficient than other types of top-down parsers, although they may not be able to handle all grammar.
• Predictive parsers: Predictive parsers are top-down parsers that use a parsing table to predict which production rule
to apply based on the next input symbol. Predictive parsers are also called LL parsers because they scan the input
left to right and construct a leftmost derivation of the input string.
• When using recursive descent parsing, the parser may need to backtrack when it encounters a symbol that does
not match the expected token. This can make the parsing process slower and less efficient.
A recursive descent parsing program has a set of procedures. There is one procedure for each of the non-terminal
present in the grammar. The parsing starts with the execution of the procedure meant for the starting symbol.
void A( ) {
    Choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
Top- down parsers start from the root node (start symbol) and match the input string against the production
rules to replace them (if matched). To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser, will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e. ‘r’. The
first production of S (S → rXd) matches it. So the top-down parser advances to the next input letter (i.e. ‘e’).
The parser tries to expand non-terminal ‘X’ and checks its production from the left (X → oa). It does not match with
the next input symbol. So the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
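The backtracking behaviour described above can be sketched as a small program (a Python sketch; the grammar is encoded as strings of single-character symbols, and the function name is illustrative). Each non-terminal's alternatives are tried in order, and a failed alternative causes the parser to back up and try the next one:

```python
# Backtracking top-down parser for:
#   S -> rXd | rZd     X -> oa | ea     Z -> ai
GRAMMAR = {
    "S": ["rXd", "rZd"],
    "X": ["oa", "ea"],
    "Z": ["ai"],
}

def parse(symbol: str, s: str, pos: int):
    """Try to derive a prefix of s[pos:] from symbol.
    Returns the new position, or None on failure."""
    if symbol not in GRAMMAR:              # terminal symbol
        if pos < len(s) and s[pos] == symbol:
            return pos + 1
        return None
    for rhs in GRAMMAR[symbol]:            # try alternatives in order,
        p = pos                            # restarting (backtracking) on failure
        for sym in rhs:
            p = parse(sym, s, p)
            if p is None:
                break
        else:
            return p
    return None

accepted = parse("S", "read", 0) == len("read")
print(accepted)  # True: X -> oa fails on 'e', so the parser backtracks to X -> ea
```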
Predictive Parsing
Predictive parsing is a simple form of recursive descent parsing. And it requires no backtracking. Instead, it can
determine which A-production must be chosen to derive the input string.
Predictive parsing chooses the correct A-production by looking ahead at the input string. It allows looking ahead
a fixed number of input symbols from the input string.
Input buffer and stack both contain the end marker ‘$’. It indicates the bottom of the stack and the end of the
input string in the input buffer.
1. The parser first considers the grammar symbol present on the top of the stack say ‘X’. And compares it with the
current input symbol say ‘a’ present in the input buffer.
o If X is a non-terminal, then the parser chooses a production of X from the parse table, consulting the entry M[X, a].
o In case X is a terminal, the parser checks it for a match with the current symbol ‘a’.
This is how predictive parsing identifies the correct production. So that it can successfully derive the input string.
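The stack-and-table algorithm above can be sketched as follows (a Python sketch; the table M is the LL(1) table of the expression grammar worked out later in this unit, and the helper names are illustrative):

```python
# Table-driven predictive (LL(1)) parsing loop.
M = {  # M[nonterminal, lookahead] -> production right-hand side
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                    # bottom marker and start symbol
    tokens = tokens + ["$"]               # end marker on the input
    i = 0
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a:                        # terminal (or $) matched
            i += 1
        elif X in NONTERMINALS:
            if (X, a) not in M:
                return False              # blank (error) entry in the table
            stack.extend(reversed(M[X, a]))   # push the production's RHS
        else:
            return False                  # terminal mismatch
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))  # True
```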
LL Parsing
The LL parser is a predictive parser that doesn’t need backtracking. LL (1) parser accepts only LL (1) grammar.
• The first L in LL(1) indicates that the parser scans the input string from left to right.
• The second L indicates that the parser produces a leftmost derivation for the input string.
• The ‘1’ in LL(1) indicates that the parser looks ahead only one input symbol of the input string.
LL (1) grammar does not include left recursion and there is no ambiguity in the LL (1) grammar.
Algorithm for construction of predictive parsing table:
Input : Grammar G
Method :
First()
FIRST() is a function that gives the set of terminals that can begin a string derived from a grammar symbol, i.e. the
first terminals that can appear when the right-hand side of a production is expanded.
Example
Let us consider grammar to show how to find the first and follow in compiler design.
E->TE’
E’->+TE’/ε
T->FT’
T’->*FT’/ε
F->(E)/id
Here,
Terminals are id, *, +, (, ) and ε denotes the empty string
Non-terminals are E, E’, T, T’, F
Now let’s try to find the first of ‘E’. On the right-hand side of the production E->TE’ the first symbol is T, which is a
non-terminal, but we have to find terminals, so we move to the production T->FT’, in which the first element is
again a non-terminal. So we move to the third production F->(E)/id, in which the first elements ‘(’ and id are
terminals, and these form the first of E.
So, First(E)={(, id}
Now let’s try to find the follow of ‘E’. To find this we look for productions in which ‘E’ is on the right-hand side, and
we get the production F->(E)/id, so the terminal that can follow ‘E’ is ‘)’; also, ‘$’ is always added to the follow of
the start symbol. So the follow(E)={$,)}
On repeating the above steps for the remaining non-terminals, we get the complete first and follow sets.
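The repeated steps can be carried out mechanically as a fixed-point computation. The following Python sketch (representation and helper names are illustrative) computes FIRST and FOLLOW for the grammar above:

```python
# Fixed-point computation of FIRST and FOLLOW for:
#   E -> TE'   E' -> +TE' | ε   T -> FT'   T' -> *FT' | ε   F -> (E) | id
eps = "ε"
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [eps]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [eps]],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(G)

def first_of(seq, FIRST):
    """FIRST of a string of grammar symbols, given the current FIRST sets."""
    out = set()
    for X in seq:
        f = FIRST[X] if X in NT else {X}
        out |= f - {eps}
        if eps not in f:
            return out
    return out | {eps}              # every symbol in seq can derive ε

FIRST = {A: set() for A in NT}
FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                # $ goes into FOLLOW of the start symbol
changed = True
while changed:                      # iterate until nothing changes
    changed = False
    for A, prods in G.items():
        for rhs in prods:
            body = [s for s in rhs if s != eps]
            new = first_of(body, FIRST)
            if not new <= FIRST[A]:
                FIRST[A] |= new; changed = True
            for i, B in enumerate(body):
                if B in NT:         # FOLLOW rules for B in A -> ...B tail
                    tail = first_of(body[i + 1:], FIRST)
                    add = tail - {eps}
                    if eps in tail:
                        add |= FOLLOW[A]
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add; changed = True

print(FIRST["E"], FOLLOW["F"])  # FIRST(E) = {(, id}; FOLLOW(F) = {+, *, $, )}
```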
Left Recursion
A → A α |β.
The above grammar is left recursive because the non-terminal on the left of the production also occurs at the first
position on the right side of the production. Left recursion can be eliminated by replacing the pair of productions with
A → βA′
A′ → αA′ | ε
Elimination of Left Recursion
In a left-recursive grammar, expansion of A generates Aα, Aαα, Aααα, … at each step, causing the parser to enter an
infinite loop. The general form
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
can be replaced by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
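The substitution above can be sketched in code (a Python sketch; the string-based representation is illustrative and assumes the non-terminal name is distinguishable as a prefix of each alternative):

```python
# Eliminating immediate left recursion A -> Aα1|...|Aαm | β1|...|βn.
def eliminate_left_recursion(A, productions):
    """productions: list of right-hand-side strings for A.
    Returns the new rules for A and A'."""
    alphas = [p[len(A):] for p in productions if p.startswith(A)]  # the αi
    betas  = [p for p in productions if not p.startswith(A)]       # the βj
    A1 = A + "'"
    return {
        A:  [b + A1 for b in betas],           # A  -> βj A'
        A1: [a + A1 for a in alphas] + ["ε"],  # A' -> αi A' | ε
    }

print(eliminate_left_recursion("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", 'ε']}
```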
E → E + T | T
T → T * F | F
F → (E) | id
For E → E + T | T, comparing with A → Aα | β:
∴ A = E, α = +T, β = T
For T → T * F | F, comparing with A → Aα | β:
∴ A = T, α = *F, β = F
∴ A → βA′ means T → FT′, and A′ → αA′ | ε means T′ → *FT′ | ε
After eliminating left recursion, the grammar becomes:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Example:
E→E+T|T
T→T*F|F
F→(E)|id
After eliminating left recursion:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E)|id
First( ) :
FIRST(E) = { ( ,id}
FIRST(E’) ={+ , ε}
FIRST(T) = { ( ,id}
FIRST(T’) = {*, ε}
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
Each parsing table entry is a single entry, i.e. no location has more than one entry. A grammar whose table has this
property is called an LL(1) grammar.
S→iEtS | iEtSeS| a
E→b
After left factoring, we have
S→iEtSS’|a
S’→ eS | ε
E→b
FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Since the entry M[S', e] contains more than one production (both S' → eS and S' → ε), the grammar is not an LL(1) grammar.
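The conflict can be seen by building the table mechanically. The following Python sketch (FIRST and FOLLOW sets hard-coded from the computation above; names illustrative) applies the two table-construction rules and reports the multiply defined entry:

```python
# LL(1) table construction for S -> iEtSS' | a ; S' -> eS | ε ; E -> b,
# showing that M[S', e] receives two productions.
G = {"S": ["iEtSS'", "a"], "S'": ["eS", "ε"], "E": ["b"]}
FIRST_RHS = {"iEtSS'": {"i"}, "a": {"a"}, "eS": {"e"}, "ε": {"ε"}, "b": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}

M = {}
for A, prods in G.items():
    for rhs in prods:
        # rule 1: for each terminal a in FIRST(rhs), add A -> rhs to M[A, a]
        for a in FIRST_RHS[rhs] - {"ε"}:
            M.setdefault((A, a), []).append(rhs)
        # rule 2: if ε in FIRST(rhs), add A -> rhs to M[A, b] for b in FOLLOW(A)
        if "ε" in FIRST_RHS[rhs]:
            for b in FOLLOW[A]:
                M.setdefault((A, b), []).append(rhs)

conflicts = {k: v for k, v in M.items() if len(v) > 1}
print(conflicts)  # {("S'", 'e'): ['eS', 'ε']} -> not LL(1)
```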
A shift-reduce parser attempts to construct the parse tree in the bottom-up manner,
i.e. the parse tree is constructed from the leaves (bottom) to the root (up). A more general form of the shift-reduce
parser is the LR parser.
This parser requires two data structures: a stack and an input buffer.
Basic Operations –
• Shift: This involves moving symbols from the input buffer onto the stack.
• Reduce: If the handle appears on top of the stack then, its reduction by using appropriate production rule is
done i.e. RHS of a production rule is popped out of a stack and LHS of a production rule is pushed onto the
stack.
• Accept: If only the start symbol is present in the stack and the input buffer is empty, then the parsing action is
called accept. When the accept action is reached, parsing has completed successfully.
• Error: This is the situation in which the parser can perform neither a shift action nor a reduce action, and not
even an accept action.
• HANDLES:
Always making progress by replacing a substring with the LHS of a matching production will not necessarily lead to the
goal/start symbol.
For example (with A → b among the productions):
abbcde
aAbcde (reducing b by A → b)
aAAcde (reducing b by A → b)
stuck
Informally, a handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the reverse of a
rightmost derivation.
If the grammar is unambiguous, every right-sentential form has exactly one handle.
More formally, a handle is a production A → β and a position in the current right-sentential form αβw
such that:
S ⇒* αAw ⇒ αβw
HANDLE PRUNING:
Keep removing handles, replacing them with the corresponding LHS of the production, until we reach S.
Example:
E → E+E | E*E | (E) | id
a+b*c   handle a    reduce by E → id
E+b*c   handle b    reduce by E → id
E+E*c   handle c    reduce by E → id
E+E*E   handle E*E  reduce by E → E*E
E+E     handle E+E  reduce by E → E+E
E
The grammar is ambiguous, so there are actually two handles at the next-to-last step. We can use parser
generators that compute the handles for us.
Example: shift-reduce parsing of the input (a,(a,a)) with the grammar S → (L) | a, L → L,S | S:
$          (a,(a,a))$   Shift
$(         a,(a,a))$    Shift
$(a        ,(a,a))$     Reduce S → a
$(S        ,(a,a))$     Reduce L → S
$(L        ,(a,a))$     Shift
$(L,       (a,a))$      Shift
$(L,(      a,a))$       Shift
$(L,(a     ,a))$        Reduce S → a
$(L,(S     ,a))$        Reduce L → S
$(L,(L     ,a))$        Shift
$(L,(L,    a))$         Shift
$(L,(L,a   ))$          Reduce S → a
$(L,(L,S   ))$          Reduce L → L,S
$(L,(L     ))$          Shift
$(L,(L)    )$           Reduce S → (L)
$(L,S      )$           Reduce L → L,S
$(L        )$           Shift
$(L)       $            Reduce S → (L)
$S         $            Accept
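The trace above can be reproduced with a naive shift-reduce loop (a Python sketch; choosing the handle as the longest right-hand side matching the top of the stack happens to work for this grammar, whereas real parsers use an LR table to pick the handle):

```python
# Naive shift-reduce parser for S -> (L) | a ; L -> L,S | S.
# Reductions are tried eagerly, longest right-hand side first.
PRODS = [("S", "(L)"), ("L", "L,S"), ("L", "S"), ("S", "a")]

def shift_reduce(s: str) -> bool:
    stack, rest = "", s
    while True:
        if stack == "S" and not rest:      # only the start symbol left: accept
            return True
        for lhs, rhs in PRODS:             # reduce if a handle is on top
            if stack.endswith(rhs):
                stack = stack[:-len(rhs)] + lhs
                break
        else:
            if not rest:
                return False               # error: no handle and no input left
            stack, rest = stack + rest[0], rest[1:]   # shift one symbol

print(shift_reduce("(a,(a,a))"))  # True
```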
Possible Conflicts:
Ambiguous grammars lead to parsing conflicts.
1. Shift-reduce: Both a shift action and a reduce action are possible in the same state (should we shift or
reduce?)
Example: the dangling-else problem
2. Reduce-reduce: Two or more distinct reduce actions are possible in the same state (which production
should we reduce with?)
Operator Grammar
A grammar is said to be an operator grammar if it follows these two properties:
1. No production has ε on its right-hand side.
2. No production has two adjacent non-terminals on its right-hand side.
Example: operator-precedence parsing of id+idxid with the grammar T → T + T | T x T | id:
$ ⋖ id+idxid$ Shift
$id ⋗ +idxid$ Reduce by T → id
$T ⋖ +idxid$ Shift
$T+ ⋖ idxid$ Shift
$T+id ⋗ xid$ Reduce by T → id
$T+T ⋖ xid$ Shift
$T+Tx ⋖ id$ Shift
$T+Txid ⋗ $ Reduce by T → id
$T+TxT ⋗ $ Reduce by T → T x T
$T+T ⋗ $ Reduce by T → T + T
$T $ Accept
Example 2
E → E+E | E*E | id
The precedence relations can be represented as a directed graph. Since there is no cycle in the graph, precedence
functions can be read off from it and tabulated. (The graph and the resulting function table are omitted here.)
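Operator-precedence parsing can be sketched directly from the relation table (a Python sketch; the dictionary entries encode the ⋖/⋗ relations for id, +, * and $, and 'N' stands for the single non-terminal pushed after each reduction):

```python
# Operator-precedence parser for E -> E+E | E*E | id.
PREC = {  # (topmost stack terminal, lookahead) -> '<' (shift) or '>' (reduce)
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    stack, i = ["$"], 0
    tokens = tokens + ["$"]
    while True:
        top = next(s for s in reversed(stack) if s != "N")  # topmost terminal
        a = tokens[i]
        if top == "$" and a == "$":
            return True                 # everything between the $'s reduced
        rel = PREC.get((top, a))
        if rel == "<":                  # top ⋖ a : shift
            stack.append(a); i += 1
        elif rel == ">":                # top ⋗ a : reduce the handle
            if stack[-1] == "id":
                stack.pop()             # E -> id
            else:
                del stack[-3:]          # E -> E op E (pops N, op, N)
            stack.append("N")
        else:
            return False                # no relation defined: syntax error

print(op_precedence_parse(["id", "+", "id", "*", "id"]))  # True
```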
SLR(1) refers to simple LR parsing. It is the same as LR(0) parsing; the only difference is in the parsing table. To
construct the SLR(1) parsing table, we use the canonical collection of LR(0) items.
In SLR(1) parsing, we place a reduce move only in the columns of the FOLLOW of the left-hand side.
SLR ( 1 ) Grammar
This construction requires FOLLOW of each non-terminal present in the grammar to be computed.
A grammar that has an SLR parsing table is known as an SLR(1) grammar. Generally, the 1 is omitted.
Example: consider the augmented expression grammar
E’ → E
E → E + T | T
T → T * F | F
F → (E) | id
The canonical collection of LR(0) items is:
I0:
E’ → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
I1:
E’ → E.
E → E.+ T
I2:
E → T.
T → T .* F
I3:
T → F.
I4:
F → (.E)
E→.E+T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
I5:
F → id.
I6:
E → E + .T
T → .T * F
T → .F
F → .( E )
F → .id
I7:
T → T * .F
F → .( E)
F → .id
I8:
F → ( E .)
E → E. + T
I9:
E → E + T.
T → T. * F
I10:
T → T * F.
I11:
F → ( E ).
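The closure and goto operations used to build these sets can be sketched as follows (a Python sketch; items are represented as (left side, right side, dot position) triples, a representation of my own choosing):

```python
# CLOSURE and GOTO over LR(0) items for the augmented expression grammar.
G = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NT = {lhs for lhs, _ in G}

def closure(items):
    """Add B -> .γ for every B right after a dot, until stable."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NT:
                for (l, r) in G:
                    if l == rhs[dot] and (l, r, 0) not in items:
                        items.add((l, r, 0)); changed = True
    return items

def goto(items, X):
    """Move the dot over X in every item where X follows the dot."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
I4 = goto(I0, "(")            # the set labelled I4 above
print(len(I0), len(I4))       # 7 7
```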
Syntax Error Handling:
If a compiler had to process only correct programs, its design & implementation would be greatly simplified.
But programmers frequently write incorrect programs, and a good compiler should assist the programmer
in identifying and locating errors. Programs contain errors at many different levels.
For example, errors can be lexical, syntactic, semantic, or logical.
Much of error detection and recovery in a compiler is centered around the syntax analysis phase. The
goals of error handler in a parser are:
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
• It should not significantly slow down the processing of correct programs.
YACC
A Yacc specification has three parts:
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Tokens are declared in the definitions part, e.g.:
%token ID
Yacc also recognizes single characters as tokens. Therefore, assigned token numbers should not overlap
ASCII codes.
The definition part can include C code external to the definition of the parser and variable declarations,
enclosed within %{ and %}, which must begin in the first column.
It can also include the specification of the starting symbol in the grammar:
%start nonterminal
Output Files:
• The output of YACC is a file named y.tab.c
If it contains the main() definition, it must be compiled to be executable.
Otherwise, the code can be an external function definition for the function int yyparse()
If called with the -d option in the command line, Yacc produces as output a header file y.tab.h with all its
specific definitions (particularly important are the token definitions, to be included, for example, in a Lex input
file).
If called with the –v option, Yacc produces as output a file y.output containing a textual description of the
LALR(1) parsing table used by the parser. This is useful for tracking down how the parser solves conflicts.
Example: Yacc File (.y)
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double /* double type for yacc stack */
%}
%%
Lines : Lines S '\n' { printf("OK \n"); }
      | S '\n'
      | error '\n' { yyerror("Error: reenter last line:");
                     yyerrok; } ;
S     : '(' S ')'
      | '[' S ']'
      | /* empty */ ;
%%
#include "lex.yy.c"
void yyerror(char * s)
/* yacc error handler */
{
fprintf (stderr, "%s\n", s);
}
int main(void)
{
return yyparse();
}
CANONICAL LR PARSING:
Example:
S → CC
C → cC | d
1. Number the grammar productions:
1. S → CC
2. C → cC
3. C → d
2. Augment the grammar with S’ → S and build the initial set of LR(1) items from the kernel item
[S’ → .S, $]
Matching [S’ → .S, $] against the general item [A → α.Bβ, a] gives
A = S’, α = ε, B = S, β = ε, a = $
The function closure tells us to add [B → .γ, b] for each production B → γ and terminal b in FIRST(βa).
Here B → γ must be S → CC, and since β is ε and a is $, b may only be $. Thus we add
[S → .CC, $]
We continue to compute the closure by adding all items [C → .γ, b] for b in FIRST(C$): matching
[S → .CC, $] against [A → α.Bβ, a] we have A = S, α = ε, B = C, β = C and a = $.
FIRST(C$) = FIRST(C) = {c, d}. We add the items:
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
None of the new items has a non-terminal immediately to the right of the dot, so we have completed
our first set of LR(1) items. The initial set I0 is:
I0:
[S’ → .S, $]
[S → .CC, $]
[C → .cC, c/d]
[C → .d, c/d]
Now we start computing goto(I0, X) for the various grammar symbols X:
Goto (I0, S) = I1:
S’ → S., $ → reduced item (accept).
Goto (I0, C) = I2:
S → C.C, $
C → .cC, $
C → .d, $
Goto (I0, c) = I3:
C → c.C, c/d
C → .cC, c/d
C → .d, c/d
Goto (I0, d) = I4:
C → d., c/d → reduced item.
Goto (I2, C) = I5:
S → CC., $ → reduced item.
Goto (I2, c) = I6:
C → c.C, $
C → .cC, $
C → .d, $
Goto (I2, d) = I7:
C → d., $ → reduced item.
Goto (I3, C) = I8:
C → cC., c/d → reduced item.
Goto (I3, c) = I3 and Goto (I3, d) = I4 (already constructed).
Goto (I6, C) = I9:
C → cC., $ → reduced item.
Goto (I6, c) = I6 and Goto (I6, d) = I7 (already constructed).
All states are now complete, so we construct the canonical LR(1) parsing table.
Here there is no need to find the FOLLOW( ) sets, as we have already attached a look-ahead to each
item while constructing the states.
State   Action                Goto
        c      d      $       S   C
0       S3     S4             1   2
1                     Accept
2       S6     S7                 5
3       S3     S4                 8
4       R3     R3
5                     R1
6       S6     S7                 9
7                     R3
8       R2     R2
9                     R2
1. Consider I0 items:
The items C → .cC, c/d and C → .d, c/d give action [0, c] = shift 3 and action [0, d] = shift 4;
goto [0, S] = 1 and goto [0, C] = 2.
2. Consider I1 items:
The item S’ → S., $ gives action [1, $] = accept.
3. Consider I2 items:
The item C → .cC, $ gives rise to goto [I2, c] = I6, so action [2, c] = shift 6. The item C → .d, $ gives rise
to goto [I2, d] = I7, so action [2, d] = shift 7. goto [2, C] = 5.
4. Consider I3 items:
The item C → c.C, c/d gives rise to goto [I3, C] = I8, so goto [3, C] = 8.
The item C → .cC, c/d gives rise to goto [I3, c] = I3, so action [3, c] = shift 3. The item C → .d, c/d
gives rise to goto [I3, d] = I4, so action [3, d] = shift 4.
5. Consider I4 items:
The item C → d., c/d is the reduced item; it is in I4, so set action [4, c/d] to reduce C → d (production
rule no. 3)
6. Consider I5 items:
The item S → CC., $ is the reduced item; it is in I5, so set action [5, $] to reduce S → CC (production rule no. 1)
7. Consider I6 items:
The item C → c.C, $ gives rise to goto [I6, C] = I9, so goto [6, C] = 9
The item C → .cC, $ gives rise to goto [I6, c] = I6, so action [6, c] = shift 6
The item C → .d, $ gives rise to goto [I6, d] = I7, so action [6, d] = shift 7
8. Consider I7 items:
The item C → d., $ is the reduced item; it is in I7, so set action [7, $] to reduce C → d (production rule no. 3).
9. Consider I8 items:
The item C → cC., c/d is the reduced item; it is in I8, so set action [8, c/d] to reduce C → cC
(production rule no. 2)
10. Consider I9 items:
The item C → cC., $ is the reduced item; it is in I9, so set action [9, $] to reduce C → cC
(production rule no. 2)
If the parsing action table has no multiply-defined entries, then the given grammar is called an
LR(1) grammar.
LALR PARSING:
Example:
1. Construct the canonical LR(1) sets of items, as above.
2. For each core present among the sets of LR(1) items, find all sets having that core, and
replace these sets by their union (i.e. merge them into a single state).
I0 → same as previous
I1 → same as previous
I2 → same as previous
I36 (union of I3 and I6):
C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$
I47 (union of I4 and I7):
C → d., c/d/$
I5 → same as previous
I89 (union of I8 and I9):
C → cC., c/d/$
State   Action                Goto
        c      d      $       S   C
0       S36    S47            1   2
1                     Accept
2       S36    S47                5
36      S36    S47                89
47      r3     r3     r3
5                     r1
89      r2     r2     r2