Module-2

• Role of the Syntax Analyser – Syntax error handling.
• Review of Context Free Grammars - Derivation and Parse Trees, Eliminating Ambiguity.
• Basic parsing approaches - Eliminating left recursion, left factoring.
• Top-Down Parsing - Recursive Descent parsing, Predictive Parsing, LL(1) Grammars.
SYNTAX ANALYSIS
• The second phase of the compiler is the syntax analyzer, or parser.

• The parser receives a stream of tokens from the lexical analyzer and verifies that the token string can be generated by the grammar for the source language by constructing a parse tree.

• The term parsing comes from the Latin word pars, meaning "part" (as in pars orationis, "part of speech").
INTERACTION BETWEEN LEXICAL ANALYZER AND PARSER
[Figure: the scanner (lexical analyzer) passes tokens to the parser (syntax analyzer).]
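The interaction can be sketched as a parser that pulls one token at a time from the scanner, only when it needs the next one. This is a minimal Python sketch under stated assumptions: the toy token set (identifiers reported as `id`, plus `+`, `*`, `(`, `)`), the regular expression, and the names `scan` and `Parser` are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterator

# Hypothetical token specification: identifiers become "id"; + * ( ) are tokens of their own.
TOKEN_SPEC = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<op>[+*()]))")


def scan(text: str) -> Iterator[str]:
    """Scanner (lexical analyzer): yields token names one at a time, then '$'."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_SPEC.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        yield "id" if m.group("id") else m.group("op")
        pos = m.end()
    yield "$"  # end-of-input marker


class Parser:
    """Parser (syntax analyzer) that requests tokens from the scanner on demand."""

    def __init__(self, text: str) -> None:
        self._tokens = scan(text)
        self.lookahead = next(self._tokens)

    def next_token(self) -> str:
        """Return the current token and ask the scanner for the following one."""
        current = self.lookahead
        self.lookahead = next(self._tokens, "$")
        return current


parser = Parser("x + y * z")
consumed = []
while parser.lookahead != "$":
    consumed.append(parser.next_token())
print(consumed)  # ['id', '+', 'id', '*', 'id']
```

Because `scan` is a generator, characters are only examined when the parser asks for another token, mirroring the "get next token" request in the figure.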
CONTEXT FREE GRAMMAR (CFG)
• A context free grammar is a grammar whose productions are of the form

  A → α

where A is a nonterminal and α is a string of terminals and nonterminals (α may also be empty, written ε).

• A formal grammar is "context free" if its production rules can be applied regardless of the context of a nonterminal.
• No matter which symbols surround it, the single nonterminal on the left-hand side can always be replaced by the right-hand side.
• A CFG consists of four components (N, T, P, S):
  • Terminals
    • the basic symbols from which strings are formed
    • in a compiler, the terminals are the tokens
  • Nonterminals
    • syntactic variables that denote sets of strings and help define the language generated by the grammar
  • Productions
    • rules of the form A → α that specify how a nonterminal can be replaced
  • Start symbol
    • a designated nonterminal from which every derivation begins
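The four components can be written down directly as a data structure. This is a sketch assuming a Python representation of our own choosing: each production body is a tuple of symbols, the empty tuple stands for ε, and the class name `Grammar` and method `check` are hypothetical. The example instance encodes the expression grammar used in the predictive-parsing EXAMPLE later in this module.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    nonterminals: frozenset[str]                   # N
    terminals: frozenset[str]                      # T
    productions: dict[str, list[tuple[str, ...]]]  # P: head -> bodies; () is ε
    start: str                                     # S

    def check(self) -> None:
        """Raise ValueError if the 4-tuple is not a well-formed CFG."""
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        for head, bodies in self.productions.items():
            # Context free: the left-hand side is a single nonterminal.
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not a nonterminal")
            for body in bodies:
                for sym in body:
                    if sym not in self.nonterminals and sym not in self.terminals:
                        raise ValueError(f"unknown symbol {sym!r} in {head} -> {body}")


EXPR = Grammar(
    nonterminals=frozenset({"E", "E'", "T", "T'", "F"}),
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    productions={
        "E": [("T", "E'")],
        "E'": [("+", "T", "E'"), ()],
        "T": [("F", "T'")],
        "T'": [("*", "F", "T'"), ()],
        "F": [("(", "E", ")"), ("id",)],
    },
    start="E",
)
EXPR.check()  # passes silently for a well-formed grammar
```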
Grammar for simple arithmetic expression
DERIVATION
• A derivation is a sequence of production-rule applications that transforms the start symbol into the input string.

• Beginning with the start symbol, each step replaces a nonterminal by the body of one of its productions.

• Types:

• Leftmost Derivation - In a leftmost derivation, the leftmost nonterminal is replaced at each step.

• Rightmost Derivation - In a rightmost derivation, the rightmost nonterminal is replaced at each step.
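Replaying a derivation step by step makes the difference between the two orders concrete. This Python sketch assumes the hypothetical ambiguous grammar E → E + E | E * E | id (chosen only for illustration) and a helper `derive` that is our own, not a library function: at each step it replaces the leftmost (or rightmost) nonterminal with the given body and rejects bodies that are not productions of that nonterminal.

```python
# Hypothetical example grammar: E -> E + E | E * E | id
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("id",)]}


def derive(grammar, start, bodies, leftmost=True):
    """Replay a derivation and return every sentential form, starting with `start`.

    `bodies` gives the production body used at each step. The nonterminal that
    is replaced is the leftmost one (or the rightmost one if leftmost=False).
    """
    form = [start]
    forms = [form]
    for body in bodies:
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:
            raise ValueError("sentential form has no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if tuple(body) not in grammar[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(body)} is not a production")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(form)
    return forms


lm = derive(GRAMMAR, "E", [("E", "+", "E"), ("id",), ("E", "*", "E"), ("id",), ("id",)])
rm = derive(GRAMMAR, "E", [("E", "+", "E"), ("E", "*", "E"), ("id",), ("id",), ("id",)],
            leftmost=False)
print(" => ".join(" ".join(f) for f in lm))
# E => E + E => id + E => id + E * E => id + id * E => id + id * id
print(" => ".join(" ".join(f) for f in rm))
# E => E + E => E + E * E => E + E * id => E + id * id => id + id * id
```

Both derivations end in the same sentence; only the order in which nonterminals are expanded differs.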
Consider the grammar
PARSE TREE
• A parse tree is a hierarchical structure that represents how the start symbol of the grammar derives the input string.

• Simply put, it is a graphical representation of a derivation.

• Also called a derivation tree.


• Yield of the parse tree

  • The leaves of a parse tree are labeled by nonterminals or terminals; read from left to right, they form a sentential form called the yield, or frontier, of the tree.
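In code, the yield is a left-to-right traversal of the leaves. A sketch with a hypothetical `Node` class (not from any library), assuming the example grammar E → E + E | E * E | id; leaves labeled ε are skipped because they contribute the empty string. A tree whose leaves still include nonterminals yields a sentential form rather than a sentence.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EPSILON = "ε"


@dataclass
class Node:
    label: str                                      # grammar symbol at this node
    children: list[Node] = field(default_factory=list)


def tree_yield(node: Node) -> list[str]:
    """Concatenate the leaf labels from left to right (the yield, or frontier)."""
    if not node.children:
        return [] if node.label == EPSILON else [node.label]
    result: list[str] = []
    for child in node.children:
        result.extend(tree_yield(child))
    return result


# Parse tree for id + id * id with "+" at the root (example grammar E -> E + E | E * E | id)
tree = Node("E", [
    Node("E", [Node("id")]),
    Node("+"),
    Node("E", [Node("E", [Node("id")]), Node("*"), Node("E", [Node("id")])]),
])
print(" ".join(tree_yield(tree)))      # id + id * id

partial = Node("E", [Node("E"), Node("+"), Node("E")])   # leaves not yet expanded
print(" ".join(tree_yield(partial)))   # E + E
```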
• Parsing is the process of determining whether a string of tokens can be generated by a grammar.

• 2 approaches
  • Top-Down Parsing - In top-down parsing, the parse tree is constructed from the top (root) to the bottom (leaves).

  • Bottom-Up Parsing - In bottom-up parsing, the parse tree is constructed from the bottom (leaves) to the top (root).
[Figure: Top Down Parsing vs. Bottom Up Parsing]
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string (that is, expanding the leftmost nonterminal at every step).

• TDP approaches:

  • Recursive Descent Parser

  • Predictive Parser
RECURSIVE DESCENT PARSING
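IMPLEMENTATION (for the grammar S → cAd, A → ab | a)

// advance() moves to the next input symbol
Procedure S()
{   if nextsymbol = 'c'
    {   advance(); A();
        if nextsymbol = 'd'
        {   advance(); return success; }
    }
    error;
}

Procedure A()
{   if nextsymbol = 'a'
    {   advance();
        if nextsymbol = 'b'
            advance();   // A → ab
        return;          // otherwise A → a
    }
    error;
}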
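The pseudocode above commits to A → ab as soon as it sees b. A general recursive-descent parser can instead backtrack: try A → ab first and, if the rest of the parse fails, fall back to A → a. A minimal Python sketch of that idea, assuming each procedure is written as a generator that yields every input position where its nonterminal could end; the function names `parse_S`, `parse_A` and `accepts` are hypothetical.

```python
from typing import Iterator


def parse_S(tokens: list[str], pos: int) -> Iterator[int]:
    """S -> c A d: yield each position just after a successful match of S."""
    if pos < len(tokens) and tokens[pos] == "c":
        for after_a in parse_A(tokens, pos + 1):     # try every way A can match
            if after_a < len(tokens) and tokens[after_a] == "d":
                yield after_a + 1


def parse_A(tokens: list[str], pos: int) -> Iterator[int]:
    """A -> a b | a: alternatives are tried in order; the second one is the backtrack."""
    if tokens[pos:pos + 2] == ["a", "b"]:
        yield pos + 2                                # A -> a b
    if pos < len(tokens) and tokens[pos] == "a":
        yield pos + 1                                # A -> a


def accepts(text: str) -> bool:
    tokens = list(text)
    # Accept only if some way of matching S consumes the whole input.
    return any(end == len(tokens) for end in parse_S(tokens, 0))


for w in ["cad", "cabd", "cd", "cabbd"]:
    print(w, accepts(w))
# cad True / cabd True / cd False / cabbd False
```

For the input cad, A → ab is tried first and fails (d is not b), so the generator resumes and offers A → a, which lets S finish.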
• Recursive descent parsing is the most general form of top-down parsing.

• It may involve backtracking, that is, making repeated scans of the input, to obtain the correct expansion of the leftmost nonterminal.

• Unless the grammar is ambiguous or left-recursive, it finds a suitable parse tree.
Drawbacks of RDP

• A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop: when we try to expand A, we may find ourselves trying to expand A again without having consumed any input.

• Backtracking recursive-descent parsers are not very common, because most programming language constructs can be parsed without backtracking.

• Not suitable for ambiguous grammars.
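The infinite-loop problem is easy to reproduce. A sketch, assuming a naive procedure for the hypothetical left-recursive rule E → E + id | id that tries its alternatives in order: the first thing it does is call itself at the same input position, so Python eventually raises RecursionError instead of parsing.

```python
from typing import Optional


def parse_E(tokens: list[str], pos: int) -> Optional[int]:
    """Naive procedure for the left-recursive rule E -> E + id | id."""
    # Alternative 1: E -> E + id. This call happens at the same position,
    # so E is expanded again before any input has been consumed.
    end = parse_E(tokens, pos)
    if end is not None and tokens[end:end + 2] == ["+", "id"]:
        return end + 2
    # Alternative 2: E -> id (never reached)
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    return None


try:
    parse_E(["id", "+", "id"], 0)
except RecursionError:
    print("RecursionError: E was expanded again without consuming input")
```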


PREDICTIVE PARSER
• A predictive parser can predict which alternative production should be used to expand the current nonterminal, based on the next input symbol.

• Predictive parsing is a special form of recursive-descent parsing in which the current input token unambiguously determines the production to be applied at each step.

• The goal of predictive parsing is to construct a top-down parser that never backtracks.
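A predictive parser can be written as recursive procedures, one per nonterminal, for the expression grammar used in the EXAMPLE later in this module (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id). This is a sketch assuming the input is already a list of token names; the class and method names are hypothetical. As a common simplification, E' and T' choose their ε-production whenever the lookahead is not + or * respectively, leaving error detection to a later match instead of checking FOLLOW explicitly.

```python
class PredictiveParser:
    """Recursive predictive parser for
        E -> T E'     E' -> + T E' | ε
        T -> F T'     T' -> * F T' | ε
        F -> ( E ) | id
    The lookahead token alone selects each production, so there is no backtracking.
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self) -> str:
        return self.tokens[self.pos]

    def match(self, expected: str) -> None:
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self) -> None:                  # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self) -> None:
        if self.lookahead == "+":         # E' -> + T E'
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise E' -> ε (consume nothing)

    def T(self) -> None:                  # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self) -> None:
        if self.lookahead == "*":         # T' -> * F T'
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise T' -> ε

    def F(self) -> None:
        if self.lookahead == "(":         # F -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        else:                             # F -> id
            self.match("id")


def accepts(tokens: list[str]) -> bool:
    parser = PredictiveParser(tokens)
    try:
        parser.E()
        parser.match("$")                 # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))   # True
print(accepts(["id", "+", "*", "id"]))         # False
```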
• It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.

Model of non-recursive predictive parser


• Input buffer:
  • contains the string to be parsed, followed by $ (used to indicate the end of the input string)

• Stack:
  • initialized with $ to mark the bottom of the stack, with the start symbol on top of it

• Parsing table:
  • a 2-D array M[A, a], where A is a nonterminal and a is a terminal or the symbol $

• The parser is controlled by a program that examines the symbol on top of the stack and the current input symbol.


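Parsing algorithm: let X be the symbol on top of the stack and a the current input symbol.
• If X = a = $, the parser halts and announces successful completion.
• If X is a terminal and X = a, pop X and advance to the next input symbol.
• If X is a nonterminal and M[X, a] = X → Y1Y2...Yk, pop X and push Yk, ..., Y2, Y1 (reverse the body and push it onto the stack, so that Y1 ends up on top).
• Otherwise, report a syntax error.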
EXAMPLE:
Input : id + id * id
Grammar :
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

Moves made by predictive parser for the input id+id*id
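The moves can be reproduced with a short table-driven sketch of the non-recursive predictive parser. Assumptions: the parsing table M below is the standard one for the EXAMPLE grammar, filled in by hand from its FIRST and FOLLOW sets; an empty tuple stands for an ε-body; and `ll1_parse` is a hypothetical name. Each time a nonterminal is expanded its production is recorded, and the body is pushed in reverse so that its leftmost symbol ends up on top of the stack.

```python
EPS = ()  # an ε-body pushes nothing

# Parsing table M[A, a] for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id
TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): EPS,  ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS,  ("T'", "*"): ("*", "F", "T'"),  ("T'", ")"): EPS,  ("T'", "$"): EPS,
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Non-recursive predictive parse; returns the productions used, in order."""
    stack = ["$", start]            # the end of the list is the top of the stack
    buffer = list(tokens) + ["$"]   # input followed by the end marker
    i = 0
    output = []
    while True:
        top, a = stack[-1], buffer[i]
        if top == "$" and a == "$":
            return output                          # accept
        if top not in NONTERMINALS:                # terminal or $ on top of the stack
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            i += 1                                 # match: advance the input
        else:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(body))           # reverse so body[0] is on top
            output.append(f"{top} -> {' '.join(body) or 'ε'}")


for move in ll1_parse(["id", "+", "id", "*", "id"]):
    print(move)
# 11 productions, from E -> T E' down to the final E' -> ε
```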

• Predictive parsing uses 2 functions:
  • FIRST()
  • FOLLOW()
• These functions allow us to fill the entries of the predictive parsing table.

RULES TO COMPUTE FIRST SET

1) If X is a terminal, then FIRST(X) is {X}.

2) If X → ε is a production, then add ε to FIRST(X).

3) If X is a nonterminal and X → Y1Y2Y3...Yn is a production, then put a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, ..., n, then add ε to FIRST(X).
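The three rules are applied repeatedly until no FIRST set changes (a fixed-point computation). A Python sketch, assuming a grammar is a dict from each nonterminal to a list of body tuples, with () for an ε-body and any symbol that is not a key treated as a terminal; the function name `first_sets` is hypothetical.

```python
EPS = "ε"


def first_sets(grammar):
    """FIRST(X) for every nonterminal X, applying rules 1-3 until nothing changes."""
    first = {nt: set() for nt in grammar}

    def first_of_symbol(x):
        return first[x] if x in grammar else {x}     # rule 1: FIRST(a) = {a}

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_nullable = True
                for y in body:                       # rule 3: scan Y1, Y2, ...
                    first[head] |= first_of_symbol(y) - {EPS}
                    if EPS not in first_of_symbol(y):
                        all_nullable = False         # Yi cannot vanish: stop here
                        break
                if all_nullable:                     # rule 2 (empty body) or every Yi nullable
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


EXPR = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
for nt, s in first_sets(EXPR).items():
    print(f"FIRST({nt}) = {sorted(s)}")
# FIRST(E) = ['(', 'id']
# FIRST(E') = ['+', 'ε']
# FIRST(T) = ['(', 'id']
# FIRST(T') = ['*', 'ε']
# FIRST(F) = ['(', 'id']
```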
FOLLOW

• FOLLOW is defined only for the nonterminals of the grammar G.

• It is the set of terminals of G that can appear immediately after the nonterminal in some sentential form derived from the start symbol.
• In other words, if A is a nonterminal, then FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form. If A can be the rightmost symbol of some sentential form, then $ is also in FOLLOW(A).

RULES TO COMPUTE FOLLOW SET

1. If S is the start symbol, then add $ to FOLLOW(S).

2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
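Like FIRST, the FOLLOW rules are applied until nothing changes. A self-contained Python sketch under the same assumptions: a grammar is a dict from nonterminal to a list of body tuples, () is an ε-body, and non-key symbols are terminals. The helper names `first_of_string`, `first_sets` and `follow_sets` are hypothetical.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first, grammar):
    """FIRST of a string Y1...Yn; contains ε only if every Yi can derive ε."""
    result = set()
    for y in symbols:
        fy = first[y] if y in grammar else {y}       # FIRST of a terminal is itself
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    return result | {EPS}


def first_sets(grammar):
    """FIRST rules 1-3, iterated to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first, grammar) - first[head]
                if new:
                    first[head] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:             # FOLLOW only for nonterminals
                        continue
                    beta = first_of_string(body[i + 1:], first, grammar)
                    new = beta - {EPS}               # rule 2
                    if EPS in beta:                  # rule 3: β is empty or nullable
                        new |= follow[head]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


EXPR = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
for nt, s in follow_sets(EXPR, "E").items():
    print(f"FOLLOW({nt}) = {sorted(s)}")
# FOLLOW(E) = ['$', ')']
# FOLLOW(E') = ['$', ')']
# FOLLOW(T) = ['$', ')', '+']
# FOLLOW(T') = ['$', ')', '+']
# FOLLOW(F) = ['$', ')', '*', '+']
```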

• Calculate FIRST and FOLLOW for the given grammar:
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
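Applying the rules by hand gives the sets below. The key steps: E, F, and hence D are nullable, so FIRST(D) = {g, f, ε}; FOLLOW(B) = FIRST(Dh) = {g, f, h} because D can vanish; C ends the body of B → cC, so FOLLOW(C) = FOLLOW(B); in D → EF, E is followed by FIRST(F) minus ε, plus FOLLOW(D) = {h} because F can vanish.

```latex
\begin{aligned}
\mathrm{FIRST}(S) &= \{a\}                 & \mathrm{FOLLOW}(S) &= \{\$\}\\
\mathrm{FIRST}(B) &= \{c\}                 & \mathrm{FOLLOW}(B) &= \{g, f, h\}\\
\mathrm{FIRST}(C) &= \{b, \varepsilon\}    & \mathrm{FOLLOW}(C) &= \{g, f, h\}\\
\mathrm{FIRST}(D) &= \{g, f, \varepsilon\} & \mathrm{FOLLOW}(D) &= \{h\}\\
\mathrm{FIRST}(E) &= \{g, \varepsilon\}    & \mathrm{FOLLOW}(E) &= \{f, h\}\\
\mathrm{FIRST}(F) &= \{f, \varepsilon\}    & \mathrm{FOLLOW}(F) &= \{h\}
\end{aligned}
```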
• A context-free grammar G whose parsing table has no multiple entries is said to be LL(1).

• LL(1) grammars are the class of grammars for which predictive parsers can be constructed.

• In the name LL(1),

  • the first L stands for scanning the input from left to right,

  • the second L stands for producing a leftmost derivation,

  • and the 1 stands for using one input symbol of lookahead at each step to make parsing action decisions.
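The LL(1) condition can be checked mechanically: fill M[A, a] from FIRST and FOLLOW and look for a cell holding more than one production. A self-contained sketch, assuming a grammar is a dict from nonterminal to a list of body tuples (() is ε, non-key symbols are terminals); `build_table` and `is_ll1` are hypothetical names, and the two small non-LL(1) grammars are illustrative examples, one needing left factoring and one left-recursive.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first, grammar):
    """FIRST of Y1...Yn; contains ε only if every Yi can derive ε."""
    result = set()
    for y in symbols:
        fy = first[y] if y in grammar else {y}
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    return result | {EPS}


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first, grammar) - first[head]
                if new:
                    first[head] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue
                    rest = first_of_string(body[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


def build_table(grammar, start):
    """M[A, a] -> set of bodies. More than one body in any cell means not LL(1)."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, start, first)
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body, first, grammar)
            targets = f - {EPS}              # A -> α goes in M[A, a] for a in FIRST(α)
            if EPS in f:                     # and in M[A, b] for b in FOLLOW(A) if α is nullable
                targets |= follow[head]
            for a in targets:
                table.setdefault((head, a), set()).add(body)
    return table


def is_ll1(grammar, start):
    return all(len(cell) == 1 for cell in build_table(grammar, start).values())


EXPR = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
NEEDS_FACTORING = {"S": [("a", "b"), ("a", "c")]}    # illustrative
LEFT_RECURSIVE = {"E": [("E", "+", "id"), ("id",)]}  # illustrative

print(is_ll1(EXPR, "E"))              # True
print(is_ll1(NEEDS_FACTORING, "S"))   # False: M[S, a] holds two bodies
print(is_ll1(LEFT_RECURSIVE, "E"))    # False: M[E, id] holds two bodies
```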
Not LL(1) Grammar
• The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform the grammar in two ways:
  • Eliminate left recursion
  • Perform left factoring

• These transformations eliminate the most common causes of backtracking.

• A production of the form A → Aα, whose body begins with its own head, is left-recursive. If we use such a production for top-down derivation, we fall into an infinite derivation chain, because A is expanded again without consuming any input. This is called left recursion.

Eliminating Left Recursion


• The left-recursive pair of productions A → Aα | β can be replaced by the two non-recursive productions A → βA' and A' → αA' | ε.
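The replacement can be applied mechanically to any immediately left-recursive nonterminal with several alternatives. A sketch under these assumptions: bodies are tuples of symbols, () stands for ε, the new nonterminal is named by appending a prime to the head (and must not clash with an existing symbol), no body consists of A alone, and only immediate left recursion (A → Aα) is handled, not indirect left recursion through other nonterminals. The function name is hypothetical.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn as
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    Bodies are tuples of symbols; () denotes ε. Returns the new productions.
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}                  # not left-recursive: unchanged
    new = head + "'"                                 # hypothetical naming scheme
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```

Applied to E → E + T | T, this produces exactly the E → TE', E' → +TE' | ε productions used in the predictive-parsing EXAMPLE.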
AMBIGUITY

An ambiguous sentence has two or more possible meanings. This can confuse the reader and make the meaning of the sentence unclear.
AMBIGUOUS GRAMMAR
• An ambiguous grammar is one that produces more than one leftmost derivation, or more than one rightmost derivation, for the same sentence.

• For most parsers, it is desirable that the grammar be made unambiguous; if it is not, we cannot uniquely determine which parse tree to select for a sentence.
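Ambiguity can be made concrete by counting parse trees. A sketch, assuming the hypothetical grammar E → E + E | E * E | id: the root of any tree for a token span is one of the operators inside it, so the number of trees for a span is the sum, over operator positions, of (trees for the left part) × (trees for the right part). More than one tree, equivalently more than one leftmost derivation, means the grammar is ambiguous for that sentence. The function name is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of distinct parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                     # parse trees in which E derives tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):    # tokens[k] is the operator at the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees(["id", "+", "id"]))              # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))   # 2: ambiguous
```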
