
18CSC304J

COMPILER DESIGN

UNIT 2
SESSIONS 1 & 2
Topics that will be covered in this
Session

• Syntax Analysis Definition


• Role of Parser
• Context Free Grammar
• Lexical versus Syntactic Analysis
• Syntax Error Handling
SYNTAX ANALYSIS
DEFINITION
Syntax Analysis

• Syntax Analysis is the second phase of the compiler design process

• It analyzes the syntactical structure

• It checks if the given input is in the correct syntax of the programming language or not

• Every programming language has rules that prescribe the syntactic structure of well-formed
programs

• In C, for example, a program is made up of functions, a function out of declarations and
statements, a statement out of expressions, and so on.

• The syntax of programming language constructs can be specified by context-free grammars or BNF
(Backus-Naur Form) notation

• Grammars offer significant benefits for both language designers and compiler writers
Benefits of Grammar
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming
language

• From certain classes of grammars, we can construct automatically an efficient parser that
determines the syntactic structure of a source program

• The parser construction process can reveal syntactic ambiguities and trouble spots that might have
slipped through the initial design phase of a language

• The structure imparted to a language by a properly designed grammar is useful for translating
source programs into correct object code and for detecting errors

• A grammar allows a language to be evolved or developed iteratively, by adding new constructs to


perform new tasks

• These new constructs can be integrated more easily into an implementation that follows the
grammatical structure of the language
ROLE OF THE PARSER
The Role of Parser
• Input : Stream of tokens
• Output : Some representation of the parse tree

• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language
• The parser should also report syntax errors in an intelligible fashion
• It should also recover from commonly occurring errors

Parsing Methods:
• Universal Parsing (e.g., the Cocke-Younger-Kasami algorithm and Earley’s algorithm): can parse any grammar, but too inefficient to use in compilers
• Top-down Parsing and Bottom-up Parsing: commonly used in compilers
The Role of the Parser – cont..
Top-down Parsing

• Top-down parsers build parse trees from the top (root) to the bottom (leaves)

Bottom-up Parsing

• Bottom-up parsers start from the leaves and work up to the root

Note

• The input to the parser is always scanned from left to right, one symbol at a time

• The most efficient top-down and bottom-up methods work only for sub-classes of grammars

• LL and LR grammars are expressive enough to describe most of the syntactic constructs in modern
programming languages

• Parsers implemented by hand often use LL grammars (eg. Predictive Parsing approach)

• Parsers for the larger class of LR grammars are usually constructed using automated tools
The Role of the Parser – cont..
Tasks that may be conducted during parsing

• Collecting information about various tokens into the symbol table

• Performing type checking and other kinds of semantic analysis

• Generating intermediate code

These activities are lumped into the “Rest of the front end” box in the picture
CONTEXT FREE
GRAMMAR
Grammar & its Types
• Grammar denotes syntactical rules in languages

• Noam Chomsky gave a mathematical model for grammar. According to him there are 4 types of
grammars

Type     Grammar                                     Language Accepted                  Automaton
Type 0   Unrestricted Grammar                        Recursively Enumerable Language    Turing Machine
Type 1   Context Sensitive Grammar                   Context Sensitive Language         Linear Bounded Automata
Type 2   Context Free Grammar                        Context Free Language              Pushdown Automata
Type 3   Regular Grammar (or) Regular Expression     Regular Language                   Finite Automata
Context Free Grammar
• A Context-Free Grammar is used to systematically describe the syntax of programming language
constructs like expressions and statements

• A context-free grammar (grammar for short) consists of terminals, non-terminals, a start symbol, and
productions

1. Terminals are the basic symbols from which strings are formed

2. Non-terminals are syntactic variables that denote sets of strings

3. In a grammar, one non-terminal is distinguished as the start symbol, and the set of strings it
denotes is the language generated by the grammar. Conventionally, the productions for the
start symbol are listed first

4. The productions of a grammar specify the manner in which the terminals and non-terminals
can be combined to form strings.
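
These four components map naturally onto plain data structures. The sketch below (Python; the names EXPR_GRAMMAR and is_terminal, and the dict layout, are illustrative assumptions, not from the slides) encodes the expression grammar used later in this unit; later sketches reuse this encoding.

```python
# A context-free grammar as plain data: terminals, non-terminals,
# a start symbol, and productions mapping each head to its bodies.

EXPR_GRAMMAR = {
    "terminals":    {"+", "*", "(", ")", "id"},
    "nonterminals": {"E", "T", "F"},
    "start":        "E",
    "productions": {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}

def is_terminal(grammar, symbol):
    """A symbol is a terminal if it is not the head of any production."""
    return symbol not in grammar["productions"]
```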
Context Free Grammar : Example

Three equivalent ways of writing the same expression grammar:

expression → expression + term           E → E + T           E → E + T | E – T | T
expression → expression – term           E → E – T           T → T * F | T / F | F
expression → term                   OR   E → T          OR   F → ( E ) | id
term → term * factor                     T → T * F
term → term / factor                     T → T / F
term → factor                            T → F
factor → ( expression )                  F → ( E )
factor → id                              F → id

Non-terminals:                           Non-terminals:      Non-terminals:
expression, term, factor                 E, T, F             E, T, F
Start Symbol : expression                Start Symbol : E    Start Symbol : E

Notational Conventions

1. These symbols are terminals:
• Lowercase letters early in the alphabet, such as a, b, c
• Operator symbols such as +, *, and so on
• Punctuation symbols such as parentheses, comma, and so on
• The digits 0, 1, 2, … , 9
• Boldface strings such as id or if, each of which represents a single terminal symbol

2. These symbols are non-terminals:
• Uppercase letters early in the alphabet, such as A, B, C
• The letter S, which, when it appears, is usually the start symbol
• Lowercase, italic names such as expr or stmt
Notational Conventions – cont..

3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals

4. Lowercase letters late in the alphabet, such as u, v, … z, represent strings of terminals

5. Lowercase Greek letters α, β, γ represent strings of grammar symbols. A generic production can be written as A → α

6. A set of productions A → α1, A → α2, A → α3 with a common head A (call them A-productions) may be written as
A → α1 | α2 | α3
α1, α2, α3 are the alternatives of A

7. Unless stated otherwise, the head of the first production is the start symbol
Derivations from a Grammar

• A derivation of a string from a grammar applies a sequence of productions that transforms the
start symbol into the string

• A derivation proves that a string belongs to the language defined by a grammar

• A parse tree can be constructed with the help of a derivation

• A parse tree is a graphical representation of a derivation

• If at each step in a derivation, a production is applied to the leftmost non-terminal, then the
derivation is called the leftmost derivation

• A derivation in which the rightmost non-terminal is replaced at each step is called the rightmost
derivation
Derivations – Example 1
Consider the grammar
S → ABC
A → aA | a
B → bB | b
C → cC | c

Derive the string aabbbcccc

A leftmost derivation:
S ⇒ ABC ⇒ aABC ⇒ aaBC ⇒ aabBC ⇒ aabbBC ⇒ aabbbC ⇒ aabbbcC ⇒ aabbbccC ⇒ aabbbcccC ⇒ aabbbcccc

Derivations – Example 2
Derivations – cont..
Ambiguous Grammar

• Every parse tree has associated with it a unique leftmost and rightmost derivation

• A grammar that produces more than one parse tree for some sentence is said to be ambiguous

• Put another way, an ambiguous grammar is one that produces more than one leftmost derivation
or more than one rightmost derivation for the same sentence
Consider the grammar E → E + E | E * E | ( E ) | id
Check whether the grammar is ambiguous or not
Let us consider the string id + id * id

Leftmost derivation 1 : E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Leftmost derivation 2 : E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

As we are able to draw two parse trees (equivalently, two leftmost derivations) for the given string, the grammar is ambiguous
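
One way to confirm the ambiguity mechanically is to enumerate bounded leftmost derivations of the target string and count how many distinct ones reach it. A rough sketch (Python; the dict encoding and the function name leftmost_derivations are illustrative, and the length-based pruning relies on this grammar having no ε-productions):

```python
# Count leftmost derivations of a token string under a CFG.
# Two or more derivations for one string demonstrates ambiguity.

GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}

def leftmost_derivations(target, form=("E",)):
    # Prune: a sentential form longer than the target can never match,
    # because no production here derives the empty string.
    if len(form) > len(target):
        return 0
    # Find the leftmost non-terminal; terminals before it must match.
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if list(form[:i]) != target[:i]:
                return 0
            return sum(
                leftmost_derivations(target, form[:i] + tuple(body) + form[i + 1:])
                for body in GRAMMAR[sym]
            )
    return 1 if list(form) == target else 0

print(leftmost_derivations(["id", "+", "id", "*", "id"]))  # prints 2
```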
LEXICAL vs SYNTAX
ANALYSIS
Context Free Grammars Vs Regular
Expressions

• Grammars are a more powerful notation than regular expressions

• Every construct that can be described by a regular expression can be described by a grammar, but
not vice-versa

• Every regular language is a context-free language, but not vice versa


Lexical Vs Syntax Analysis
Everything that can be described by a regular expression can also be described by a grammar. We
may therefore ask “Why use regular expressions to define the lexical syntax of a language?”
There are several reasons
• Separating the syntactic structure of a language into lexical and non-lexical parts provides a
convenient way of modularizing the front end of a compiler into two manageable-sized
components
• The lexical rules of a language are frequently quite simple, and to describe them we do not need a
notation as powerful as grammars
• Regular expressions generally provide a more concise and easier-to-understand notation for
tokens than grammars
• More efficient lexical analyzers can be constructed automatically from regular expressions than
from arbitrary grammars
• Regular expressions are most useful for describing the structure of constructs such as identifiers,
constants, keywords, and white space. Grammars, on the other hand, are most useful for
describing nested structures such as balanced parentheses, matching begin-end’s, corresponding
if-then-else’s, and so on. These nested structures cannot be described by regular expressions
Lexical Analysis
Vs
Syntax Analysis
SYNTAX ERROR
HANDLING
Syntax Error Handling

• If a compiler had to process only correct programs, its design and implementation would be
simplified greatly

• However, a compiler is expected to assist the programmer in locating and tracking down errors that
inevitably creep into programs, despite the programmer’s best efforts

• Most programming language specifications do not describe how a compiler should respond to
errors; error handling is left to the compiler designer

• Planning the error handling right from the start can both simplify the structure of a compiler and
improve its handling of errors
Common Programming Errors

Common programming errors can occur at many different levels

• Lexical errors include misspellings of identifiers, keywords, or operators, and missing quotes around
text intended as a string

• Syntactic errors include misplaced semicolons or extra or missing braces, that is, { or }

Another example in C is the appearance of a case statement without an enclosing switch

• Semantic errors include type mismatches between operators and operands, e.g., the return of a
value from a function in C with return type void

• Logical errors can be anything from incorrect reasoning on the part of the programmer to the use
in a C program of the assignment operator = instead of the comparison operator ==. The program
containing = may be well formed; however, it may not reflect the programmer’s intent
Error Recovery during Parsing / Syntax
Analysis

• The precision of parsing methods allows syntactic errors to be detected very efficiently

• Several parsing methods, such as the LL and LR methods, detect an error as soon as possible; that
is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the
grammar for the language.

• They have the viable-prefix property, meaning that they detect that an error has occurred as soon
as they see a prefix of the input that cannot be completed to form a string in the language

• Error recovery is emphasized during parsing because many errors appear syntactic and are exposed
when parsing cannot continue. A few semantic errors such as type mismatches, can also be
detected efficiently; however, accurate detection of semantic and logical errors at compile time is in
general a difficult task
Goals of an Error Handler

The goals of an error handler in a parser are simple to state, but challenging to realize:

• Report the presence of errors clearly and accurately

• Recover from each error quickly enough to detect subsequent errors

• Add minimal overhead to the processing of correct programs

How should an error handler report the presence of an error?

• It must report the place in the source program where an error is detected, because there is a good
chance that the actual error occurred within the previous few tokens

• A common strategy is to print the offending line with a pointer to the position at which an error is
detected
Error Recovery Strategies

Once an error is detected, how should the parser recover?

• There is no universally acceptable strategy, but a few methods have broad applicability

• The simplest approach is for the parser to quit with an informative error message when it detects
the first error

• Additional errors are often uncovered if the parser can restore itself to a state where processing of
the input can continue with reasonable hopes that further processing will provide meaningful
diagnostic information

• If errors pile up, it is better for the compiler to give up after exceeding some error limit than to
produce an annoying avalanche of “spurious” errors
Error Recovery Strategies –
cont..
Recovery strategies

• Panic-Mode Recovery

• Phrase-Level Recovery

• Error Productions

• Global Correction
Error Recovery Strategies –
cont..
Panic-Mode Recovery

• On discovering an error, the parser discards input symbols one at a time until one of a designated
set of synchronizing tokens is found

• The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source
program is clear and unambiguous

• The compiler designer must select the synchronizing tokens appropriate for the source language

• Panic-mode recovery often skips a considerable amount of input without checking it for additional
errors.

Advantages:

• Simplicity

• It is guaranteed not to go into an infinite loop


Error Recovery Strategies –
cont..
Phrase-Level Recovery
• On discovering an error, the parser may perform local correction on the remaining input
• It may replace a prefix of the remaining input by some string that allows the parser to continue
• Examples for local correction
• Replace a comma by a semicolon
• Delete an extraneous semicolon
• Insert a missing semicolon
• The choice of the local correction is left to the compiler designer
• We must be careful to choose replacements that do not lead to infinite loops
• It is used in several error-repairing compilers, as it can correct any input string
Drawback
• Difficulty in coping with situations in which the actual error has occurred before the point of
detection
Error Recovery Strategies –
cont..
Error Productions

• By anticipating common errors that might be encountered, we can augment the grammar for the
language at hand with productions that generate the erroneous constructs

• A parser constructed from a grammar augmented by these error productions detects the
anticipated errors when an error production is used during parsing

• The parser can generate appropriate error diagnostics about the erroneous construct that has been
recognized in the input
Error Recovery Strategies –
cont..
Global Correction

• Ideally, we would like a compiler to make as few changes as possible in processing an incorrect
input string

• There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost
correction

• Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a
related string y, such that the number of insertions, deletions, and changes of tokens required to
transform x into y is as small as possible

• Unfortunately, these methods are in general too costly to implement in terms of time and space

• So these techniques are currently only of theoretical interest

• Note: A closest correct program may not be what the programmer had in mind
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 3
Topics that will be covered in this
Session

• Elimination of Ambiguity
• Elimination of Left Recursion
• Left Factoring
ELIMINATION OF
AMBIGUITY
Eliminating Ambiguity

• There exists no general algorithm to remove ambiguity from grammar

• To check a grammar for ambiguity, we try finding a string that has more than one parse tree. If
any such string exists, then the grammar is ambiguous

• Causes such as left recursion, common prefixes, etc. can make a grammar ambiguous

• The removal of these causes may convert the grammar into an unambiguous one

• However, this is not guaranteed in every case

• Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity


Eliminating Ambiguity –
Example 1
Eliminate ambiguity from the “dangling-else” grammar, which in its standard form is:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

• Here “other” stands for any other statement

Eliminating Ambiguity – Example 1
– cont…
• Consider an expression of the form if E1 then if E2 then S1 else S2

• Two distinct parse trees (Parse Tree 1 and Parse Tree 2 in the slides) can be drawn for it: the else can attach to either the inner or the outer if

As there are two parse trees for the given expression, the grammar is ambiguous


Eliminating Ambiguity – Example 1 –
cont..
• We can rewrite the dangling-else grammar as the following unambiguous grammar:

stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

• The idea is that a statement appearing between a then and an else must be matched

• That is, an interior statement must not end with an unmatched or open then

• A matched statement is either an if-then-else statement containing no open statements or it is
any other kind of unconditional statement
Eliminating Ambiguity – Example 1 –
cont..

• Now the expression has only one parse tree


Eliminating Ambiguity –
Example 2
Eliminate ambiguity from the “expression” grammar

E → E + E | E * E | ( E ) | id

The reason for ambiguity in this grammar is that precedence and associativity rules are not imposed by the grammar

We can rewrite the grammar by imposing precedence and associativity as follows

E→E+T|T

T→T*F|F

F → ( E ) | id
ELIMINATION OF LEFT
RECURSION
Eliminating Left Recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α
• A production in which the leftmost symbol on the right side is the same as the nonterminal on the
left side of the production is called a left-recursive production

Eg : E → E + T

• Top down parsing methods cannot handle left-recursive grammars

• So, a transformation is needed to eliminate left recursion

• Left recursion can be eliminated by rewriting the grammar


Rule for Eliminating Immediate Left
Recursion
• Suppose we have the following productions, where no βi begins with an A:

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

• To eliminate left recursion, we can rewrite the grammar as follows (see the sketch after this rule):

A → β1 A′ | β2 A′ | … | βn A′
A′ → α1 A′ | α2 A′ | … | αm A′ | ε

• This procedure eliminates all immediate left recursion from the A and A′ productions, but it does
not eliminate left recursion involving derivations of 2 or more steps
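
A minimal sketch of this rule (Python, reusing the dict-of-productions encoding from the CFG section; eliminate_immediate_left_recursion is an illustrative name, and an empty body [] stands for ε):

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Split A-productions into A -> beta A' and A' -> alpha A' | epsilon.

    `bodies` is a list of symbol lists; [] stands for an epsilon body.
    Returns a dict of new productions. A' is written as head + "'".
    """
    recursive = [b[1:] for b in bodies if b and b[0] == head]    # the alphas
    nonrecursive = [b for b in bodies if not b or b[0] != head]  # the betas
    if not recursive:
        return {head: bodies}  # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in nonrecursive],
        new:  [alpha + [new] for alpha in recursive] + [[]],     # [] is epsilon
    }

# Example: E -> E + T | T  becomes  E -> T E' ,  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```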
Eliminating Immediate Left Recursion –
Example
• Eliminate left recursion from the given grammar

E→E+T|T

T→T*F|F

F → ( E ) | id

After eliminating the immediate left recursion, we get

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id
Eliminating Left Recursion Involving
Derivations
Eliminating Left Recursion Involving Derivations –
Example 1
Eliminate left recursion from the given grammar:
S → Aa | b
A → Ac | Sd | ε

Step 1 : Order the non-terminals
1 – S
2 – A

i = 1 :
Check if there is immediate left recursion in S. If so, eliminate it.
There is no immediate left recursion in S.

i = 2 :
Substitute the S-productions in A:
A → Ac | Aad | bd | ε
Eliminate the immediate left recursion in A:
A → bdA′ | A′
A′ → cA′ | adA′ | ε

The grammar after eliminating left recursion is
S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε
Eliminating Left Recursion Involving Derivations –
Example 2
Eliminate left recursion from the given grammar:
S → ( L ) | a
L → L , S | S

Step 1 : Order the non-terminals
1 – S
2 – L

i = 1 :
Check if there is immediate left recursion in S. If so, eliminate it.
There is no immediate left recursion in S.

i = 2 :
Substitute the S-productions in L (only where the body starts with S):
L → L , S | ( L ) | a
Eliminate the immediate left recursion in L:
L → ( L ) L′ | a L′
L′ → , S L′ | ε

The grammar after eliminating left recursion is
S → ( L ) | a
L → ( L ) L′ | a L′
L′ → , S L′ | ε
LEFT FACTORING
Left Factoring
• Left Factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing

• Left Factoring will be done when more than one production of a non-terminal has the same prefix
(common prefix)

• The basic idea is that when it is not clear which of two alternative productions to use to expand
a non-terminal A, we may be able to rewrite the A-productions to defer the decision until we have
seen enough of the input to make the right choice

• For example, in the dangling-else grammar, on seeing the input if, we cannot immediately tell which
production to choose to expand stmt
Left Factoring - Algorithm
For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all of the A-productions A → αβ1 | αβ2 | … | αβn | γ (where γ represents all alternatives that do not begin with α) by:
A → αA′ | γ
A′ → β1 | β2 | … | βn
Repeat until no two alternatives of any non-terminal have a common prefix.
Left Factoring – Examples

Example 1
Left factor the grammar given below:
S → i E t S e S | i E t S | a
E → b
The grammar after left factoring is
S → i E t S S′ | a
S′ → e S | ε
E → b

Example 2
Left factor the grammar given below:
A → a A B | a B c | a A c
The grammar after left factoring is
A → a A′
A′ → A B | B c | A c
Left Factoring – Examples – cont..

Example 3
Left factor the grammar given below:
S → c A d
A → a b | a
The grammar after left factoring is
S → c A d
A → a A′
A′ → b | ε

Example 4
Left factor the grammar given below:
S → a S a | a S b | a | b
The grammar after left factoring is
S → a S′ | b
S′ → S a | S b | ε
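
A rough sketch of one left-factoring pass under the same encoding (Python; it factors the longest prefix shared by two or more alternatives, and would be repeated until no common prefixes remain):

```python
import os

def left_factor_once(head, bodies):
    """Factor the longest prefix common to two or more alternatives of `head`.

    Returns the rewritten productions, or None if no factoring applies.
    Bodies are lists of symbols; [] stands for an epsilon body.
    """
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            # os.path.commonprefix works element-wise on any sequences
            prefix = os.path.commonprefix([bodies[i], bodies[j]])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return None
    new = head + "'"
    factored = [b[len(best):] for b in bodies if b[:len(best)] == best]
    rest = [b for b in bodies if b[:len(best)] != best]
    return {head: [best + [new]] + rest, new: factored}

# Example 1 from the slides: S -> iEtSeS | iEtS | a
print(left_factor_once("S", [["i","E","t","S","e","S"], ["i","E","t","S"], ["a"]]))
# {'S': [['i','E','t','S',"S'"], ['a']], "S'": [['e','S'], []]}
```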
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 6
Topics that will be covered in this
Session

• Top Down Parsing


• Recursive Descent Parsing
• Backtracking
TOP DOWN PARSING
Top Down Parsing
• Top-down parsing can be viewed as the problem of constructing a parse tree for the input string,
starting from the root and creating the nodes of the parse tree in preorder (depth-first)

• Top-down parsing can also be viewed as finding a leftmost derivation for an input string

• At each step of top-down parsing, the key problem is that of determining the production to be
applied for a nonterminal, say A.

• Once an A-production is chosen, the rest of the parsing process consists of matching the terminal
symbols in the production body with the input string
Top Down Parsing - Example
• Consider the grammar (the non-left-recursive expression grammar used throughout this unit):

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Construct the parse tree for the string id + id * id
(The step-by-step construction of the parse tree is shown in the slides)
Top Down Parsing – Cont..
• Recursive-descent parsing is a general form of top-down parsing

• It may require backtracking to find the correct A-production to be applied

• Predictive parsing is a special case of recursive-descent parsing, where no backtracking is required

• Predictive parsing chooses the correct A-production by looking ahead at the input a fixed number
of symbols

• Typically, we may look only at one (that is, the next input symbol)

• In the previous example, at the first E′ node the production E′ → + T E′ is chosen, and at the second E′ node the
production E′ → ε is chosen

• A predictive parser can choose between E′ productions by looking at the next input symbol

• The class of grammars for which we can construct predictive parsers by looking k symbols ahead in
the input is sometimes called the LL(k) class of grammars
RECURSIVE DESCENT
PARSING &
BACKTRACKING
Recursive-Descent Parsing
• Recursive-descent parsing is a general form of top-down parsing, that may involve backtracking,
i.e., making repeated scans of the input

• When we cannot choose a unique A-production, we must try each of several productions in some
order

• Only if there are no more A-productions to try, we declare that an input error has been found

• However, backtracking is rarely needed to parse programming language constructs

• So backtracking parsers are not seen frequently

NOTE:

• A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go
into an infinite loop. That is, when we try to expand a nonterminal A, we may eventually find
ourselves again trying to expand A without having consumed any input
Recursive-Descent Parsing -
Example
• Consider the grammar

S → c A d
A → a b | a

• To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a
single node labeled S, and the input pointer pointing to c, the first symbol of w

• S has only one production, so we use it to expand S and obtain the tree

• The leftmost leaf, labeled c, matches the first symbol of input w

• So we advance the input pointer to a, the second symbol of w

• Consider the next leaf labeled A. Now we expand A using the first alternative A → a b
Recursive-Descent Parsing -
Example

• We have a match for the second input symbol, a, so we advance the input pointer to d, the third
input symbol, and compare d against the next leaf, labeled b

• Since b does not match d, we report failure and go back to A to see whether there is another
alternative for A that has not been tried, but that might produce a match

• Go back to A and reset the input pointer to position 2. The second alternative for A produces the tree
• The leaf a matches the second symbol of w and the leaf d matches the third
symbol. Since we have produced a parse tree for w, we halt and announce
successful completion of parsing
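
A minimal backtracking recursive-descent sketch for this grammar (Python; the encoding and the function name parse are assumptions for illustration). Note that it backtracks only among the alternatives of the non-terminal currently being expanded, which is all this example needs; a fully general parser would also re-enter A on a later mismatch.

```python
# Backtracking recursive descent for  S -> c A d ,  A -> a b | a .
# parse(tokens, symbol, pos) returns the input position after a successful
# match, or None; trying the next alternative restarts from the saved
# position, which is the "backtrack".

GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],   # alternatives tried in order
}

def parse(tokens, symbol, pos):
    if symbol not in GRAMMAR:                       # terminal: match one token
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for body in GRAMMAR[symbol]:                    # try each alternative
        p = pos
        for sym in body:
            p = parse(tokens, sym, p)
            if p is None:
                break                               # this alternative failed
        if p is not None:
            return p                                # whole body matched
    return None                                     # all alternatives failed

tokens = list("cad")
print(parse(tokens, "S", 0) == len(tokens))         # True: w = cad is accepted
```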
Recursive-Descent Parsing –
Example 2
• Consider the grammar

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

• Show how recursive-descent parsing will work for the string ( a , a )


18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 7
Topics that will be covered in this
Session

• Computation of FIRST
• Problems related to FIRST
COMPUTATION OF FIRST
FIRST and FOLLOW –
Introduction
• The functions FIRST and FOLLOW help in the construction of both top-down and bottom-up
parsers, associated with a grammar G

• During top-down parsing, FIRST and FOLLOW allows us to choose which production to apply, based
on the next input symbol

• During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as
synchronizing tokens
Computation of FIRST
Definition
• Let α be any string of grammar symbols
• FIRST(α) is defined as the set of terminals that begin strings derived from α

Rules to compute FIRST(X)


Apply the following rules until no more terminals or ε can be added to any FIRST set (a code sketch follows the rules)
1. If X is a terminal, then FIRST(X) = { X }
2. If X is a nonterminal and X → Y1 Y2 … Yk is a production for some k ≥ 1, then
• Add everything in FIRST(Y1) to FIRST(X) except ε
• If FIRST(Y1) contains ε, then add everything in FIRST(Y2) to FIRST(X) except ε, and so on
• If FIRST(Y1), …, FIRST(Yk-1) all contain ε, then add everything in FIRST(Yk) to FIRST(X) except ε
• If FIRST(Y1), …, FIRST(Yk) all contain ε, then add ε to FIRST(X)
3. If X → ε is a production, then add ε to FIRST(X)
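
These rules translate directly into a fixed-point loop. A sketch under the grammar encoding used earlier (Python; "eps" is an assumed marker for ε, and an empty body [] represents an ε-production):

```python
def compute_first(grammar):
    """FIRST sets by fixed point; 'eps' marks the empty string.

    `grammar` maps each non-terminal to a list of bodies (symbol lists);
    an empty body [] represents an epsilon production.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                nullable_so_far = True
                for sym in body:
                    if sym not in grammar:          # terminal
                        first[head].add(sym)
                        nullable_so_far = False
                        break
                    first[head] |= first[sym] - {"eps"}
                    if "eps" not in first[sym]:
                        nullable_so_far = False
                        break
                if nullable_so_far:                 # whole body can vanish
                    first[head].add("eps")
                if len(first[head]) != before:
                    changed = True
    return first

G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
print(compute_first(G))   # E, T, F: {(, id};  E': {+, eps};  T': {*, eps}
```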
PROBLEMS RELATED TO
FIRST
Example 1
• Consider the grammar

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Compute the function FIRST for all non terminals

FIRST(E) = FIRST(T)
FIRST(T) = FIRST(F)
FIRST(F) = { (, id }      So, FIRST(E) = FIRST(T) = { (, id }
FIRST(E′) = { +, ε }
FIRST(T′) = { *, ε }
Example 2
• Consider the grammar

S→(L)|a

L→L,S|S

• Compute the function FIRST for all non terminals

FIRST(S) = {(, a }

FIRST(L) = FIRST(S) = { (, a }
Example 3
• Consider the grammar

S→cAd

A → a A′

A′ → b | ε

• Compute the function FIRST for all non terminals

FIRST(S) = { c }

FIRST(A) = { a }

FIRST(A′) = { b , ε }
Example 4
• Consider the grammar

S→L=R|R

L → * R | id

R→L

• Compute the function FIRST for all non terminals

FIRST(S) = FIRST(L) ∪ FIRST(R)

FIRST(L) = { * , id }

FIRST(R) = FIRST(L) = { * , id }

Therefore FIRST(S) = { * , id }
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 7
Topics that will be covered in this
Session

• Computation of FOLLOW
• Problems related to FOLLOW
COMPUTATION OF
FOLLOW
Definition of FOLLOW

Definition

• Let A be a nonterminal

• FOLLOW(A) is defined as the set of terminals a that can appear immediately to the right of A in
some sentential form; that is, there exists a derivation of the form S ⇒* αAaβ for some α and β

• In addition, if A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A)

$ is a special “endmarker” symbol that is assumed not to be a symbol of any grammar

Rules to compute FOLLOW

Rules to compute FOLLOW

Apply the following rules until nothing can be added to any FOLLOW set (a code sketch follows the rules)

1. Place $ in FOLLOW(S), where S is the start symbol

2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B)

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B)
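
A companion sketch for FOLLOW (Python; it takes the FIRST sets from the compute_first sketch above, and first_of_string computes FIRST(β) for rule 2 — both names are illustrative):

```python
def first_of_string(symbols, grammar, first):
    """FIRST of a symbol string beta (rule 2 needs FIRST(beta))."""
    out = set()
    for sym in symbols:
        if sym not in grammar:                       # terminal
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}
        if "eps" not in first[sym]:
            return out
    out.add("eps")                                   # every symbol was nullable
    return out

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                     # FOLLOW only for non-terminals
                    beta = body[i + 1:]
                    fb = first_of_string(beta, grammar, first)
                    before = len(follow[sym])
                    follow[sym] |= fb - {"eps"}      # rule 2
                    if "eps" in fb:                  # rule 3 (covers empty beta too)
                        follow[sym] |= follow[head]
                    if len(follow[sym]) != before:
                        changed = True
    return follow

# With G and compute_first from the previous sketch:
# print(compute_follow(G, "E", compute_first(G)))
#  -> E, E': {$, )};  T, T': {+, $, )};  F: {+, *, $, )}
```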
PROBLEMS RELATED TO
FOLLOW
Example 1
• Consider the grammar

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Compute the function FOLLOW for all non terminals

As E is the start symbol, we place $ in FOLLOW(E): FOLLOW(E) = { $ }

Now, check for productions where there is E in the right side: F → ( E )
The symbol after E is ), and FIRST( ) ) = { ) }, so add ) to FOLLOW(E)

FOLLOW(E) = { $, ) }
Example 1 – cont..

FOLLOW(E) = { $, ) }

Now, let us find FOLLOW(E′)

Check for productions where there is E′ in the right side. There are two productions:

1. E → T E′ : There is no symbol after E′, so add everything in FOLLOW(E) to FOLLOW(E′)
So, FOLLOW(E′) = { $, ) }

2. E′ → + T E′ : There is no symbol after E′, so add everything in FOLLOW(E′) to FOLLOW(E′)
So, there is nothing more to add. Finally, FOLLOW(E′) = { $, ) }



Example 1 – cont..

FOLLOW(E) = { $, ) }
FOLLOW(E′) = { $, ) }
Now, let us find FOLLOW(T)
Check for productions where there is T in the right side. There are two productions:
1. E → T E′ : The symbol after T is E′, so add everything in FIRST(E′) to FOLLOW(T) except ε
So, FOLLOW(T) = { + }
As there is ε in FIRST(E′), add everything in FOLLOW(E) to FOLLOW(T)
So, FOLLOW(T) = { +, $, ) }
2. E′ → + T E′ : The symbol after T is E′, so add everything in FIRST(E′) to FOLLOW(T) except ε (already
added). As there is ε in FIRST(E′), add everything in FOLLOW(E′) to FOLLOW(T)
So, FOLLOW(T) = { +, $, ) }
Example 1 – cont..

FOLLOW(E) = { $, ) }

FOLLOW(E′) = { $, ) }

FOLLOW(T) = { +, $, ) }

Now, let us find FOLLOW(T′)

Check for productions where there is T′ in the right side. There are two productions:

1. T → F T′ : There is no symbol after T′, so add everything in FOLLOW(T) to FOLLOW(T′)
So, FOLLOW(T′) = { +, $, ) }

2. T′ → * F T′ : There is no symbol after T′, so add everything in FOLLOW(T′) to FOLLOW(T′)
So, there is nothing more to add. Finally, FOLLOW(T′) = { +, $, ) }


Example 1 – cont..
FOLLOW(E) = { $, ) }
FOLLOW(E′) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T′) = { +, $, ) }
Now, let us find FOLLOW(F)
Now, check for productions where there is F in the right side. There are two productions:
1. T → F T′
2. T′ → * F T′
In both the productions, the symbol after F is T′. So, add everything in FIRST(T′) to FOLLOW(F) except
ε
So, FOLLOW(F) = { *,
As there is ε in FIRST(T′), add everything in FOLLOW(T) and FOLLOW(T′) to FOLLOW(F)
So, FOLLOW(F) = { *, +, $, ) }
Example 1 – contd..
FOLLOW of all non-terminals in the grammar

FOLLOW(E) = { $, ) }

FOLLOW(E′) = { $, ) }

FOLLOW(T) = { +, $, ) }

FOLLOW(T′) = { +, $, ) }

FOLLOW(F) = { +,*, $, ) }
Example 2
• Consider the grammar

S→(L)|a

L→L,S|S

• Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ } ∪ FOLLOW(L)
          = { $ , ) , , }

FOLLOW(L) = { ) , , }
Example 3
• Consider the grammar

S→cAd

A → a A′

A′ → b | ε

Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ }

FOLLOW(A) = { d }

FOLLOW(A′) = FOLLOW(A)
           = { d }
Example 4
• Consider the grammar

S→L=R|R

L → * R | id

R→L

Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ }

FOLLOW(L) = { = } ∪ FOLLOW(R)
          = { = , $ }

FOLLOW(R) = FOLLOW(S) ∪ FOLLOW(L)
          = { $ , = }
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 11
Topics that will be covered in this
Session

• Construction of Predictive Parsing Table


• Predictive Parsers and LL(1) Grammars
CONSTRUCTION OF
PREDICTIVE PARSING
TABLE
Predictive Parsers

Predictive Parsers

• Predictive Parsers are recursive-descent parsers that need no backtracking

• Predictive parsers can be constructed for a class of grammars called LL(1)

• The first L in LL(1) stands for scanning the input from left to right

• The second L stands for producing a leftmost derivation

• The 1 stands for using one input symbol of lookahead at each step to make parsing action
decisions
Steps in Constructing Predictive Parsers

1. Eliminate left recursion from the grammar

2. Left factor the grammar

3. Compute the functions FIRST and FOLLOW for all the non-terminals in the grammar

4. Construct predictive parsing table

5. Apply predictive parsing algorithm to parse the input string


Constructing Predictive Parsing Table
For each production A → α of the grammar, do the following (a code sketch follows):
1. For each terminal a in FIRST(α), add A → α to M[A, a]
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A); if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well
If, after this, some entry M[A, a] holds no production, that entry is an error (normally left blank)
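
A sketch of this construction (Python, reusing the compute_first, compute_follow, and first_of_string sketches from the earlier sessions; a cell that collects two or more productions signals that the grammar is not LL(1)):

```python
def build_parsing_table(grammar, first, follow):
    """M[(A, a)] -> list of bodies; more than one body in a cell is a conflict."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            fb = first_of_string(body, grammar, first)
            # rule 1: terminals in FIRST(body); rule 2: FOLLOW(head) if nullable
            targets = (fb - {"eps"}) | (follow[head] if "eps" in fb else set())
            for a in targets:
                table.setdefault((head, a), []).append(body)
    conflicts = {k: v for k, v in table.items() if len(v) > 1}
    return table, conflicts

# first = compute_first(G); follow = compute_follow(G, "E", first)
# table, conflicts = build_parsing_table(G, first, follow)
# An empty `conflicts` dict means the grammar is LL(1).
```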
Constructing Predictive Parsing Table – Example 1
Construct Predictive Parsing table for the grammar

E→E+T|T

T→T*F|F

F → ( E ) | id

Step 1 : Eliminate Left Recursion

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id

Step 2 : Left Factor the grammar

There is no need for left factoring in this grammar as there are no common prefixes
Constructing Predictive Parsing Table – Example 1
– cont..
Now the grammar is

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id

Step 3 : Compute FIRST and FOLLOW for all the non-terminals (as computed in the earlier sessions)

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }      FOLLOW(E) = FOLLOW(E′) = { $ , ) }
FIRST(E′) = { + , ε }                            FOLLOW(T) = FOLLOW(T′) = { + , $ , ) }
FIRST(T′) = { * , ε }                            FOLLOW(F) = { + , * , $ , ) }

Constructing Predictive Parsing Table – Example 1
– cont..
Step 4 : Construct predictive parsing table

The grammar after eliminating left recursion is

E → T E′

E′ → + T E′

E′ → ε

T → F T′

T′ → * F T′

T′ → ε

F → ( E )

F → id

         id           +              *              (            )          $
E        E → T E′                                   E → T E′
E′                    E′ → + T E′                                E′ → ε     E′ → ε
T        T → F T′                                   T → F T′
T′                    T′ → ε         T′ → * F T′                 T′ → ε     T′ → ε
F        F → id                                     F → ( E )
Example 2
Construct predictive parsing table for the grammar given below:

S→(L)|a

L→L,S|S

Step 1 : Eliminate Left Recursion

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

Step 2 : Left Factor the grammar

There is no need for left factoring in this grammar as there are no common prefixes
Example 2 – cont..
Now the grammar is

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

Step 3 : Compute FIRST and FOLLOW for all the non-terminals

FIRST(S) = { ( , a } FOLLOW(S) = { $ , , , ) }

FIRST(L) = { ( , a } FOLLOW(L) = { ) }

FIRST(L′) = { , , ε } FOLLOW(L′) = { ) }
Example 2 – cont..
Step 4 : Construct predictive parsing table

Now the grammar is

S→(L)

S→a

L → ( L ) L′

L → a L′

L′ → , S L′

L′ → ε

         (              )         a           ,              $
S        S → ( L )                S → a
L        L → ( L ) L′             L → a L′
L′                      L′ → ε                L′ → , S L′
Example 3
Construct predictive parsing table for the grammar given below:
S → iEtSS′ | a
S′ → eS | ε
E→b
Step 1 : Eliminate Left Recursion
There is no left recursion
Step 2 : Left Factor the grammar
There is no need for left factoring in this grammar as there are no common prefixes
Step 3 : Compute FIRST and FOLLOW for all the non-terminals
FIRST(S) = { i , a } FOLLOW(S) = { $ , e }
FIRST(S′) = { e , ε } FOLLOW(S′) = { $ , e }
FIRST(E) = { b } FOLLOW(E) = { t }
Example 3
The grammar is Step 4 : Construct predictive parsing table

S → iEtSS′ FIRST(S) = { i , a } FOLLOW(S) = { $ , e }

S→a FIRST(S′) = { e , ε } FOLLOW(S′) = { $ , e }

S′ → eS FIRST(E) = { b } FOLLOW(E) = { t }

S′ → ε
         i              t         a        b         e                     $
S        S → iEtSS′               S → a
S′                                                   S′ → eS , S′ → ε     S′ → ε
E                                          E → b

In the parsing table, the entry for M[S′,e] contains two productions. This is because
the grammar is ambiguous. Hence, the grammar is not LL(1)
PREDICTIVE PARSERS
LL(1) GRAMMARS
LL(1) Grammars
• The class of LL(1) grammars is rich enough to cover most programming constructs
• Care should be taken in writing suitable grammar for the source language
• No left recursive or ambiguous grammar can be LL(1)
• A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the
following conditions hold (a code sketch of the check follows):
1. For no terminal a do both α and β derive strings beginning with a
2. At most one of α and β can derive the empty string
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise,
if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).
• The first two conditions are equivalent to the statement that FIRST(α) and FIRST(β) are disjoint sets
• The third condition is equivalent to stating that if ε is in FIRST(β), then FIRST(α) and FOLLOW(A) are
disjoint sets, and likewise if ε is in FIRST(α)
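
These conditions can be checked pairwise and mechanically. A rough sketch (Python, again reusing the earlier compute_first, compute_follow, and first_of_string sketches; an empty result list means the grammar passes the LL(1) test):

```python
from itertools import combinations

def ll1_violations(grammar, start):
    """Check the pairwise LL(1) conditions for every pair of alternatives."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    bad = []
    for head, bodies in grammar.items():
        for b1, b2 in combinations(bodies, 2):
            f1 = first_of_string(b1, grammar, first)
            f2 = first_of_string(b2, grammar, first)
            if (f1 - {"eps"}) & (f2 - {"eps"}):                 # condition 1
                bad.append((head, b1, b2, "FIRST overlap"))
            if "eps" in f1 and "eps" in f2:                     # condition 2
                bad.append((head, b1, b2, "both nullable"))
            if "eps" in f2 and (f1 - {"eps"}) & follow[head]:   # condition 3
                bad.append((head, b1, b2, "FIRST/FOLLOW clash"))
            if "eps" in f1 and (f2 - {"eps"}) & follow[head]:
                bad.append((head, b1, b2, "FIRST/FOLLOW clash"))
    return bad   # empty list: the grammar is LL(1)
```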
18CSC304J
COMPILER DESIGN

UNIT 2
SESSIONS 12 & 13
Topics that will be covered in this
Session

• Transition Diagrams for Predictive Parsers


• Non Recursive Predictive Parser
• Predictive Parsing Algorithm
• Error Recovery in Predictive Parsing
TRANSITION DIAGRAMS
FOR PREDICTIVE
PARSERS
Transition Diagrams for Predictive Parsers
Transition Diagrams for Predictive Parsers – cont..

• Transition diagrams for predictive parsers differ from those for lexical analyzers

• Parsers have one diagram for each non-terminal

• The labels of edges can be tokens (terminals) or non-terminals

• A transition on a token (terminal) means that we take that transition if that token is the next input
symbol

• A transition on a non-terminal A is a call of the procedure for A

• With an LL(1) grammar, the ambiguity of whether or not to take an ε-edge can be resolved by
making ε-transitions the default choice
Transition Diagrams for Predictive Parsers – cont..
The predictive parser working off the transition diagrams behaves as follows:

1. It begins in the start state for the start symbol

2. If it is in state s with an edge labeled by terminal a to state t, and if the next input symbol is a,
then the parser moves the input cursor one position right and goes to state t

3. If the edge is labeled by a non terminal A, the parser goes to the start state for A, without moving
the input cursor. If it reaches the final state for A, it immediately goes to state t

4. If there is an edge from s to t labeled ε, then from state s, the parser immediately goes to state t,
without advancing the input

Thus a predictive parser program based on transition diagram attempts to match terminal symbols
against the input and makes a recursive procedure call whenever it has to follow an edge labeled by
a non terminal.
Transition Diagrams for Predictive Parsers – cont..
Simplification of Transition Diagrams
• Transition diagrams can be simplified by substituting diagrams in one another

• Now, substitute the transition diagram of E′ on the transition diagram of E


Simplification of Transition Diagrams – cont..
• Apply the same techniques to T and T′

• Now, substitute the transition diagram of T′ on the transition diagram of T


Simplification of Transition Diagrams – cont..

The final simplified transition diagrams for the expression grammar


NON RECURSIVE
PREDICTIVE PARSER
Non Recursive Predictive Parser
• It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than
implicitly via recursive calls

• To determine the production to be applied for a non-terminal, the parser looks up a parsing table

Model of a table-driven predictive parser

• A table-driven parser has an input buffer, a stack, a parsing table, and an output stream

• The input buffer contains the string to be parsed, followed by $, a symbol used as a right end-marker to indicate the end of the input string

• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack

• Initially, the stack contains the start symbol of the grammar on top of $

• The parsing table is a two-dimensional array M[A,a], where A is a non-terminal, and a is a terminal or the symbol $
Non Recursive Predictive Parser – cont..
PREDICTIVE PARSING
ALGORITHM
Predictive Parsing Algorithm
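The algorithm figure is not reproduced here; the sketch below (Python, with the table encoded as in the construction sketch above) implements the same stack-driven loop and prints the production used at each step, reproducing the move sequence traced in Example 1 below.

```python
def predictive_parse(table, grammar, start, tokens):
    """Table-driven LL(1) parsing; prints the production used at each step."""
    stack = ["$", start]
    tokens = tokens + ["$"]
    pos = 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                       # terminal on top: match it
            stack.pop()
            pos += 1
        elif top not in grammar:           # terminal that doesn't match
            raise SyntaxError(f"expected {top!r}, saw {a!r}")
        elif (top, a) in table:            # non-terminal: consult the table
            body = table[(top, a)][0]
            print(f"{top} -> {' '.join(body) or 'eps'}")
            stack.pop()
            stack.extend(reversed(body))   # push the body right-to-left
        else:
            raise SyntaxError(f"no entry M[{top}, {a!r}]")
    if tokens[pos] != "$":
        raise SyntaxError("input left over")
    print("Accept")

# predictive_parse(table, G, "E", ["id", "+", "id", "*", "id"])
```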
Predictive Parsing – Example 1

Stack            Input              Output
$ E              id + id * id $
$ E′ T           id + id * id $     E → T E′
$ E′ T′ F        id + id * id $     T → F T′
$ E′ T′ id       id + id * id $     F → id
$ E′ T′          + id * id $
$ E′             + id * id $        T′ → ε
$ E′ T +         + id * id $        E′ → + T E′
$ E′ T           id * id $
$ E′ T′ F        id * id $          T → F T′
$ E′ T′ id       id * id $          F → id
$ E′ T′          * id $
$ E′ T′ F *      * id $             T′ → * F T′
$ E′ T′ F        id $
$ E′ T′ id       id $               F → id
$ E′ T′          $
$ E′             $                  T′ → ε
$                $                  Accept
Predictive Parsing – Example 3
Consider the grammar

S → aABe

A → Abc | b

B→d

Check whether the grammar is LL(1). If so, parse the string abbcde

(or)

Construct predictive parsing table for the grammar and parse the string abbcde
ERROR RECOVERY IN
PREDICTIVE PARSING
Error Recovery in Predictive Parsing
Panic Mode Recovery
Some heuristics for choosing the synchronizing set

1. Place all symbols in FOLLOW(A) into the synchronizing set for non-terminal A. If we skip
tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that
parsing can continue

2. We can add keywords that begin statements to the synchronizing sets for the non-terminals
generating expressions

3. If we add symbols in FIRST(A) to the synchronizing set for non-terminal A, then it may be
possible to resume parsing according to A if a symbol in FIRST(A) appears in the input

4. If a non-terminal can generate the empty string, then the production deriving ε can be used as
default

5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue
a message saying that the terminal was inserted, and continue parsing
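
Heuristics 1 and 5 can be grafted onto the table-driven loop with a few extra branches. A rough sketch (Python; using FOLLOW(A) as the synchronizing set is exactly the designer choice heuristic 1 describes, and the printed messages are placeholders):

```python
def parse_with_panic_recovery(table, grammar, follow, start, tokens):
    """LL(1) parsing with panic-mode recovery instead of hard failure."""
    stack = ["$", start]
    tokens = tokens + ["$"]
    pos = 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop(); pos += 1          # terminal matched
        elif top not in grammar:           # heuristic 5: "insert" the terminal
            print(f"error: inserted missing {top!r}")
            stack.pop()
        elif (top, a) in table:            # normal expansion
            body = table[(top, a)][0]
            stack.pop(); stack.extend(reversed(body))
        elif a in follow[top]:             # sync token reached: give up on top
            print(f"error: abandoned {top}")
            stack.pop()
        elif a == "$":                     # never skip past the endmarker
            print(f"error: abandoned {top}")
            stack.pop()
        else:                              # heuristic 1: skip the input token
            print(f"error: skipped {a!r}")
            pos += 1
    return tokens[pos] == "$"
```

Every iteration either pops the stack or advances the input, so the loop is guaranteed to terminate, mirroring the "guaranteed not to go into an infinite loop" advantage noted earlier.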
Panic Mode Recovery - Example
Panic Mode Recovery – Example – cont..
• On an erroneous input ) id * + id the parser and error recovery mechanism will behave as follows:

Note:

• Panic-mode recovery does not address the important issue of error messages

• The compiler designer must supply informative error messages that not only describe the error but also draw attention to where the error was discovered
Phrase Level Recovery
• Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines

• These routines may change, insert, or delete symbols on the input and issue appropriate error
messages

• They may also pop from the stack

• Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several
reasons:

1. The steps carried out by the parser might not correspond to the derivation of any word in the
language at all

2. We must ensure that there is no possibility of an infinite loop. To protect against such loops,
we must check that the error recovery action eventually results in an input symbol being
consumed (or the stack being shortened if the end of the input is reached)
