
18CSC304J

COMPILER DESIGN

UNIT 2
SESSIONS 1 & 2
Topics that will be covered in this
Session

• Syntax Analysis Definition


• Role of Parser
• Context Free Grammar
• Lexical versus Syntactic Analysis
• Syntax Error Handling
SYNTAX ANALYSIS
DEFINITION
Syntax Analysis

• Syntax Analysis is the second phase of the compiler design process

• It analyzes the syntactical structure

• It checks if the given input is in the correct syntax of the programming language or not

• Every programming language has rules that prescribe the syntactic structure of well-formed
programs

• In C, for example, a program is made up of functions, a function out of declarations and
statements, a statement out of expressions, and so on.

• The syntax of programming language constructs can be specified by context-free grammars or BNF
(Backus-Naur Form) notation

• Grammars offer significant benefits for both language designers and compiler writers
Benefits of Grammar
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming
language

• From certain classes of grammars, we can construct automatically an efficient parser that
determines the syntactic structure of a source program

• The parser construction process can reveal syntactic ambiguities and trouble spots that might have
slipped through the initial design phase of a language

• The structure imparted to a language by a properly designed grammar is useful for translating
source programs into correct object code and for detecting errors

• A grammar allows a language to be evolved or developed iteratively, by adding new constructs to


perform new tasks

• These new constructs can be integrated more easily into an implementation that follows the
grammatical structure of the language
ROLE OF THE PARSER
The Role of Parser
• Input : Stream of tokens
• Output : Some representation of the parse tree

• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language
• The parser should also report syntax errors in an intelligible fashion
• It should also recover from commonly occurring errors

Parsing Methods:
• Universal Parsing (e.g., the Cocke-Younger-Kasami algorithm and Earley’s algorithm): can parse any grammar, but too inefficient to use in compilers
• Top-down Parsing and Bottom-up Parsing: commonly used in compilers
The Role of the Parser – cont..
Top-down Parsing

• Top-down parsers build parse trees from the top (root) to the bottom (leaves)

Bottom-up Parsing

• Bottom-up parsers start from the leaves and work up to the root

Note

• The input to the parser is always scanned from left to right, one symbol at a time

• The most efficient top-down and bottom-up methods work only for sub-classes of grammars

• LL and LR grammars are expressive enough to describe most of the syntactic constructs in modern
programming languages

• Parsers implemented by hand often use LL grammars (eg. Predictive Parsing approach)

• Parsers for the larger class of LR grammars are usually constructed using automated tools
The Role of the Parser – cont..
Tasks that may be conducted during parsing

• Collecting information about various tokens into the symbol table

• Performing type checking and other kinds of semantic analysis

• Generating intermediate code

These activities are lumped into the “Rest of the front end” box in the picture
CONTEXT FREE
GRAMMAR
Grammar & its Types
• Grammar denotes syntactical rules in languages

• Noam Chomsky gave a mathematical model for grammar. According to him there are 4 types of
grammars

Type     Grammar                                     Language Accepted                  Automaton
Type 0   Unrestricted Grammar                        Recursively Enumerable Language    Turing Machine
Type 1   Context Sensitive Grammar                   Context Sensitive Language         Linear Bounded Automata
Type 2   Context Free Grammar                        Context Free Language              Pushdown Automata
Type 3   Regular Grammar (or) Regular Expression     Regular Language                   Finite Automata
Context Free Grammar
• A Context-Free Grammar is used to systematically describe the syntax of programming language
constructs like expressions and statements

• A context-free grammar (grammar for short) consists of terminals, non-terminals, a start symbol, and
productions

1. Terminals are the basic symbols from which strings are formed

2. Non-terminals are syntactic variables that denote sets of strings

3. In a grammar, one non-terminal is distinguished as the start symbol, and the set of strings it
denotes is the language generated by the grammar. Conventionally, the productions for the
start symbol are listed first

4. The productions of a grammar specify the manner in which the terminals and non-terminals
can be combined to form strings.
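
These four components map naturally onto plain data structures. The sketch below (Python; the names EXPR_GRAMMAR and is_terminal, and the dict layout, are illustrative assumptions, not from the slides) encodes the expression grammar used later in this unit; later sketches reuse this encoding.

```python
# A context-free grammar as plain data: terminals, non-terminals,
# a start symbol, and productions mapping each head to its bodies.

EXPR_GRAMMAR = {
    "terminals":    {"+", "*", "(", ")", "id"},
    "nonterminals": {"E", "T", "F"},
    "start":        "E",
    "productions": {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}

def is_terminal(grammar, symbol):
    """A symbol is a terminal if it is not the head of any production."""
    return symbol not in grammar["productions"]
```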
Context Free Grammar : Example

Three equivalent ways of writing the same expression grammar:

expression → expression + term           E → E + T           E → E + T | E – T | T
expression → expression – term           E → E – T           T → T * F | T / F | F
expression → term                   OR   E → T          OR   F → ( E ) | id
term → term * factor                     T → T * F
term → term / factor                     T → T / F
term → factor                            T → F
factor → ( expression )                  F → ( E )
factor → id                              F → id

Non-terminals:                           Non-terminals:      Non-terminals:
expression, term, factor                 E, T, F             E, T, F
Start Symbol : expression                Start Symbol : E    Start Symbol : E

Notational Conventions

1. These symbols are terminals:
• Lowercase letters early in the alphabet, such as a, b, c
• Operator symbols such as +, *, and so on
• Punctuation symbols such as parentheses, comma, and so on
• The digits 0, 1, 2, … , 9
• Boldface strings such as id or if, each of which represents a single terminal symbol

2. These symbols are non-terminals:
• Uppercase letters early in the alphabet, such as A, B, C
• The letter S, which, when it appears, is usually the start symbol
• Lowercase, italic names such as expr or stmt
Notational Conventions – cont..

3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals

4. Lowercase letters late in the alphabet, such as u, v, … z, represent strings of terminals

5. Lowercase Greek letters α, β, γ represent strings of grammar symbols. A generic production can be written as A → α

6. A set of productions A → α1, A → α2, A → α3 with a common head A (call them A-productions) may be written as
A → α1 | α2 | α3
α1, α2, α3 are the alternatives of A

7. Unless stated otherwise, the head of the first production is the start symbol
Derivations from a Grammar

• A derivation of a string from a grammar applies a sequence of productions that transforms the
start symbol into the string

• A derivation proves that a string belongs to the language defined by a grammar

• A parse tree can be constructed with the help of a derivation

• A parse tree is a graphical representation of a derivation

• If at each step in a derivation, a production is applied to the leftmost non-terminal, then the
derivation is called the leftmost derivation

• A derivation in which the rightmost non-terminal is replaced at each step is called the rightmost
derivation
Derivations – Example 1
Consider the grammar
S → ABC
A → aA | a
B → bB | b
C → cC | c

Derive the string aabbbcccc

A leftmost derivation:
S ⇒ ABC ⇒ aABC ⇒ aaBC ⇒ aabBC ⇒ aabbBC ⇒ aabbbC ⇒ aabbbcC ⇒ aabbbccC ⇒ aabbbcccC ⇒ aabbbcccc

Derivations – Example 2
Derivations – cont..
Ambiguous Grammar

• Every parse tree has associated with it a unique leftmost and rightmost derivation

• A grammar that produces more than one parse tree for some sentence is said to be ambiguous

• Put another way, an ambiguous grammar is one that produces more than one leftmost derivation
or more than one rightmost derivation for the same sentence
Consider the grammar E → E + E | E * E | ( E ) | id
Check whether the grammar is ambiguous or not
Let us consider the string id + id * id

Leftmost derivation 1 : E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Leftmost derivation 2 : E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

As we are able to draw two parse trees (equivalently, two leftmost derivations) for the given string, the grammar is ambiguous
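
One way to confirm the ambiguity mechanically is to enumerate bounded leftmost derivations of the target string and count how many distinct ones reach it. A rough sketch (Python; the dict encoding and the function name leftmost_derivations are illustrative, and the length-based pruning relies on this grammar having no ε-productions):

```python
# Count leftmost derivations of a token string under a CFG.
# Two or more derivations for one string demonstrates ambiguity.

GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}

def leftmost_derivations(target, form=("E",)):
    # Prune: a sentential form longer than the target can never match,
    # because no production here derives the empty string.
    if len(form) > len(target):
        return 0
    # Find the leftmost non-terminal; terminals before it must match.
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if list(form[:i]) != target[:i]:
                return 0
            return sum(
                leftmost_derivations(target, form[:i] + tuple(body) + form[i + 1:])
                for body in GRAMMAR[sym]
            )
    return 1 if list(form) == target else 0

print(leftmost_derivations(["id", "+", "id", "*", "id"]))  # prints 2
```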
LEXICAL vs SYNTAX
ANALYSIS
Context Free Grammars Vs Regular
Expressions

• Grammars are a more powerful notation than regular expressions

• Every construct that can be described by a regular expression can be described by a grammar, but
not vice-versa

• Every regular language is a context-free language, but not vice versa


Lexical Vs Syntax Analysis
Everything that can be described by a regular expression can also be described by a grammar. We
may therefore ask “Why use regular expressions to define the lexical syntax of a language?”
There are several reasons
• Separating the syntactic structure of a language into lexical and non-lexical parts provides a
convenient way of modularizing the front end of a compiler into two manageable-sized
components
• The lexical rules of a language are frequently quite simple, and to describe them we do not need a
notation as powerful as grammars
• Regular expressions generally provide a more concise and easier-to-understand notation for
tokens than grammars
• More efficient lexical analyzers can be constructed automatically from regular expressions than
from arbitrary grammars
• Regular expressions are most useful for describing the structure of constructs such as identifiers,
constants, keywords, and white space. Grammars, on the other hand, are most useful for
describing nested structures such as balanced parentheses, matching begin-end’s, corresponding
if-then-else’s, and so on. These nested structures cannot be described by regular expressions
Lexical Analysis
Vs
Syntax Analysis
SYNTAX ERROR
HANDLING
Syntax Error Handling

• If a compiler had to process only correct programs, its design and implementation would be
simplified greatly

• However, a compiler is expected to assist the programmer in locating and tracking down errors that
inevitably creep into programs, despite the programmer’s best efforts

• Most programming language specifications do not describe how a compiler should respond to
errors; error handling is left to the compiler designer

• Planning the error handling right from the start can both simplify the structure of a compiler and
improve its handling of errors
Common Programming Errors

Common programming errors can occur at many different levels

• Lexical errors include misspellings of identifiers, keywords, or operators, and missing quotes around
text intended as a string

• Syntactic errors include misplaced semicolons or extra or missing braces, that is, { or }

Another example in C is the appearance of a case statement without an enclosing switch

• Semantic errors include type mismatches between operators and operands, e.g., the return of a
value from a function in C with return type void

• Logical errors can be anything from incorrect reasoning on the part of the programmer to the use
in a C program of the assignment operator = instead of the comparison operator ==. The program
containing = may be well formed; however, it may not reflect the programmer’s intent
Error Recovery during Parsing / Syntax
Analysis

• The precision of parsing methods allows syntactic errors to be detected very efficiently

• Several parsing methods, such as the LL and LR methods, detect an error as soon as possible; that
is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the
grammar for the language.

• They have the viable-prefix property, meaning that they detect that an error has occurred as soon
as they see a prefix of the input that cannot be completed to form a string in the language

• Error recovery is emphasized during parsing because many errors appear syntactic and are exposed
when parsing cannot continue. A few semantic errors such as type mismatches, can also be
detected efficiently; however, accurate detection of semantic and logical errors at compile time is in
general a difficult task
Goals of an Error Handler

The goals of an error handler in a parser are simple to state, but challenging to realize:

• Report the presence of errors clearly and accurately

• Recover from each error quickly enough to detect subsequent errors

• Add minimal overhead to the processing of correct programs

How should an error handler report the presence of an error?

• It must report the place in the source program where an error is detected, because there is a good
chance that the actual error occurred within the previous few tokens

• A common strategy is to print the offending line with a pointer to the position at which an error is
detected
Error Recovery Strategies

Once an error is detected, how should the parser recover?

• There is no universally acceptable strategy, but a few methods have broad applicability

• The simplest approach is for the parser to quit with an informative error message when it detects
the first error

• Additional errors are often uncovered if the parser can restore itself to a state where processing of
the input can continue with reasonable hopes that further processing will provide meaningful
diagnostic information

• If errors pile up, it is better for the compiler to give up after exceeding some error limit than to
produce an annoying avalanche of “spurious” errors
Error Recovery Strategies –
cont..
Recovery strategies

• Panic-Mode Recovery

• Phrase-Level Recovery

• Error Productions

• Global Correction
Error Recovery Strategies –
cont..
Panic-Mode Recovery

• On discovering an error, the parser discards input symbols one at a time until one of a designated
set of synchronizing tokens is found

• The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source
program is clear and unambiguous

• The compiler designer must select the synchronizing tokens appropriate for the source language

• Panic-mode recovery often skips a considerable amount of input without checking it for additional
errors.

Advantages:

• Simplicity

• It is guaranteed not to go into an infinite loop


Error Recovery Strategies –
cont..
Phrase-Level Recovery
• On discovering an error, the parser may perform local correction on the remaining input
• It may replace a prefix of the remaining input by some string that allows the parser to continue
• Examples for local correction
• Replace a comma by a semicolon
• Delete an extraneous semicolon
• Insert a missing semicolon
• The choice of the local correction is left to the compiler designer
• We must be careful to choose replacements that do not lead to infinite loops
• It is used in several error-repairing compilers, as it can correct any input string
Drawback
• Difficulty in coping with situations in which the actual error has occurred before the point of
detection
Error Recovery Strategies –
cont..
Error Productions

• By anticipating common errors that might be encountered, we can augment the grammar for the
language at hand with productions that generate the erroneous constructs

• A parser constructed from a grammar augmented by these error productions detects the
anticipated errors when an error production is used during parsing

• The parser can generate appropriate error diagnostics about the erroneous construct that has been
recognized in the input
Error Recovery Strategies –
cont..
Global Correction

• Ideally, we would like a compiler to make as few changes as possible in processing an incorrect
input string

• There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost
correction

• Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a
related string y, such that the number of insertions, deletions, and changes of tokens required to
transform x into y is as small as possible

• Unfortunately, these methods are in general too costly to implement in terms of time and space

• So these techniques are currently only of theoretical interest

• Note: A closest correct program may not be what the programmer had in mind
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 3
Topics that will be covered in this
Session

• Elimination of Ambiguity
• Elimination of Left Recursion
• Left Factoring
ELIMINATION OF
AMBIGUITY
Eliminating Ambiguity

• There exists no general algorithm to remove ambiguity from grammar

• To check a grammar for ambiguity, we try finding a string that has more than one parse tree. If
any such string exists, then the grammar is ambiguous

• Causes such as left recursion, common prefixes, etc. can make a grammar ambiguous

• The removal of these causes may convert the grammar into an unambiguous one

• However, this is not guaranteed in every case

• Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity


Eliminating Ambiguity –
Example 1
Eliminate ambiguity from the “dangling-else” grammar, which in its standard form is:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

• Here “other” stands for any other statement

Eliminating Ambiguity – Example 1
– cont…
• Consider an expression of the form if E1 then if E2 then S1 else S2

• Two distinct parse trees (Parse Tree 1 and Parse Tree 2 in the slides) can be drawn for it: the else can attach to either the inner or the outer if

As there are two parse trees for the given expression, the grammar is ambiguous


Eliminating Ambiguity – Example 1 –
cont..
• We can rewrite the dangling-else grammar as the following unambiguous grammar:

stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

• The idea is that a statement appearing between a then and an else must be matched

• That is, an interior statement must not end with an unmatched or open then

• A matched statement is either an if-then-else statement containing no open statements or it is
any other kind of unconditional statement
Eliminating Ambiguity – Example 1 –
cont..

• Now the expression has only one parse tree


Eliminating Ambiguity –
Example 2
Eliminate ambiguity from the “expression” grammar

E → E + E | E * E | ( E ) | id

The reason for ambiguity in this grammar is that precedence and associativity rules are not imposed by the grammar

We can rewrite the grammar by imposing precedence and associativity as follows

E→E+T|T

T→T*F|F

F → ( E ) | id
ELIMINATION OF LEFT
RECURSION
Eliminating Left Recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α
• A production in which the leftmost symbol on the right side is the same as the nonterminal on the
left side of the production is called a left-recursive production

Eg : E → E + T

• Top down parsing methods cannot handle left-recursive grammars

• So, a transformation is needed to eliminate left recursion

• Left recursion can be eliminated by rewriting the grammar


Rule for Eliminating Immediate Left
Recursion
• Suppose we have the following productions, where no βi begins with an A:

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

• To eliminate left recursion, we can rewrite the grammar as follows (see the sketch after this rule):

A → β1 A′ | β2 A′ | … | βn A′
A′ → α1 A′ | α2 A′ | … | αm A′ | ε

• This procedure eliminates all immediate left recursion from the A and A′ productions, but it does
not eliminate left recursion involving derivations of 2 or more steps
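
A minimal sketch of this rule (Python, reusing the dict-of-productions encoding from the CFG section; eliminate_immediate_left_recursion is an illustrative name, and an empty body [] stands for ε):

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Split A-productions into A -> beta A' and A' -> alpha A' | epsilon.

    `bodies` is a list of symbol lists; [] stands for an epsilon body.
    Returns a dict of new productions. A' is written as head + "'".
    """
    recursive = [b[1:] for b in bodies if b and b[0] == head]    # the alphas
    nonrecursive = [b for b in bodies if not b or b[0] != head]  # the betas
    if not recursive:
        return {head: bodies}  # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in nonrecursive],
        new:  [alpha + [new] for alpha in recursive] + [[]],     # [] is epsilon
    }

# Example: E -> E + T | T  becomes  E -> T E' ,  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```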
Eliminating Immediate Left Recursion –
Example
• Eliminate left recursion from the given grammar

E→E+T|T

T→T*F|F

F → ( E ) | id

After eliminating the immediate left recursion, we get

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id
Eliminating Left Recursion Involving
Derivations
Eliminating Left Recursion Involving Derivations –
Example 1
Eliminate left recursion from the given grammar:
S → Aa | b
A → Ac | Sd | ε

Step 1 : Order the non-terminals
1 – S
2 – A

i = 1 :
Check if there is immediate left recursion in S. If so, eliminate it.
There is no immediate left recursion in S.

i = 2 :
Substitute the S-productions in A:
A → Ac | Aad | bd | ε
Eliminate the immediate left recursion in A:
A → bdA′ | A′
A′ → cA′ | adA′ | ε

The grammar after eliminating left recursion is
S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε
Eliminating Left Recursion Involving Derivations –
Example 2
Eliminate left recursion from the given grammar:
S → ( L ) | a
L → L , S | S

Step 1 : Order the non-terminals
1 – S
2 – L

i = 1 :
Check if there is immediate left recursion in S. If so, eliminate it.
There is no immediate left recursion in S.

i = 2 :
Substitute the S-productions in L (only where the body starts with S):
L → L , S | ( L ) | a
Eliminate the immediate left recursion in L:
L → ( L ) L′ | a L′
L′ → , S L′ | ε

The grammar after eliminating left recursion is
S → ( L ) | a
L → ( L ) L′ | a L′
L′ → , S L′ | ε
LEFT FACTORING
Left Factoring
• Left Factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing

• Left Factoring will be done when more than one production of a non-terminal has the same prefix
(common prefix)

• The basic idea is that when it is not clear which of two alternative productions to use to expand
a non-terminal A, we may be able to rewrite the A-productions to defer the decision until we have
seen enough of the input to make the right choice

• For example, in the dangling-else grammar, on seeing the input if, we cannot immediately tell which
production to choose to expand stmt
Left Factoring - Algorithm
For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all of the A-productions A → αβ1 | αβ2 | … | αβn | γ (where γ represents all alternatives that do not begin with α) by:
A → αA′ | γ
A′ → β1 | β2 | … | βn
Repeat until no two alternatives of any non-terminal have a common prefix.
Left Factoring – Examples

Example 1
Left factor the grammar given below:
S → i E t S e S | i E t S | a
E → b
The grammar after left factoring is
S → i E t S S′ | a
S′ → e S | ε
E → b

Example 2
Left factor the grammar given below:
A → a A B | a B c | a A c
The grammar after left factoring is
A → a A′
A′ → A B | B c | A c
Left Factoring – Examples – cont..

Example 3
Left factor the grammar given below:
S → c A d
A → a b | a
The grammar after left factoring is
S → c A d
A → a A′
A′ → b | ε

Example 4
Left factor the grammar given below:
S → a S a | a S b | a | b
The grammar after left factoring is
S → a S′ | b
S′ → S a | S b | ε
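
A rough sketch of one left-factoring pass under the same encoding (Python; it factors the longest prefix shared by two or more alternatives, and would be repeated until no common prefixes remain):

```python
import os

def left_factor_once(head, bodies):
    """Factor the longest prefix common to two or more alternatives of `head`.

    Returns the rewritten productions, or None if no factoring applies.
    Bodies are lists of symbols; [] stands for an epsilon body.
    """
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            # os.path.commonprefix works element-wise on any sequences
            prefix = os.path.commonprefix([bodies[i], bodies[j]])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return None
    new = head + "'"
    factored = [b[len(best):] for b in bodies if b[:len(best)] == best]
    rest = [b for b in bodies if b[:len(best)] != best]
    return {head: [best + [new]] + rest, new: factored}

# Example 1 from the slides: S -> iEtSeS | iEtS | a
print(left_factor_once("S", [["i","E","t","S","e","S"], ["i","E","t","S"], ["a"]]))
# {'S': [['i','E','t','S',"S'"], ['a']], "S'": [['e','S'], []]}
```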
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 6
Topics that will be covered in this
Session

• Top Down Parsing


• Recursive Descent Parsing
• Backtracking
TOP DOWN PARSING
Top Down Parsing
• Top-down parsing can be viewed as the problem of constructing a parse tree for the input string,
starting from the root and creating the nodes of the parse tree in preorder (depth-first)

• Top-down parsing can also be viewed as finding a leftmost derivation for an input string

• At each step of top-down parsing, the key problem is that of determining the production to be
applied for a nonterminal, say A.

• Once an A-production is chosen, the rest of the parsing process consists of matching the terminal
symbols in the production body with the input string
Top Down Parsing - Example
• Consider the grammar (the non-left-recursive expression grammar used throughout this unit):

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Construct the parse tree for the string id + id * id
(The step-by-step construction of the parse tree is shown in the slides)
Top Down Parsing – Cont..
• Recursive-descent parsing is a general form of top-down parsing

• It may require backtracking to find the correct A-production to be applied

• Predictive parsing is a special case of recursive-descent parsing, where no backtracking is required

• Predictive parsing chooses the correct A-production by looking ahead at the input a fixed number
of symbols

• Typically, we may look only at one (that is, the next input symbol)

• In the previous example, at the first E′ node the production E′ → + T E′ is chosen, and at the second E′ node the
production E′ → ε is chosen

• A predictive parser can choose between E′ productions by looking at the next input symbol

• The class of grammars for which we can construct predictive parsers by looking k symbols ahead in
the input is sometimes called the LL(k) class of grammars
RECURSIVE DESCENT
PARSING &
BACKTRACKING
Recursive-Descent Parsing
• Recursive-descent parsing is a general form of top-down parsing, that may involve backtracking,
i.e., making repeated scans of the input

• When we cannot choose a unique A-production, we must try each of several productions in some
order

• Only if there are no more A-productions to try, we declare that an input error has been found

• However, backtracking is rarely needed to parse programming language constructs

• So backtracking parsers are not seen frequently

NOTE:

• A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go
into an infinite loop. That is, when we try to expand a nonterminal A, we may eventually find
ourselves again trying to expand A without having consumed any input
Recursive-Descent Parsing -
Example
• Consider the grammar

S → c A d
A → a b | a

• To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a
single node labeled S, and the input pointer pointing to c, the first symbol of w

• S has only one production, so we use it to expand S and obtain the tree

• The leftmost leaf, labeled c, matches the first symbol of input w

• So we advance the input pointer to a, the second symbol of w

• Consider the next leaf labeled A. Now we expand A using the first alternative A → a b
Recursive-Descent Parsing -
Example

• We have a match for the second input symbol, a, so we advance the input pointer to d, the third
input symbol, and compare d against the next leaf, labeled b

• Since b does not match d, we report failure and go back to A to see whether there is another
alternative for A that has not been tried, but that might produce a match

• Go back to A and reset the input pointer to position 2. The second alternative for A produces the tree
• The leaf a matches the second symbol of w and the leaf d matches the third
symbol. Since we have produced a parse tree for w, we halt and announce
successful completion of parsing
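
A minimal backtracking recursive-descent sketch for this grammar (Python; the encoding and the function name parse are assumptions for illustration). Note that it backtracks only among the alternatives of the non-terminal currently being expanded, which is all this example needs; a fully general parser would also re-enter A on a later mismatch.

```python
# Backtracking recursive descent for  S -> c A d ,  A -> a b | a .
# parse(tokens, symbol, pos) returns the input position after a successful
# match, or None; trying the next alternative restarts from the saved
# position, which is the "backtrack".

GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],   # alternatives tried in order
}

def parse(tokens, symbol, pos):
    if symbol not in GRAMMAR:                       # terminal: match one token
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for body in GRAMMAR[symbol]:                    # try each alternative
        p = pos
        for sym in body:
            p = parse(tokens, sym, p)
            if p is None:
                break                               # this alternative failed
        if p is not None:
            return p                                # whole body matched
    return None                                     # all alternatives failed

tokens = list("cad")
print(parse(tokens, "S", 0) == len(tokens))         # True: w = cad is accepted
```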
Recursive-Descent Parsing –
Example 2
• Consider the grammar

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

• Show how recursive-descent parsing will work for the string ( a , a )


18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 7
Topics that will be covered in this
Session

• Computation of FIRST
• Problems related to FIRST
COMPUTATION OF FIRST
FIRST and FOLLOW –
Introduction
• The functions FIRST and FOLLOW help in the construction of both top-down and bottom-up
parsers, associated with a grammar G

• During top-down parsing, FIRST and FOLLOW allows us to choose which production to apply, based
on the next input symbol

• During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as
synchronizing tokens
Computation of FIRST
Definition
• Let α be any string of grammar symbols
• FIRST(α) is defined as the set of terminals that begin strings derived from α

Rules to compute FIRST(X)


Apply the following rules until no more terminals or ε can be added to any FIRST set (a code sketch follows the rules)
1. If X is a terminal, then FIRST(X) = { X }
2. If X is a nonterminal and X → Y1 Y2 … Yk is a production for some k ≥ 1, then
• Add everything in FIRST(Y1) to FIRST(X) except ε
• If FIRST(Y1) contains ε, then add everything in FIRST(Y2) to FIRST(X) except ε, and so on
• If FIRST(Y1), …, FIRST(Yk-1) all contain ε, then add everything in FIRST(Yk) to FIRST(X) except ε
• If FIRST(Y1), …, FIRST(Yk) all contain ε, then add ε to FIRST(X)
3. If X → ε is a production, then add ε to FIRST(X)
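
These rules translate directly into a fixed-point loop. A sketch under the grammar encoding used earlier (Python; "eps" is an assumed marker for ε, and an empty body [] represents an ε-production):

```python
def compute_first(grammar):
    """FIRST sets by fixed point; 'eps' marks the empty string.

    `grammar` maps each non-terminal to a list of bodies (symbol lists);
    an empty body [] represents an epsilon production.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                nullable_so_far = True
                for sym in body:
                    if sym not in grammar:          # terminal
                        first[head].add(sym)
                        nullable_so_far = False
                        break
                    first[head] |= first[sym] - {"eps"}
                    if "eps" not in first[sym]:
                        nullable_so_far = False
                        break
                if nullable_so_far:                 # whole body can vanish
                    first[head].add("eps")
                if len(first[head]) != before:
                    changed = True
    return first

G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
print(compute_first(G))   # E, T, F: {(, id};  E': {+, eps};  T': {*, eps}
```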
PROBLEMS RELATED TO
FIRST
Example 1
• Consider the grammar

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Compute the function FIRST for all non terminals

FIRST(E) = FIRST(T)
FIRST(T) = FIRST(F)
FIRST(F) = { (, id }      So, FIRST(E) = FIRST(T) = { (, id }
FIRST(E′) = { +, ε }
FIRST(T′) = { *, ε }
Example 2
• Consider the grammar

S→(L)|a

L→L,S|S

• Compute the function FIRST for all non terminals

FIRST(S) = {(, a }

FIRST(L) = FIRST(S) = { (, a }
Example 3
• Consider the grammar

S→cAd

A → a A′

A′ → b | ε

• Compute the function FIRST for all non terminals

FIRST(S) = { c }

FIRST(A) = { a }

FIRST(A′) = { b , ε }
Example 4
• Consider the grammar

S→L=R|R

L → * R | id

R→L

• Compute the function FIRST for all non terminals

FIRST(S) = FIRST(L) ∪ FIRST(R)

FIRST(L) = { * , id }

FIRST(R) = FIRST(L) = { * , id }

Therefore FIRST(S) = { * , id }
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 7
Topics that will be covered in this
Session

• Computation of FOLLOW
• Problems related to FOLLOW
COMPUTATION OF
FOLLOW
Definition of FOLLOW

Definition

• Let A be a nonterminal

• FOLLOW(A) is defined as the set of terminals a that can appear immediately to the right of A in
some sentential form; that is, there exists a derivation of the form S ⇒* αAaβ for some α and β

• In addition, if A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A)

$ is a special “endmarker” symbol that is assumed not to be a symbol of any grammar

Rules to compute FOLLOW

Rules to compute FOLLOW

Apply the following rules until nothing can be added to any FOLLOW set (a code sketch follows the rules)

1. Place $ in FOLLOW(S), where S is the start symbol

2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B)

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B)
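
A companion sketch for FOLLOW (Python; it takes the FIRST sets from the compute_first sketch above, and first_of_string computes FIRST(β) for rule 2 — both names are illustrative):

```python
def first_of_string(symbols, grammar, first):
    """FIRST of a symbol string beta (rule 2 needs FIRST(beta))."""
    out = set()
    for sym in symbols:
        if sym not in grammar:                       # terminal
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}
        if "eps" not in first[sym]:
            return out
    out.add("eps")                                   # every symbol was nullable
    return out

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                     # FOLLOW only for non-terminals
                    beta = body[i + 1:]
                    fb = first_of_string(beta, grammar, first)
                    before = len(follow[sym])
                    follow[sym] |= fb - {"eps"}      # rule 2
                    if "eps" in fb:                  # rule 3 (covers empty beta too)
                        follow[sym] |= follow[head]
                    if len(follow[sym]) != before:
                        changed = True
    return follow

# With G and compute_first from the previous sketch:
# print(compute_follow(G, "E", compute_first(G)))
#  -> E, E': {$, )};  T, T': {+, $, )};  F: {+, *, $, )}
```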
PROBLEMS RELATED TO
FOLLOW
Example 1
• Consider the grammar

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

• Compute the function FOLLOW for all non terminals

As E is the start symbol, we place $ in FOLLOW(E): FOLLOW(E) = { $ }

Now, check for productions where there is E in the right side: F → ( E )
The symbol after E is ), and FIRST( ) ) = { ) }, so add ) to FOLLOW(E)

FOLLOW(E) = { $, ) }
Example 1 – cont..

FOLLOW(E) = { $, ) }

Now, let us find FOLLOW(E′)

Check for productions where there is E′ in the right side. There are two productions:

1. E → T E′ : There is no symbol after E′, so add everything in FOLLOW(E) to FOLLOW(E′)
So, FOLLOW(E′) = { $, ) }

2. E′ → + T E′ : There is no symbol after E′, so add everything in FOLLOW(E′) to FOLLOW(E′)
So, there is nothing more to add. Finally, FOLLOW(E′) = { $, ) }



Example 1 – cont..

FOLLOW(E) = { $, ) }
FOLLOW(E′) = { $, ) }
Now, let us find FOLLOW(T)
Check for productions where there is T in the right side. There are two productions:
1. E → T E′ : The symbol after T is E′, so add everything in FIRST(E′) to FOLLOW(T) except ε
So, FOLLOW(T) = { + }
As there is ε in FIRST(E′), add everything in FOLLOW(E) to FOLLOW(T)
So, FOLLOW(T) = { +, $, ) }
2. E′ → + T E′ : The symbol after T is E′, so add everything in FIRST(E′) to FOLLOW(T) except ε (already
added). As there is ε in FIRST(E′), add everything in FOLLOW(E′) to FOLLOW(T)
So, FOLLOW(T) = { +, $, ) }
Example 1 – cont..

FOLLOW(E) = { $, ) }

FOLLOW(E′) = { $, ) }

FOLLOW(T) = { +, $, ) }

Now, let us find FOLLOW(T′)

Check for productions where there is T′ in the right side. There are two productions:

1. T → F T′ : There is no symbol after T′, so add everything in FOLLOW(T) to FOLLOW(T′)
So, FOLLOW(T′) = { +, $, ) }

2. T′ → * F T′ : There is no symbol after T′, so add everything in FOLLOW(T′) to FOLLOW(T′)
So, there is nothing more to add. Finally, FOLLOW(T′) = { +, $, ) }


Example 1 – cont..
FOLLOW(E) = { $, ) }
FOLLOW(E′) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T′) = { +, $, ) }
Now, let us find FOLLOW(F)
Now, check for productions where there is F in the right side. There are two productions:
1. T → F T′
2. T′ → * F T′
In both the productions, the symbol after F is T′. So, add everything in FIRST(T′) to FOLLOW(F) except
ε
So, FOLLOW(F) = { *,
As there is ε in FIRST(T′), add everything in FOLLOW(T) and FOLLOW(T′) to FOLLOW(F)
So, FOLLOW(F) = { *, +, $, ) }
Example 1 – contd..
FOLLOW of all non-terminals in the grammar

FOLLOW(E) = { $, ) }

FOLLOW(E′) = { $, ) }

FOLLOW(T) = { +, $, ) }

FOLLOW(T′) = { +, $, ) }

FOLLOW(F) = { +,*, $, ) }
Example 2
• Consider the grammar

S→(L)|a

L→L,S|S

• Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ } ∪ FOLLOW(L)
          = { $ , ) , , }

FOLLOW(L) = { ) , , }
Example 3
• Consider the grammar

S→cAd

A → a A′

A′ → b | ε

Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ }

FOLLOW(A) = { d }

FOLLOW(A′) = FOLLOW(A)
           = { d }
Example 4
• Consider the grammar

S→L=R|R

L → * R | id

R→L

Compute the function FOLLOW for all non terminals

FOLLOW(S) = { $ }

FOLLOW(L) = { = } ∪ FOLLOW(R)
          = { = , $ }

FOLLOW(R) = FOLLOW(S) ∪ FOLLOW(L)
          = { $ , = }
18CSC304J
COMPILER DESIGN

UNIT 2
SESSION 11
Topics that will be covered in this
Session

• Construction of Predictive Parsing Table


• Predictive Parsers and LL(1) Grammars
CONSTRUCTION OF
PREDICTIVE PARSING
TABLE
Predictive Parsers

Predictive Parsers

• Predictive Parsers are recursive-descent parsers that need no backtracking

• Predictive parsers can be constructed for a class of grammars called LL(1)

• The first L in LL(1) stands for scanning the input from left to right

• The second L stands for producing a leftmost derivation

• The 1 stands for using one input symbol of lookahead at each step to make parsing action
decisions
Steps in Constructing Predictive Parsers

1. Eliminate left recursion from the grammar

2. Left factor the grammar

3. Compute the functions FIRST and FOLLOW for all the non-terminals in the grammar

4. Construct predictive parsing table

5. Apply predictive parsing algorithm to parse the input string


Constructing Predictive Parsing Table
For each production A → α of the grammar, do the following (a code sketch follows):
1. For each terminal a in FIRST(α), add A → α to M[A, a]
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A); if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well
If, after this, some entry M[A, a] holds no production, that entry is an error (normally left blank)
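
A sketch of this construction (Python, reusing the compute_first, compute_follow, and first_of_string sketches from the earlier sessions; a cell that collects two or more productions signals that the grammar is not LL(1)):

```python
def build_parsing_table(grammar, first, follow):
    """M[(A, a)] -> list of bodies; more than one body in a cell is a conflict."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            fb = first_of_string(body, grammar, first)
            # rule 1: terminals in FIRST(body); rule 2: FOLLOW(head) if nullable
            targets = (fb - {"eps"}) | (follow[head] if "eps" in fb else set())
            for a in targets:
                table.setdefault((head, a), []).append(body)
    conflicts = {k: v for k, v in table.items() if len(v) > 1}
    return table, conflicts

# first = compute_first(G); follow = compute_follow(G, "E", first)
# table, conflicts = build_parsing_table(G, first, follow)
# An empty `conflicts` dict means the grammar is LL(1).
```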
Constructing Predictive Parsing Table – Example 1
Construct Predictive Parsing table for the grammar

E→E+T|T

T→T*F|F

F → ( E ) | id

Step 1 : Eliminate Left Recursion

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id

Step 2 : Left Factor the grammar

There is no need for left factoring in this grammar as there are no common prefixes
Constructing Predictive Parsing Table – Example 1
– cont..
Now the grammar is

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → ( E ) | id

Step 3 : Compute FIRST and FOLLOW for all the non-terminals (as computed in the earlier sessions)

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }      FOLLOW(E) = FOLLOW(E′) = { $ , ) }
FIRST(E′) = { + , ε }                            FOLLOW(T) = FOLLOW(T′) = { + , $ , ) }
FIRST(T′) = { * , ε }                            FOLLOW(F) = { + , * , $ , ) }

Constructing Predictive Parsing Table – Example 1
– cont..
Step 4 : Construct predictive parsing table

The grammar after eliminating left recursion is

E → T E′

E′ → + T E′

E′ → ε

T → F T′

T′ → * F T′

T′ → ε

F → ( E )

F → id

         id           +              *              (            )          $
E        E → T E′                                   E → T E′
E′                    E′ → + T E′                                E′ → ε     E′ → ε
T        T → F T′                                   T → F T′
T′                    T′ → ε         T′ → * F T′                 T′ → ε     T′ → ε
F        F → id                                     F → ( E )
Example 2
Construct predictive parsing table for the grammar given below:

S→(L)|a

L→L,S|S

Step 1 : Eliminate Left Recursion

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

Step 2 : Left Factor the grammar

There is no need for left factoring in this grammar as there are no common prefixes
Example 2 – cont..
Now the grammar is

S→(L)|a

L → ( L ) L′ | a L′

L′ → , S L′ | ε

Step 3 : Compute FIRST and FOLLOW for all the non-terminals

FIRST(S) = { ( , a } FOLLOW(S) = { $ , , , ) }

FIRST(L) = { ( , a } FOLLOW(L) = { ) }

FIRST(L′) = { , , ε } FOLLOW(L′) = { ) }
Example 2 – cont..
Step 4 : Construct predictive parsing table

Now the grammar is

S→(L)

S→a

L → ( L ) L′

L → a L′

L′ → , S L′

L′ → ε

         (              )         a           ,              $
S        S → ( L )                S → a
L        L → ( L ) L′             L → a L′
L′                      L′ → ε                L′ → , S L′
Example 3
Construct predictive parsing table for the grammar given below:
S → iEtSS′ | a
S′ → eS | ε
E→b
Step 1 : Eliminate Left Recursion
There is no left recursion
Step 2 : Left Factor the grammar
There is no need for left factoring in this grammar as there are no common prefixes
Step 3 : Compute FIRST and FOLLOW for all the non-terminals
FIRST(S) = { i , a } FOLLOW(S) = { $ , e }
FIRST(S′) = { e , ε } FOLLOW(S′) = { $ , e }
FIRST(E) = { b } FOLLOW(E) = { t }
Example 3
The grammar is Step 4 : Construct predictive parsing table

S → iEtSS′ FIRST(S) = { i , a } FOLLOW(S) = { $ , e }

S→a FIRST(S′) = { e , ε } FOLLOW(S′) = { $ , e }

S′ → eS FIRST(E) = { b } FOLLOW(E) = { t }

S′ → ε
         i              t         a        b         e                     $
S        S → iEtSS′               S → a
S′                                                   S′ → eS , S′ → ε     S′ → ε
E                                          E → b

In the parsing table, the entry for M[S′,e] contains two productions. This is because
the grammar is ambiguous. Hence, the grammar is not LL(1)
PREDICTIVE PARSERS
LL(1) GRAMMARS
LL(1) Grammars
• The class of LL(1) grammars is rich enough to cover most programming constructs
• Care should be taken in writing suitable grammar for the source language
• No left recursive or ambiguous grammar can be LL(1)
• A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the
following conditions hold (a code sketch of the check follows):
1. For no terminal a do both α and β derive strings beginning with a
2. At most one of α and β can derive the empty string
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise,
if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).
• The first two conditions are equivalent to the statement that FIRST(α) and FIRST(β) are disjoint sets
• The third condition is equivalent to stating that if ε is in FIRST(β), then FIRST(α) and FOLLOW(A) are
disjoint sets, and likewise if ε is in FIRST(α)
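
These conditions can be checked pairwise and mechanically. A rough sketch (Python, again reusing the earlier compute_first, compute_follow, and first_of_string sketches; an empty result list means the grammar passes the LL(1) test):

```python
from itertools import combinations

def ll1_violations(grammar, start):
    """Check the pairwise LL(1) conditions for every pair of alternatives."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    bad = []
    for head, bodies in grammar.items():
        for b1, b2 in combinations(bodies, 2):
            f1 = first_of_string(b1, grammar, first)
            f2 = first_of_string(b2, grammar, first)
            if (f1 - {"eps"}) & (f2 - {"eps"}):                 # condition 1
                bad.append((head, b1, b2, "FIRST overlap"))
            if "eps" in f1 and "eps" in f2:                     # condition 2
                bad.append((head, b1, b2, "both nullable"))
            if "eps" in f2 and (f1 - {"eps"}) & follow[head]:   # condition 3
                bad.append((head, b1, b2, "FIRST/FOLLOW clash"))
            if "eps" in f1 and (f2 - {"eps"}) & follow[head]:
                bad.append((head, b1, b2, "FIRST/FOLLOW clash"))
    return bad   # empty list: the grammar is LL(1)
```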
18CSC304J
COMPILER DESIGN

UNIT 2
SESSIONS 12 & 13
Topics that will be covered in this
Session

• Transition Diagrams for Predictive Parsers


• Non Recursive Predictive Parser
• Predictive Parsing Algorithm
• Error Recovery in Predictive Parsing
TRANSITION DIAGRAMS
FOR PREDICTIVE
PARSERS
Transition Diagrams for Predictive Parsers
Transition Diagrams for Predictive Parsers – cont..

• Transition diagrams for predictive parsers differ from those for lexical analyzers

• Parsers have one diagram for each non-terminal

• The labels of edges can be tokens (terminals) or non-terminals

• A transition on a token (terminal) means that we take that transition if that token is the next input
symbol

• A transition on a non-terminal A is a call of the procedure for A

• With an LL(1) grammar, the ambiguity of whether or not to take an ε-edge can be resolved by
making ε-transitions the default choice
Transition Diagrams for Predictive Parsers – cont..
The predictive parser working off the transition diagrams behaves as follows:

1. It begins in the start state for the start symbol

2. If it is in state s with an edge labeled by terminal a to state t, and if the next input symbol is a,
then the parser moves the input cursor one position right and goes to state t

3. If the edge is labeled by a non terminal A, the parser goes to the start state for A, without moving
the input cursor. If it reaches the final state for A, it immediately goes to state t

4. If there is an edge from s to t labeled ε, then from state s, the parser immediately goes to state t,
without advancing the input

Thus a predictive parser program based on transition diagram attempts to match terminal symbols
against the input and makes a recursive procedure call whenever it has to follow an edge labeled by
a non terminal.
Transition Diagrams for Predictive Parsers – cont..
Simplification of Transition Diagrams
• Transition diagrams can be simplified by substituting diagrams in one another

• Now, substitute the transition diagram of E′ on the transition diagram of E


Simplification of Transition Diagrams – cont..
• Apply the same techniques to T and T′

• Now, substitute the transition diagram of T′ on the transition diagram of T


Simplification of Transition Diagrams – cont..

The final simplified transition diagrams for the expression grammar


NON RECURSIVE
PREDICTIVE PARSER
Non Recursive Predictive Parser
• It is possible to build a non-recursive predictive parser by maintaining a stack explicitly, rather than
implicitly via recursive calls

• To determine the production to be applied for a non-terminal, the parser looks up a parsing table

Model of a table-driven predictive parser

• A table-driven parser has an input buffer, a stack, a parsing table, and an output stream

• The input buffer contains the string to be parsed, followed by $, a symbol used as a right end-marker to indicate the end of the input string

• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack

• Initially, the stack contains the start symbol of the grammar on top of $

• The parsing table is a two-dimensional array M[A,a], where A is a non-terminal, and a is a terminal or the symbol $
Non Recursive Predictive Parser – cont..
PREDICTIVE PARSING
ALGORITHM
Predictive Parsing Algorithm
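The algorithm figure is not reproduced here; the sketch below (Python, with the table encoded as in the construction sketch above) implements the same stack-driven loop and prints the production used at each step, reproducing the move sequence traced in Example 1 below.

```python
def predictive_parse(table, grammar, start, tokens):
    """Table-driven LL(1) parsing; prints the production used at each step."""
    stack = ["$", start]
    tokens = tokens + ["$"]
    pos = 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                       # terminal on top: match it
            stack.pop()
            pos += 1
        elif top not in grammar:           # terminal that doesn't match
            raise SyntaxError(f"expected {top!r}, saw {a!r}")
        elif (top, a) in table:            # non-terminal: consult the table
            body = table[(top, a)][0]
            print(f"{top} -> {' '.join(body) or 'eps'}")
            stack.pop()
            stack.extend(reversed(body))   # push the body right-to-left
        else:
            raise SyntaxError(f"no entry M[{top}, {a!r}]")
    if tokens[pos] != "$":
        raise SyntaxError("input left over")
    print("Accept")

# predictive_parse(table, G, "E", ["id", "+", "id", "*", "id"])
```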
Predictive Parsing – Example 1

Stack            Input              Output
$ E              id + id * id $
$ E′ T           id + id * id $     E → T E′
$ E′ T′ F        id + id * id $     T → F T′
$ E′ T′ id       id + id * id $     F → id
$ E′ T′          + id * id $
$ E′             + id * id $        T′ → ε
$ E′ T +         + id * id $        E′ → + T E′
$ E′ T           id * id $
$ E′ T′ F        id * id $          T → F T′
$ E′ T′ id       id * id $          F → id
$ E′ T′          * id $
$ E′ T′ F *      * id $             T′ → * F T′
$ E′ T′ F        id $
$ E′ T′ id       id $               F → id
$ E′ T′          $
$ E′             $                  T′ → ε
$                $                  Accept
Predictive Parsing – Example 3
Consider the grammar

S → aABe

A → Abc | b

B→d

Check whether the grammar is LL(1). If so, parse the string abbcde

(or)

Construct predictive parsing table for the grammar and parse the string abbcde
ERROR RECOVERY IN
PREDICTIVE PARSING
Error Recovery in Predictive Parsing
Panic Mode Recovery
Some heuristics for choosing the synchronizing set

1. Place all symbols in FOLLOW(A) into the synchronizing set for non-terminal A. If we skip
tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that
parsing can continue

2. We can add keywords that begin statements to the synchronizing sets for the non-terminals
generating expressions

3. If we add symbols in FIRST(A) to the synchronizing set for non-terminal A, then it may be
possible to resume parsing according to A if a symbol in FIRST(A) appears in the input

4. If a non-terminal can generate the empty string, then the production deriving ε can be used as
default

5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue
a message saying that the terminal was inserted, and continue parsing
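
Heuristics 1 and 5 can be grafted onto the table-driven loop with a few extra branches. A rough sketch (Python; using FOLLOW(A) as the synchronizing set is exactly the designer choice heuristic 1 describes, and the printed messages are placeholders):

```python
def parse_with_panic_recovery(table, grammar, follow, start, tokens):
    """LL(1) parsing with panic-mode recovery instead of hard failure."""
    stack = ["$", start]
    tokens = tokens + ["$"]
    pos = 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:
            stack.pop(); pos += 1          # terminal matched
        elif top not in grammar:           # heuristic 5: "insert" the terminal
            print(f"error: inserted missing {top!r}")
            stack.pop()
        elif (top, a) in table:            # normal expansion
            body = table[(top, a)][0]
            stack.pop(); stack.extend(reversed(body))
        elif a in follow[top]:             # sync token reached: give up on top
            print(f"error: abandoned {top}")
            stack.pop()
        elif a == "$":                     # never skip past the endmarker
            print(f"error: abandoned {top}")
            stack.pop()
        else:                              # heuristic 1: skip the input token
            print(f"error: skipped {a!r}")
            pos += 1
    return tokens[pos] == "$"
```

Every iteration either pops the stack or advances the input, so the loop is guaranteed to terminate, mirroring the "guaranteed not to go into an infinite loop" advantage noted earlier.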
Panic Mode Recovery - Example
Panic Mode Recovery – Example – cont..
• On an erroneous input ) id * + id the parser and error recovery mechanism will behave as follows:

Note:

• Panic-mode recovery does not address the important issue of error messages

• The compiler designer must supply informative error messages that not only describe the error but also draw attention to where the error was discovered
Phrase Level Recovery
• Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines

• These routines may change, insert, or delete symbols on the input and issue appropriate error
messages

• They may also pop from the stack

• Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several
reasons:

1. The steps carried out by the parser might not correspond to the derivation of any word in the
language at all

2. We must ensure that there is no possibility of an infinite loop. To protect against such loops,
we must check that the error recovery action eventually results in an input symbol being
consumed (or the stack being shortened if the end of the input is reached)
