

Chapter-3
Syntax Analyzer
The Role of the Parser
The parser obtains a string of tokens from the lexical analyzer, as shown in Fig. 4.1,
and verifies that the string of token names can be generated by the grammar for the
source language. We expect the parser to report any syntax errors in an intelligible
fashion and to recover from commonly occurring errors to continue processing the
remainder of the program. Conceptually, for well-formed programs, the parser
constructs a parse tree and passes it to the rest of the compiler for further processing.
In fact, the parse tree need not be constructed explicitly, since checking and
translation actions can be interspersed with parsing, as we shall see. Thus, the parser
and the rest of the front end could well be implemented by a single module.

There are three general types of parsers for grammars: universal, top-down, and bottom-up.
Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's
algorithm can parse any grammar. These general methods are, however, too inefficient
to use in production compilers.
The methods commonly used in compilers can be classified as being either top-down
or bottom-up. As implied by their names, top-down methods build parse trees from
the top (root) to the bottom (leaves), while bottom-up methods start from the leaves
and work their way up to the root. In either case, the input to the parser is scanned
from left to right, one symbol at a time.
In this chapter, we assume that the output of the parser is some representation of the
parse tree for the stream of tokens that comes from the lexical analyzer. In practice,
there are several tasks that might be conducted during parsing, such as collecting
information about various tokens into the symbol table, performing type checking and
other kinds of semantic analysis, and generating intermediate code. We have lumped
all these activities into the "rest of the front end" box in Fig. 4.1. These activities will be
covered in detail in subsequent chapters.


Syntax Error Handling:

Common programming errors can occur at many different levels.

• Lexical errors include misspellings of identifiers, keywords, or operators; for
example, the use of the identifier elipseSize instead of ellipseSize, or missing
quotes around text intended as a string.
• Syntactic errors include misplaced semicolons or extra or missing braces, that
is, "{" or "}". As another example, in C or Java, the appearance of a case
statement without an enclosing switch is a syntactic error.
• Semantic errors include type mismatches between operators and operands; for
example, the return of a value in a Java method with result type void.
• Logical errors can be anything from incorrect reasoning on the part of the
programmer to the use in a C program of the assignment operator = instead of
the comparison operator ==. The program containing = may be well formed;
however, it may not reflect the programmer's intent.
The error handler in a parser has goals that are simple to state but challenging to
realize:

• Report the presence of errors clearly and accurately.


• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.
Fortunately, common errors are simple ones, and a relatively straightforward error-
handling mechanism often suffices. How should an error handler report the presence
of an error? At the very least, it must report the place in the source program where an
error is detected, because there is a good chance that the actual error occurred within
the previous few tokens. A common strategy is to print the offending line with a
pointer to the position at which an error is detected.

Top-Down Parser
A top-down parser parses the input and constructs the parse tree from the root node,
gradually moving down to the leaf nodes. The main types of top-down parsing are
recursive descent parsing (with or without backtracking) and predictive, or LL(1),
parsing, both discussed below.


Recursive Descent Parsing


Recursive descent is a top-down parsing technique that constructs the parse tree from
the top while the input is read from left to right. It uses a procedure for every terminal
and non-terminal symbol. This parsing technique recursively parses the input to build
a parse tree, which may or may not require backtracking; a grammar that has not been
left factored cannot avoid backtracking.
A form of recursive descent parsing that does not require any backtracking is known
as predictive parsing.


Backtracking is required in the example sketched below; the usual way of keeping
track of the input when backtracking takes place is to remember the current input
position before trying an alternative and to restore it if that alternative fails.
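A minimal recursive-descent sketch in Python, assuming the small grammar S → cAd, A → ab / a (a textbook example used only for illustration; it is not part of these notes):

class Parser:
    # Minimal recursive-descent sketch with backtracking (illustrative only).
    # Assumed grammar: S -> c A d,  A -> a b | a
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0                    # current input position

    def match(self, terminal):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == terminal:
            self.pos += 1
            return True
        return False

    def S(self):                        # S -> c A d
        return self.match('c') and self.A() and self.match('d')

    def A(self):                        # A -> a b | a
        saved = self.pos                # remember the input position
        if self.match('a') and self.match('b'):
            return True
        self.pos = saved                # backtrack, then try the next alternative
        return self.match('a')

# Usage: Parser(list("cad")).S() returns True after backtracking once inside A.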


Left Recursion-
• A production of a grammar is said to have left recursion if the leftmost variable of
its RHS is the same as the variable on its LHS.
• A grammar containing a production having left recursion is called a Left
Recursive Grammar.

Example-
S → Sa / ε
(Left Recursive Grammar)

• Left recursion is considered to be a problematic situation for Top-down parsers.


• Therefore, left recursion has to be eliminated from the grammar.

Elimination of Left Recursion


Left recursion is eliminated by converting the grammar into a right recursive grammar.
If we have the left-recursive pair of productions-
A → Aα / β
(Left Recursive Grammar)
where β does not begin with an A.

Then, we can eliminate left recursion by replacing the pair of productions with-
A → βA’

A’ → αA’ / ε


Problem-01:
Consider the following grammar and eliminate left recursion-
A → ABd / Aa / a

B → Be / b
Solution: The grammar after eliminating left recursion is-

A → aA’
A’ → BdA’ / aA’ / ε

B → bB’

B’ → eB’ / ε
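The transformation above can be sketched in code. The sketch below is illustrative only; it assumes a grammar stored as a Python dict mapping each non-terminal to a list of alternatives, where each alternative is a list of symbols and an empty list stands for ε.

def eliminate_immediate_left_recursion(grammar, A):
    alpha = []    # tails of left-recursive alternatives  A -> A alpha
    beta = []     # alternatives that do not start with A  A -> beta
    for alt in grammar[A]:
        if alt and alt[0] == A:
            alpha.append(alt[1:])
        else:
            beta.append(alt)
    if not alpha:
        return grammar                                 # no left recursion to remove
    A1 = A + "'"                                       # new non-terminal A'
    grammar[A] = [b + [A1] for b in beta]              # A  -> beta A'
    grammar[A1] = [a + [A1] for a in alpha] + [[]]     # A' -> alpha A' | epsilon
    return grammar

# For Problem-01: eliminate_immediate_left_recursion(
#     {'A': [['A', 'B', 'd'], ['A', 'a'], ['a']]}, 'A')
# yields A -> a A',  A' -> B d A' | a A' | epsilon, as in the solution above.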
Left Factoring-
Left factoring is a process by which a grammar with common prefixes is transformed
to make it suitable for top-down parsers.

How?
In left factoring,

• We make one production for each common prefix.


• The common prefix may be a terminal or a non-terminal or a combination of
both.
• The rest of the derivation is added by new productions.
The grammar obtained after the process of left factoring is called a Left Factored
Grammar.

Problem
Perform left factoring on the following grammar-

A → aAB / aBc / aAc

Solution- Step-01:
A → aA’

A’ → AB / Bc / Ac


Again, this is a grammar with common prefixes.


Step-02:
A → aA’

A’ → AD / Bc
D → B / c

This is a left factored grammar.
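A single left-factoring step can be sketched as follows, again assuming the dict-of-alternatives grammar representation used earlier; the sketch only factors a common leading symbol, which is exactly what Step-01 above does.

from collections import defaultdict

def left_factor_once(grammar, A):
    groups = defaultdict(list)
    for alt in grammar[A]:
        head = alt[0] if alt else None           # leading symbol of the alternative
        groups[head].append(alt)
    new_alts, new_rules = [], {}
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            new_alts.extend(alts)                # nothing to factor here
        else:
            A1 = A + "'" * (len(new_rules) + 1)  # fresh non-terminal, e.g. A'
            new_alts.append([head, A1])          # A  -> head A'
            new_rules[A1] = [alt[1:] for alt in alts]   # A' -> remaining tails
    grammar[A] = new_alts
    grammar.update(new_rules)
    return grammar

# For the problem above: left_factor_once(
#     {'A': [['a', 'A', 'B'], ['a', 'B', 'c'], ['a', 'A', 'c']]}, 'A')
# yields A -> a A',  A' -> A B | B c | A c, matching Step-01.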

Rules For Calculating First Function-


Rule-01: For a production rule X → ε,

First(X) = { ε }

Rule-02: For any terminal symbol ‘a’,

First(a) = { a }

Rule-03: For a production rule X → Y1Y2Y3,

Calculating First(X)

• If ε ∉ First(Y1), then First(X) = First(Y1)

• If ε ∈ First(Y1), then First(X) = { First(Y1) – ε } ∪ First(Y2Y3)

Calculating First(Y2Y3)

• If ε ∉ First(Y2), then First(Y2Y3) = First(Y2)

• If ε ∈ First(Y2), then First(Y2Y3) = { First(Y2) – ε } ∪ First(Y3)

Similarly, we can expand this rule for any production rule X → Y1Y2Y3…Yn.
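The rules above can be turned into a small fixed-point computation. The sketch below is illustrative; it assumes the dict-of-alternatives grammar representation used earlier, with the string 'epsilon' standing for ε and an empty alternative denoting an ε-production.

EPS = 'epsilon'

def compute_first(grammar):
    first = {A: set() for A in grammar}

    def first_of_string(symbols):
        # FIRST of a string Y1 Y2 ... Yn, following the rules above
        result = set()
        for Y in symbols:
            fy = first[Y] if Y in grammar else {Y}   # terminal a: First(a) = {a}
            result |= fy - {EPS}
            if EPS not in fy:
                return result                        # Y cannot derive epsilon: stop here
        result.add(EPS)                              # every symbol can derive epsilon
        return result

    changed = True
    while changed:                                   # iterate until no FIRST set grows
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first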

Rules For Calculating Follow Function-


Rule-01: For the start symbol S, place $ in Follow(S).
Rule-02: For any production rule A → αB,

Follow(B) = Follow(A)

Rule-03: For any production rule A → αBβ,

• If ε ∉ First(β), then Follow(B) = First(β)


• If ε ∈ First(β), then Follow(B) = { First(β) – ε } ∪ Follow(A)
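These rules can be computed in the same fixed-point style, building on compute_first above. The sketch assumes 'start' names the start symbol and '$' is the end-of-input marker.

def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add('$')                            # Rule-01
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                for i, B in enumerate(alt):
                    if B not in grammar:
                        continue                      # FOLLOW is defined only for non-terminals
                    # First(beta), where beta is everything after B in this alternative
                    fb, nullable = set(), True
                    for Y in alt[i + 1:]:
                        fy = first[Y] if Y in grammar else {Y}
                        fb |= fy - {EPS}
                        if EPS not in fy:
                            nullable = False
                            break
                    before = set(follow[B])
                    follow[B] |= fb                   # Rule-03, first case
                    if nullable:                      # beta is empty or derives epsilon
                        follow[B] |= follow[A]        # Rule-02 / Rule-03, second case
                    if follow[B] != before:
                        changed = True
    return follow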

Important Notes-
Note-01:

• ε may appear in the first function of a non-terminal.

• ε will never appear in the follow function of a non-terminal.

Note-02:
• Before calculating the first and follow functions, eliminate Left Recursion from
the grammar, if present.
Note-03:
• We calculate the follow function of a non-terminal by looking at where it appears
on the RHS of the production rules.
Problem-01:
Calculate the first and follow functions for the given grammar-
S → aBDh
B → cC

C → bC / ε

D → EF

E → g / ε

F → f / ε
Solution: The first and follow functions are as follows-
First Functions-
• First(S) = { a }

• First(B) = { c }

• First(C) = { b , ε }

• First(D) = { First(E) – ε } ∪ First(F) = { g , f , ε }

• First(E) = { g , ε }

• First(F) = { f , ε }

Follow Functions-

• Follow(S) = { $ }

• Follow(B) = { First(D) – ε } ∪ First(h) = { g , f , h }


• Follow(C) = Follow(B) = { g , f , h }
• Follow(D) = First(h) = { h }

• Follow(E) = { First(F) – ε } ∪ Follow(D) = { f , h }

• Follow(F) = Follow(D) = { h }
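As a check, the sketches from the previous pages reproduce these sets for this grammar (the dict representation below is an assumption of the sketches, not part of the notes):

grammar = {
    'S': [['a', 'B', 'D', 'h']],
    'B': [['c', 'C']],
    'C': [['b', 'C'], []],        # [] stands for the epsilon alternative
    'D': [['E', 'F']],
    'E': [['g'], []],
    'F': [['f'], []],
}
first = compute_first(grammar)                 # e.g. first['D'] == {'g', 'f', 'epsilon'}
follow = compute_follow(grammar, first, 'S')   # e.g. follow['B'] == {'g', 'f', 'h'}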

Predictive Parser - LL(1) Parser


Structure of LL(1) Parser

The parser program works with the following three components to produce its output:
INPUT: contains the string to be parsed, with $ as its end marker.
STACK: contains a sequence of grammar symbols, with $ as its bottom marker. Initially,
the stack contains only $.
PARSING TABLE: a two-dimensional array M[A, a], where A is a non-terminal and a is
a terminal.
It is a tabular implementation of recursive descent parsing, where the stack is
maintained explicitly by the parser rather than implicitly by the language in which the
parser is written.
Important Points:
It eliminates the need for a recursion facility in the language used to implement the
parser.
An overhead of this technique is the explicit maintenance of the stack.
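The table-driven loop itself is short. The sketch below is illustrative; it assumes the parsing table M is a Python dict keyed by (non-terminal, terminal) pairs whose values are the chosen production bodies (a construction for such a table is sketched after the algorithm below), and that the token list ends with '$'.

def ll1_parse(table, grammar, start, tokens):
    stack = ['$', start]                              # $ is the bottom-of-stack marker
    i = 0
    while stack:
        top = stack.pop()
        a = tokens[i]
        if top not in grammar:                        # terminal or the $ marker
            if top != a:
                raise SyntaxError(f"expected {top}, found {a}")
            i += 1                                    # match and advance the input
        else:                                         # non-terminal: consult M[top, a]
            alt = table.get((top, a))
            if alt is None:
                raise SyntaxError(f"no entry in M[{top}, {a}]")
            stack.extend(reversed(alt))               # push the chosen production body
    return True                                       # accepted once the stack is empty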


Predictive Parser or LL (1) Parser or Non-Recursive Descent Parser:


LL(1) Parsing: the first L indicates that the input is scanned from left to right, the
second L indicates that the parser produces a leftmost derivation, and the 1 is the
number of look-ahead symbols, i.e., how many input symbols are examined when
making a parsing decision.
Steps to construct an LL(1) Parser:
1. Eliminate Left Recursion
2. Perform Left Factoring
3. Compute the FIRST and FOLLOW functions
4. Construct the LL(1) or Predictive Parsing table
5. Parse the input string
6. Construct the Parse tree
Algorithm to construct the LL(1) Parsing Table (Step 4 above): For each production A → α,
1. Find First(α), and for each terminal in First(α), make the entry A → α in the table.
2. If First(α) contains ε (epsilon), then find Follow(A), and for each terminal in
Follow(A), make the entry A → ε in the table.
3. If First(α) contains ε and Follow(A) contains $, then make the entry A → ε in the
table for $.
To construct the parsing table, we use the two functions FIRST and FOLLOW. In the
table, the rows contain the non-terminals and the columns contain the terminal
symbols. All the null (ε) productions of the grammar are placed under the FOLLOW
elements of their non-terminal, and the remaining productions are placed under the
elements of their FIRST sets.
Note: Not every grammar is suitable for an LL(1) parsing table; it may happen that
one cell contains more than one production, in which case the grammar is not LL(1).
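A sketch of the table construction following the three steps above, reusing the FIRST/FOLLOW sketches and the same grammar representation (illustrative only):

def build_ll1_table(grammar, first, follow):
    table = {}                                        # keyed by (non-terminal, terminal)

    def add(A, a, alt):
        if (A, a) in table:                           # two productions in one cell
            raise ValueError(f"not LL(1): conflict at M[{A}, {a}]")
        table[(A, a)] = alt

    for A, alts in grammar.items():
        for alt in alts:
            # First(alpha) for this alternative, computed symbol by symbol
            falpha, nullable = set(), True
            for Y in alt:
                fy = first[Y] if Y in grammar else {Y}
                falpha |= fy - {EPS}
                if EPS not in fy:
                    nullable = False
                    break
            for a in falpha:
                add(A, a, alt)                        # step 1
            if nullable:                              # First(alpha) contains epsilon
                for b in follow[A]:
                    add(A, b, alt)                    # steps 2 and 3 ($ is in Follow)
    return table

# Putting the pieces together for the FIRST/FOLLOW grammar of Problem-01:
#   table = build_ll1_table(grammar, first, follow)
#   ll1_parse(table, grammar, 'S', list("acbgfh") + ['$'])   # returns True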
Now, let us understand with an example
Example: Refer to class notes.
