www.gradeup.co

COMPILER DESIGN

1 BASICS AND GRAMMAR

1. GRAMMARS

A grammar describes the syntax of a programming language. It specifies the
structure of expressions and statements.
A grammar is a 4-tuple:
G = (V , T , P , S)
V: Finite non-empty set of non-terminal symbols
T: Finite set of terminal symbols
P: Finite non-empty set of production rules
S: Start symbol
The symbols of a grammar are of two kinds:
i. Terminals: the symbols from which strings are formed. They are usually denoted by
lowercase letters, e.g., a, b, c. Terminals also include:
• Operators, e.g., +, -, *
• Punctuation symbols, e.g., comma, parentheses
• Digits, i.e., 0, 1, 2, ..., 9
• Boldface strings, e.g., id, if
ii. Non-terminals: symbols that take part in the generation of a sentence but are not part
of it. They are denoted by uppercase letters, e.g., A, B, C.
Production: A rule of the form LHS → RHS, where the LHS is a single non-terminal and the
RHS is a string of terminals and non-terminals (possibly empty).
Start symbol: The head of the production stated first in the grammar.
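The 4-tuple definition above can be sketched as a small data structure. This is only an illustration: the class name Grammar, its field names and the check method are assumptions made for this sketch, and the sample grammar S → aSb | ε is chosen as a small test case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for G = (V, T, P, S)."""
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: tuple       # P, as (head, body) pairs; body is a tuple of symbols
    start: str               # S

    def check(self) -> bool:
        """Return True if the tuple satisfies the basic constraints of the definition."""
        if not self.nonterminals or not self.productions:
            return False                      # V and P must be non-empty
        if self.nonterminals & self.terminals:
            return False                      # V and T must be disjoint
        if self.start not in self.nonterminals:
            return False                      # S must be a non-terminal
        symbols = self.nonterminals | self.terminals
        return all(
            head in self.nonterminals and all(sym in symbols for sym in body)
            for head, body in self.productions
        )


# S -> aSb | ε  (the empty tuple stands for ε)
g = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
print(g.check())
```

Storing each production body as a tuple of symbols keeps ε representable as the empty tuple.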

2. LANGUAGE GENERATED BY A GRAMMAR

Given a grammar G, its language L(G) is the set of all strings of terminals that can be
derived from the start symbol of G. Consider the following grammar:
G: S → aSb | ε
Using S → ε, we can generate ε, so ε is in L(G). Using S ⇒ aSb ⇒ ab, the string ab is
generated. Similarly, S ⇒ aSb ⇒ aaSbb ⇒ aabb generates aabb.

Therefore,
L(G) = {a^n b^n | n ≥ 0}
The case n = 0 accounts for ε.
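As a rough check of this result, the sketch below derives strings with S → aSb | ε and tests membership in {a^n b^n | n ≥ 0}. The function names generate_string and in_language are made up for this illustration.

```python
def generate_string(n: int) -> str:
    """Apply S -> aSb n times, then S -> ε, and return the derived string."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb")
    return form.replace("S", "")


def in_language(s: str) -> bool:
    """Recognize a^n b^n by undoing S -> aSb from the outside in."""
    while s.startswith("a") and s.endswith("b"):
        s = s[1:-1]
    return s == ""


for n in range(4):
    word = generate_string(n)
    print(repr(word), in_language(word))
```

Strings such as "aab" or "ba" are rejected because stripping matched a...b pairs does not reduce them to the empty string.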
Key Points –
• For a given grammar G, its corresponding language L(G) is unique.
• The language L(G) corresponding to grammar G must contain all strings which can be
generated from G.
• The language L(G) corresponding to grammar G must not contain any string which cannot
be generated from G.
• Conversely, given a language, we can construct a grammar G that generates it, as in the
example that follows.
Example: Suppose L(G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find a
grammar G that generates L(G).
Solution:
• Since L(G) = {a^m b^n | m ≥ 0 and n > 0}, the set of strings can be written as
L(G) = {b, ab, bb, aab, abb, ...}
• Every string must contain at least one 'b', preceded by any number of 'a's (possibly
none).
• To generate the strings {b, ab, bb, aab, abb, ...}, we take the productions
S → aS, S → B, B → b and B → bB
• S ⇒ B ⇒ b (generated)
• S ⇒ B ⇒ bB ⇒ bb (generated)
• S ⇒ aS ⇒ aB ⇒ ab (generated)
• S ⇒ aS ⇒ aaS ⇒ aaB ⇒ aab (generated)
• S ⇒ aS ⇒ aB ⇒ abB ⇒ abb (generated)
In the same way, every string in L(G) can be derived from this production set, and the
productions derive no string outside L(G).
Hence the grammar is:
G = ({S, B}, {a, b}, { S → aS | B , B → b | bB }, S)
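The claim that the productions S → aS | B, B → b | bB generate exactly the strings a^m b^n with m ≥ 0 and n > 0 can be spot-checked by enumeration. The sketch below is a bounded check up to a chosen length, not a proof; the function name generate and the convention that uppercase letters are non-terminals are assumptions of this sketch.

```python
import re
from collections import deque

PRODUCTIONS = {"S": ["aS", "B"], "B": ["b", "bB"]}


def generate(max_len: int) -> set:
    """Return every terminal string of length <= max_len derivable from S."""
    seen, result = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal (an uppercase letter).
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            result.add(form)
            continue
        for body in PRODUCTIONS[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            # Terminals never disappear, so forms with too many terminals are pruned.
            if sum(ch.islower() for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


words = generate(4)
print(sorted(words, key=lambda w: (len(w), w)))
print(all(re.fullmatch(r"a*b+", w) for w in words))
```

The regular expression a*b+ is used here only as an independent description of the target language.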

3. DERIVATION OF A STRING

A derivation is a sequence of production rule applications that transforms the start
symbol into the input string.
At each step we must decide which non-terminal of the sentential form to replace. There
are two standard options.
3.1. Left-most Derivation:
If the leftmost non-terminal of the sentential form is replaced at every step, the
derivation is called a left-most derivation. A sentential form derived by a left-most
derivation is called a left-sentential form.
3.2. Right-most Derivation:
If the rightmost non-terminal of the sentential form is replaced at every step, the
derivation is called a right-most derivation. A sentential form derived by a right-most
derivation is called a right-sentential form.
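The difference between the two orders can be seen in a short sketch. The grammar S → AB, A → a, B → b is a hypothetical example chosen for this illustration, and the function derive is not from any library.

```python
def derive(start, choices, productions, leftmost=True):
    """Expand the leftmost (or rightmost) non-terminal at each step.

    choices[i] selects which alternative of that non-terminal is used at step i.
    Returns the list of sentential forms, starting with the start symbol.
    """
    form = start
    forms = [form]
    for choice in choices:
        positions = [i for i, ch in enumerate(form) if ch in productions]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        pos = positions[0] if leftmost else positions[-1]
        body = productions[form[pos]][choice]
        form = form[:pos] + body + form[pos + 1:]
        forms.append(form)
    return forms


PRODUCTIONS = {"S": ["AB"], "A": ["a"], "B": ["b"]}
left = derive("S", [0, 0, 0], PRODUCTIONS, leftmost=True)
right = derive("S", [0, 0, 0], PRODUCTIONS, leftmost=False)
print(" => ".join(left))   # S => AB => aB => ab
print(" => ".join(right))  # S => AB => Ab => ab
```

Both derivations end in the same string; only the intermediate sentential forms differ.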

4. TYPES OF GRAMMAR

4.1. Based on the number of strings:


i. Left Recursive Grammar:
In a grammar G, a production of the form
A → Ab | B, where A and B are non-terminals and 'b' is a string of terminals, is called a
left recursive production.
• A production is said to have left recursion if the leftmost symbol of its RHS is the
same as the non-terminal on its LHS.
• A grammar having a left recursive production is called a left recursive grammar.
Example 3: S → Sb / ε
where S is the non-terminal.
During syntax analysis, left recursion can send a top-down parser into an infinite loop.
This is because expanding S immediately requires expanding S again, without consuming
any input.
Elimination of Left Recursion:
Left recursion is eliminated by converting the grammar into a right recursive grammar.
Let the left-recursive pair of productions be:
A → Aα / β
where β does not begin with A.
Then, left recursion can be eliminated by replacing the pair of productions with:
A → βA’
A’ → αA’ / ε
The resulting right recursive grammar generates the same language as the left recursive
grammar.
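The rule above can be sketched as a small function. This is a minimal sketch under stated assumptions: bodies are tuples of symbols, the empty tuple stands for ε, the new non-terminal is named by appending an apostrophe, and the grammar has no production A → A. The name eliminate_direct is made up for this illustration.

```python
def eliminate_direct(head, bodies):
    """Rewrite A -> Aα | β as A -> βA', A' -> αA' | ε.

    Returns a dict mapping each head to its list of bodies.
    """
    # α parts: bodies that start with the head itself, with the head removed.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # β parts: bodies that do not start with the head (including ε).
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}  # nothing to do
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


# S -> Sb | ε  becomes  S -> S', S' -> bS' | ε
print(eliminate_direct("S", [("S", "b"), ()]))
```

The function handles several α and β alternatives at once, which is how the general form A → Aα1 | Aα2 | β1 | β2 is usually treated.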

To remove indirect left recursion:
Let us take an example:
S → Aa | b
A → Ac | Sd | ε
Step 1: Find the productions through which indirect recursion occurs. Here A → Sd causes
indirect recursion, because S → Aa gives A ⇒ Sd ⇒ Aad.
Step 2: Substitute the productions of S into A → Sd:
A → Ac | Aad | bd | ε
Step 3: Follow the normal procedure to remove direct left recursion:
S → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | ε
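The three steps can be sketched in code for this example. The helper names substitute and eliminate_direct are assumptions of this sketch; bodies are tuples of symbols and the empty tuple stands for ε.

```python
def substitute(grammar, target, source):
    """Replace a leading `source` in each body of `target` by every body of `source`."""
    new_bodies = []
    for body in grammar[target]:
        if body and body[0] == source:
            new_bodies.extend(src + body[1:] for src in grammar[source])
        else:
            new_bodies.append(body)
    return new_bodies


def eliminate_direct(head, bodies):
    """Rewrite A -> Aα | β as A -> βA', A' -> αA' | ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
# Step 2: A -> Ac | Aad | bd | ε
a_bodies = substitute(grammar, "A", "S")
# Step 3: remove the direct left recursion from A.
result = {"S": grammar["S"], **eliminate_direct("A", a_bodies)}
for head, bodies in result.items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

For grammars with more non-terminals, the same substitution is usually repeated in a fixed ordering of the non-terminals; this sketch covers only the two-symbol case of the example.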
Example 1: Consider the grammar given below and eliminate left recursion from
it:
S → SS+ | SS* | a
Sol: First eliminate left recursion for:
S → SS* | a
We get,
S → aS’
S’ → S*S’ | ε
Then eliminate it for:
S → SS+ | a
We get,
S → aS’
S’ → S+S’ | ε
Combining the two results:
S → aS’
S’ → S+S’ | S*S’ | ε
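One way to gain confidence in this answer is to enumerate the strings of both grammars up to a small length and compare them. The sketch below is a bounded check, not a proof of equivalence; the function name language is made up, and the pruning assumes that every production other than an ε-production adds at least one terminal, which holds for both grammars here.

```python
from collections import deque

ORIGINAL = {"S": [("S", "S", "+"), ("S", "S", "*"), ("a",)]}
TRANSFORMED = {
    "S": [("a", "S'")],
    "S'": [("S", "+", "S'"), ("S", "*", "S'"), ()],
}


def language(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start (leftmost BFS)."""
    seen, result = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            result.add("".join(form))
            continue
        for body in grammar[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            terminals = sum(sym not in grammar for sym in new)
            # Terminals never disappear, so longer forms cannot yield short strings.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


print(sorted(language(ORIGINAL, "S", 3)))
print(language(ORIGINAL, "S", 5) == language(TRANSFORMED, "S", 5))
```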
Example 2: Consider the grammar given below and eliminate left recursion from
it:
S → Aa / b
A → Ac / Sd / ε
Sol: This is a case of indirect left recursion.
The production S → Aa / b is already free from left recursion.
Substituting the productions of S in A → Sd, we get the following grammar:
S → Aa / b
A → Ac / Aad / bd / ε
Now, eliminating left recursion from the productions of A:
S → Aa / b
A → bdA’ / A’
A’ → cA’ / adA’ / ε
ii. Right Recursive Grammar:
• A production is said to have right recursion if the rightmost symbol of its RHS is the
same as the non-terminal on its LHS.
• A grammar containing a right recursive production is called a right recursive
grammar.
• It is of the form A → bA | B, where A, B are non-terminals and b is a terminal.
• Right recursion does not create any problem for top-down parsers.
• Therefore, there is no need to eliminate right recursion from the grammar.
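Why right recursion is harmless for a top-down parser can be seen in a small recursive-descent recognizer. The grammar A → bA | c is a hypothetical variant of the form above, with a terminal c in place of B so that the sketch stays self-contained; the function names are also assumptions.

```python
def parse_A(s: str, pos: int = 0) -> int:
    """Recognize A -> bA | c starting at pos; return the position after the match."""
    if pos < len(s) and s[pos] == "b":
        # One 'b' is consumed before the recursive call, so pos always advances.
        return parse_A(s, pos + 1)
    if pos < len(s) and s[pos] == "c":
        return pos + 1
    raise SyntaxError(f"unexpected input at position {pos}")


def accepts(s: str) -> bool:
    """Return True if the whole string is derived from A."""
    try:
        return parse_A(s) == len(s)
    except SyntaxError:
        return False


print(accepts("bbc"), accepts("c"), accepts("bb"), accepts("cb"))
```

A direct translation of a left recursive production such as A → Ab | c would call parse_A(s, pos) again with the same pos, and would never terminate.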
4.2. Based on the number of derivation trees:
Derivation Tree:
• It is also known as a parse tree.
• A parse tree is a graphical representation of a derivation. Each node is labeled with a
symbol, which can be a terminal or a non-terminal.
• In parsing, the string is derived from the start symbol. The root of the parse tree is
the start symbol.
• A parse tree satisfies the following rules:
  • All leaf nodes are terminals (or ε).
  • All interior nodes are non-terminals.
  • Reading the leaves from left to right gives the original input string.
Example 3: S → S + S | S * S
S → a | b | c
Input string: a * b + c
[Figures: Steps 1 to 5 of the parse tree construction for a * b + c are not reproduced here.]
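One possible parse tree for a * b + c under this grammar can be written down as a data structure. The tree below is a sketch and not necessarily the one drawn in the original steps; the class name Node and its leaves method are made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse tree node labeled with a grammar symbol."""
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Yield the leaf symbols from left to right."""
        if not self.children:
            yield self.symbol
        else:
            for child in self.children:
                yield from child.leaves()


# Root S expands by S -> S + S; its left child expands by S -> S * S.
tree = Node("S", [
    Node("S", [
        Node("S", [Node("a")]),
        Node("*"),
        Node("S", [Node("b")]),
    ]),
    Node("+"),
    Node("S", [Node("c")]),
])
print("".join(tree.leaves()))  # a*b+c
```

Every interior node is the non-terminal S and every leaf is a terminal, and the leaves read from left to right spell the input string.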

i. Ambiguous Grammar:
• A grammar is said to be ambiguous if, for at least one string generated by it, there is
more than one parse tree (also called derivation tree or syntax tree). Equivalently, the
string has more than one leftmost derivation, or more than one rightmost derivation.
• Each distinct leftmost (or rightmost) derivation of such a string corresponds to a
different parse tree.
Example 4: Consider the grammar G given as follows:
S → AB | aaB
A → a | Aa
B → b
Determine whether the grammar G is ambiguous or not.

Solution:
Let the string be "aab". It has two leftmost derivations:
S ⇒ aaB ⇒ aab
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
As these correspond to two different parse trees for the same string, the given grammar
is ambiguous.
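The number of parse trees for a string can also be counted mechanically. The sketch below assumes a grammar without ε-productions and without unit productions between non-terminals, which holds for G; the function name count_trees is made up for this illustration.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("A", "B"), ("a", "a", "B")],
    "A": [("a",), ("A", "a")],
    "B": [("b",)],
}


def count_trees(symbol: str, text: str) -> int:
    """Count the parse trees with root `symbol` whose leaves spell `text`."""

    @lru_cache(maxsize=None)
    def sym(x, i, j):
        if x not in GRAMMAR:  # terminal: must match exactly one character
            return int(j - i == 1 and text[i] == x)
        return sum(seq(body, i, j) for body in GRAMMAR[x])

    @lru_cache(maxsize=None)
    def seq(body, i, j):
        if len(body) == 1:
            return sym(body[0], i, j)
        # Every symbol derives at least one character, so leave room for the rest.
        return sum(
            sym(body[0], i, k) * seq(body[1:], k, j)
            for k in range(i + 1, j - len(body) + 2)
        )

    return sym(symbol, 0, len(text))


print(count_trees("S", "aab"))  # 2
print(count_trees("S", "ab"))   # 1
```

A count greater than 1 for some string shows ambiguity; a count of 1 for the strings tried does not by itself show that a grammar is unambiguous.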
ii. Unambiguous Grammar:
• A grammar is said to be unambiguous if, for every string generated by it, there is
exactly one parse tree (derivation tree or syntax tree). Equivalently, every string has
exactly one leftmost derivation and exactly one rightmost derivation.
• For an unambiguous grammar, the leftmost derivation and the rightmost derivation of a
string represent the same parse tree.

5. LEFT FACTORING

• If the RHS of more than one production of the same non-terminal starts with the same
symbol, the grammar is called a grammar with common prefixes.
• Such a grammar makes it hard for a top-down parser to choose between the productions.
Transforming it so that the common prefix is factored out gives a left factored grammar.

Example 5: Perform left factoring in the following grammar:
A → aAB / aBc / aAc
Sol: Factoring out the common prefix a:
A → aA’
A’ → AB / Bc / Ac
This is again a grammar with common prefixes (AB and Ac both start with A), so we perform
left factoring once more:
A → aA’
A’ → AD / Bc
D → B / c

This is a left factored grammar.
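The two factoring steps of Example 5 can be reproduced with a small function. This is a simplified sketch: it factors out only a single shared first symbol per call rather than the longest common prefix, the name left_factor_once is made up, and the name of the new non-terminal is passed in by the caller.

```python
from collections import defaultdict


def left_factor_once(head, bodies, new_name):
    """Factor out the first symbol shared by two or more bodies of `head`.

    Bodies are tuples of symbols; () stands for ε. Returns a dict of productions.
    """
    groups = defaultdict(list)
    for body in bodies:
        groups[body[:1]].append(body)  # group bodies by their first symbol
    for prefix, group in groups.items():
        if prefix and len(group) > 1:
            rest = [b for b in bodies if b not in group]
            return {
                head: [prefix + (new_name,)] + rest,
                new_name: [b[1:] for b in group],
            }
    return {head: list(bodies)}  # no common prefix found


step1 = left_factor_once("A", [("a", "A", "B"), ("a", "B", "c"), ("a", "A", "c")], "A'")
step2 = left_factor_once("A'", step1["A'"], "D")
print(step1)
print(step2)
```

Calling the function repeatedly until nothing changes gives the left factored grammar of the example.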


****
