Download as pdf or txt
Download as pdf or txt
You are on page 1of 106

Unit 4: CFG

4.1. Introduction to Context Free Grammar (CFG), Components of CFG,


Use of CFG, Context Free Language (CFL)

4.2. Types of derivations: Bottomup and Topdown approach, Leftmost and


Rightmost, Sentential Form (Left, Right), Language of a grammar

4.3. Parse tree and its construction, Ambiguous grammar, Use of parse tree
to show ambiguity in grammar, Inherent Ambiguity

4.4. Regular Grammars: Right Linear and Left Linear, Equivalence of


regular grammar and finite automata

4.5. Simplification of CFG: Removal of Useless symbols, Nullable Symbols,


and Unit Productions, Chomsky Normal Form (CNF), Greibach Normal
Form (GNF), Backus-Naur Form (BNF)

4.6. Context Sensitive Grammar, Chomsky Hierarchy(Type 0, 1, 2, 3) ,


Pumping Lemma for CFL, Application of Pumping Lemma, Closure
Properties of CFL
Recall:

Theorem: L := {0n 1n : n ≥ 0} is not regular

But it is often needed to recognize this language


Example: Programming language syntax have
matching brackets, not regular.

Next: Introduce context-free languages


Why study context-free languages

● Practice with more powerful model

● Programming languages: Syntax of C++, Java, etc.


is specified by context-free grammar

● Other reasons: human language has structures


that can be modeled as context-free language
English is not a regular language
Context Free Grammar
It is defined by 4 tuple, G = (T, N, S, P) where,
• T is set of terminals (lexicon)
• N is set of non-terminals/Variables
• S is start symbol (one of the non-terminals)
• R is rules/productions of the form X → γ, where X is a
non-terminal and γ is a sequence of terminals and
non-terminals (may be empty).
• A grammar G generates a language L.
An example context-free grammar
G = (T, N, S, R)
T = {that, this, a, the, man, book, flight, meal, include, read, does}
N = {S, NP, NOM, VP, Det, Noun, Verb, Aux}
S=S
R={
S → NP VP Det → that | this | a | the
S → Aux NP VP Noun → book | flight | meal | man
S → VP Verb → book | include | read
NP → Det NOM Aux → does
NOM → Noun
NOM → Noun NOM
VP → Verb
VP → Verb NP
}
Application of grammar rules
RULES: S → NP VP Det → that | this | a | the
S → Aux NP VP Noun → book | flight | meal | man
S → VP Verb → book | include | read
NP → Det NOM Aux → does
NOM → Noun
NOM → Noun NOM
VP → Verb
VP → Verb NP
Example:
S → NP VP
→ Det NOM VP
→ The NOM VP
→ The Noun VP
→ The man VP
→ The man Verb NP
→ The man read NP
→ The man read Det NOM
→ The man read this NOM
→ The man read this Noun
→ The man read this book
Example: CFG defining the grammar of all the strings with
equal no of 0’s followed by equal no of 1’s

Rules:
Sà Є
Sà 0S1
Here, the two rules define the production P,
Є, 0, 1 are the terminals defining T,
S is a variable symbol defining V
And S is start symbol from where production starts.
CFG Vs RE
• The CFG are more powerful than the regular expressions as
they have more expressive power than the regular
expression.
• Generally regular expressions are useful for describing the
structure of lexical constructs as identical keywords,
constants etc. But they do not have the capability to specify
the recursive structure of the programming constructs.
• However, the CFG are capable to define any of the
recursive structure also. Thus, CFG can define the
languages that are regular as well as those languages that
are not regular.
Compact Notation for the Productions

Sà Є | a | b
The detail of this production look like as
Sà Є
Sàa
Sàb
Meaning of context free:
Consider an example:
SàaMb
Mà A | B
Aà Є | aA
Bà Є | bB
How?
Consider a String aaAb, which is an intermediate stage in the
generating of aaab. It is natural to call the strings “aa” and “b”
that surround the symbol A, the “context” of A in this particular
string. Now, the rule Aà aA says that we can replace A by the
string aA no matter what the surrounding strings are; in other
words, independently of the context of A.
context-sensitive grammar (CSG)
A context-sensitive grammar (CSG) is a formal grammar type
that is more expressive than context-free grammars (CFGs).
In a context-sensitive grammar, the production rules are
applied based not only on the current non-terminal symbol
being replaced but also on the surrounding context or
neighboring symbols. This allows for a more sophisticated
and flexible representation of languages compared to
context-free grammars.

Example:
Consider a simple context-sensitive grammar that generates
strings of the form a^n b^n c^n, where the number of 'a's,
'b's, and 'c's are the same.
Example of context-sensitive grammar (CSG):
Consider a simple context-sensitive grammar that generates
strings of the form a^n b^n c^n, where the number of 'a's, 'b's,
and 'c's are the same.
Solution:
Production rules:
1.XaY -> aXY (replace 'a' in the middle with 'a' on the left)
2.YbZ -> YZb (replace 'b' in the middle with 'b' on the right)
3.ZcW -> ZWc (replace 'c' in the middle with 'c' on the right)
4.aX -> Xa (remove 'a' from the left end)
5.bY -> Yb (remove 'b' from the left end)
6.cZ -> Zc (remove 'c' from the left end)
7.X -> ε (remove 'X', where ε is the empty string)
8.Y -> ε (remove 'Y')
9.Z -> ε (remove 'Z')
10.W -> ε (remove 'W')
Starting symbol: XYZ
There are two possible approaches of derivation:
• Body to head(Bottom Up) approach.
• Head to body(Top Down) approach(Leftmost/ Rightmost derivation).

Body to head
Here, we take strings known to be in the language of each of
the variables of the body, concatenate them, in the proper
order, with any terminals appearing in the body, and infer that
the resulting string is the language of the variables in the head.

Head to Body
Here, we use production from head to body. We expand the
start symbol using a production, whose head is the start
symbol. Here we expand the resulting string until all strings of
terminal are obtained. Here we have two approaches:
Body to head
Consider grammar,
SàS + S
Sà S/S
Sà (S)
SàS-S
SàS*S
Sàa

suppose string be:: a + (a*a) / a – a


SN String Variable(For Language Production String Used
Inferred of)
1 a S Sàa
2 a*a S SàS*S String 1
3 (a*a) S Sà(S) String 2
3 (a*a)/a S SàS/S String 3 and
string 1
4 a+(a*a)/a S SàS+S String 1 and 3
5 a+(a*a)/a-a S SàS-S String 4 and 1
Head to Body
Here we have two approaches:

Leftmost Derivation: Here leftmost symbol


(variable) is replaced first.
Rightmost Derivation: Here rightmost symbol is
replaced first.

For example: consider the previous example of


deriving string
a+ (a*a)/a-a with the grammar(2)
Leftmost derivation for the given string : a+ (a*a)/a-a

Sà S + S rule à S + S
Sàa + S rule Sà a [Note here we have
replaced left symbol of the body]
Sàa + S – S rule SàS - S
Sàa + S / S – S rule Sà S / S
Sàa + (S) /S – S rule Sà(S)
Sà(S*S) / S – S; rule SàS*S
Given Grammer
Sà a + (a*S) / S – S rule Sàa SàS + S
Sà a + (a*a) / S – S “ “ Sà S/S
Sà a + (a*a) / a – S “ “ Sà (S)
Sàa + (a*a) / a – a “ “ SàS-S
SàS*S
Sàa
Rightmost derivation for the given string : a+ (a*a)/a-a

Sà S – S; rule SàS - S
Sà S – a; rule Sà a
SàS + S – a; rule Sà S + S
SàS + S / S – a; rule SàS / S
SàS + S / a –a; rule Sà a
SàS + (S) / a – a; rule Sà (S)
SàS + (S*S) / a – a; rule Sà S*S
SàS + (S*a) / a – a; rule Sà a Given Grammer
SàS + S
SàS + (a*a) / a – a;
Sà S/S
Sà a + (a*a) / a – a; Sà (S)
SàS-S
SàS*S
Sàa
Direct Derivation:
α1à α2 : If α2 can be derived directly from α1, then it is direct
derivation.
α1à* α2 : If α2 can be derived from α1 with zero or more
steps of the derivation, then it is just derivation.

Example
SàaSa | ab| a| b| Є

Direct derivation:
S à ab
Sà aSa
Sà aaSaa
Sàaaabaa

Thus Sà* aaabaa is just a derivation.


Example leftmost derivation ::
Let any set of production rules in a CFG be
X → X+X | X*X |X| a over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a + X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown as below −
Example rightmost derivation ::
Let any set of production rules in a CFG be
X → X+X | X*X |X| a over an alphabet {a}.
The rightmost derivation for the string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown as below −
Language of Grammar (Context Free Grammar):

Let G = (V, T, P and S) is a context free grammar. Then


the language of G denoted by L(G) is the set of terminal
strings that have derivation from the start symbol in G.

i.e. L (G) = {x ε T* | Sà* x}

The language generated by a CFG is called the Context


Free Language (CFL).

A language is called context free if it is generated by some context


free grammar. For example, the language anbn is context free.
Not all languages are context free. For example, anbncn is not.
Sentential forms
Derivations from the start symbol produce strings that
have a special role.
We call these “sentential forms”.

i.e. if G=(V,T,P,S) is a CFG,


then any string α is in (V U T)* such that Sà* α is a
sentential form.

If Sàlm* α, then α is a left sentential forms, and if Sàrm* α,


then α is a right sentential form.

Note: The language L(G) is those sentential forms that are


in T* i.e. they consist solely of terminals.
Derivation Tree / Parse Tree:
Parse tree is a tree representation of strings of terminals
using the productions defined by the grammar.
A parse tree pictorially shows how the start symbol of a
grammar derives a string in the language.

Parse tree may be viewed as a graphical representation


for a derivation that filters out the choice regarding the
replacement order.
Example
Consider the grammer
Sà aSa | a | b| Є

Now for string Sà* aabaa


We have,
Sà aSa
SàaaSaa
Sàaabaa So the parse tree isà:
CFG: Parsed Tree

NOTE: FA/Regular expression can not represent anything that has


condition like anbn (here condition is a and b count must me equal)
but ambn is regular if there is no any condition like n=m,n!=m etc., i.e.
if there is no dependency of m,n then ambn are regular and can be
represented by FA.
Construct a CFG that generates language of balanced
parentheses. Show the parse tree computation for
1.( ) ( )
2.( ( ( ) ) ) ( )
Solution:
The CFG is G=(V,T,P,S)
Sà SS
Sà(S)
Sà Є
Where,
V = {S}
S = {S}
T = {(,)}
& P = as listed above

Now the parse tree for ( ) ( ) is => YOURSELF


And the parse tree for ( ( ( ( ) ) ) ) ( ) => YOURSELF
Exercise

a)Consider the Grammar G:


Sà A1B
Aà0A | Є
Bà0B | 1B | Є
1.Construct the parse tree for 00101
2.Construct the parse tree for 1001
3.Construct the parse tree for 00011

b) Consider the grammer for the arithematic expression


Eà EOPE | (E) | id
OPà + | - | * | / | ↑

Construct the parse tree for id + id *id


Construct the parse tree for (id +id)*(id + id)
Ambiguity in Grammar:
A Grammar G = (V, T, P and S) is said to be ambiguous if there is
a string w ε L (G) for which we can derive two or more distinct
derivation tree rooted at S and yielding w. In other words, a
grammar is ambiguous if it can produce more than one leftmost
or more than one rightmost derivation for the same string in
the language of the grammar.

Example:
Sà AB |aaB
Aà a | Aa
Bàb
For any string aab;
We have two leftmost derivations as;
SàAB the parse tree for this derivation is:
àAaB
àaaB
à aab
Also,
Sà aaB the parse tree for this derivation is:
àaab
Example: Check whether the grammar G with production rules −
X → X+X | X*X |X| a is ambiguous or not.
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost
derivations.
Derivation 1 − Derivation 2 −
X → X+X → a +X X → X*X → X+X*X → a+ X*X
→ a+ X*X → a+a*X → a+a*X → a+a*a
→ a+a*a Parse tree 2 −
Parse tree 1 −

Since there are two parse trees for a single string "a+a*a", the
grammar G is ambiguous.
CFG Simplification
In a CFG, it may happen that all the production rules
and symbols are not needed for the derivation of
strings. Besides, there may be some null productions
and unit productions. Elimination of these productions
and symbols is called simplification of CFGs.
Simplification essentially comprises of the following
steps −
• A. Reduction of CFG /(Eliminate useless symbol):
• B. Removal of Unit Productions
• C. Removal of Null Productions
Goal of CSG Simplification::
The goal of CFG Simplification is to show that every context free
language (without Є) is generated by a CFG in which all production are
of the form
Aàa
A à BC
S→ε
where A,B and C are variables(V), and a is terminal(T).
This form is called Chomsky Normal Form.

To get there, we need to make a number of preliminary simplifications,


which are themselves useful in various ways;

A. Reduction of CFG:: We must eliminate “useless symbols”: Those


variables or terminals that do not appear in any derivation of a terminal
string from the start symbol.
B. Removal of Unit Productions:: We must eliminate “unit
production”: Those of the form A à B for variables A and B.
C. Removal of Null Productions:: We must eliminate “Є-production”:
Those of the form Aà Є for some variable A.
A. CFGs are reduced (remove useless) in two phases −
Phase 1 − Derivation of an equivalent grammar, G’, from the
CFG, G, such that each variable derives some terminal string.
Derivation Procedure −
Step 1 − Include all symbols, W1, that derive some terminal
and initialize i=1.
Step 2 − Include all symbols, Wi+1, that derive Wi.
Step 3 − Increment i and repeat Step 2, until Wi+1 = Wi.
Step 4 − Include all production rules that have Wi in it.

Phase 2 − Derivation of an equivalent grammar, G”, from the


CFG, G’, such that each symbol appears in a sentential form.
Derivation Procedure −
Step 1 − Include the start symbol in Y1 and initialize i = 1.
Step 2 − Include all symbols, Yi+1, that can be derived
from Yi and include all production rules that have been applied.
Step 3 − Increment i and repeat Step 2, until Yi+1 = Yi.
Example:: Find a reduced grammar equivalent to the grammar G, having
production rules, P: S → AC | B, A → a, C → c | BC, E → aA | e
Solution::
Phase 1 −
T = { a, c, e }
W1 = { A, C, E } from rules A → a, C → c and E → e| aA (derives some terminal)
W2 = { A, C, E } U { S } from rule S → AC (derives W1 i.e. {A,C,E} )
W3 = { A, C, E, S } U ∅
Since W2 = W3, we can derive G’ as −
G’ = { { A, C, E, S }, { a, c, e }, P, {S}}
where P: S → AC, A → a, C → c , E → aA | e

Phase 2 −
Y1 = { S }
Y2 = { S, A, C } from rule S → AC
Y3 = { S, A, C, a, c } from rules A → a and C → c
Y4 = { S, A, C, a, c }
Since Y3 = Y4, we can derive G” as −
G” = { { A, C, S }, { a, c }, P, {S}}
where P: S → AC, A → a, C → c
Simplification essentially comprises of the following steps ::

A. Reduction of CFG (DONE)

B. Removal of Unit Productions


Any production rule in the form A → B where A, B ∈ Non-terminal is
called unit production..
Removal Procedure −
Step 1 − To remove A → B, add production A → x to the grammar rule
whenever B → x occurs in the grammar. [x ∈ Terminal, x can be Null]
Step 2 − Delete A → B from the grammar.
Step 3 − Repeat from step 1 until all unit productions are removed.

C. Removal of Null Productions


Example: Remove unit production from the following −
S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
Solution ::
There are 3 unit productions in the grammar −
Y → Z, Z → M, and M → N
At first, we will remove M → N.
As N → a, we add M → a, and M → N is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
Now we will remove Z → M.
As M → a, we add Z→ a, and Z → M is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
Now we will remove Y → Z.
As Z → a, we add Y→ a, and Y → Z is removed.
The production set becomes
S → XY, X → a, Y → a | b, Z → a, M → a, N → a
Now Z, M, and N are unreachable, hence we can remove those.
The final CFG is unit production free −
S → XY, X → a, Y → a | b
Simplification essentially comprises of the following steps ::

A.Reduction of CFG

B. Removal of Unit Productions

C.Removal of Null Productions


In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a
production A → ε or there is a derivation that starts at A and finally ends
up with ε: A → .......… → ε
Removal Procedure
Step 1 − Find out nullable non-terminal variables which derive ε.
Step 2 − For each production A → a, construct all productions A →
x where x is obtained from ‘a’ by removing one or multiple non-terminals
from Step 1.
Step 3 − Combine the original productions with the result of step 2 and
remove ε - productions.
Example: Remove null production from the following −
S → ASA | aB | b, A → B, B → b | ∈
Solution::
There are two nullable variables − A and B
At first, we will remove B → ε.
After removing B → ε, the production set becomes −
S→ASA | aB | b | a, A → B | ε, B → b
Now we will remove A → ε.
After removing A → ε, the production set becomes −
S→ASA | aB | b | a | SA | AS | S, A → B, B → b
This is the final production set without null transition.

NOTE: we still can remove unit production for completiong


simplification
Chomsky Normal Form (CNF)

A CFG is in Chomsky Normal Form if the Productions are


in the following forms −
A→a
A → BC
S→ε
where A, B, and C are non-terminals and a is terminal.
Examples for CFG Simplifications::
Example :Eliminating Useless Symbols:
Consider a grammar defined by following productions:
Sà aB | bX
Aà Bad | bSX | a
Bà aSB | bBX
Xà SBd | aBX | ad
Here, A and X can directly generate terminal symbols. So, A and X are
generating symbols. As we have the productions Aà a and Xà ad.
Also, Sà bX and X generates terminal string so S can also generate
terminal string. Hence, S is also generating symbol.
B can not produce any terminal symbol, so it is non-generating.
Hence, the new grammar after removing non-generating symbols is:
Sà bX
Aà bSX | a
Xà ad
Here, A is non-reachable as there is no any derivation of the form Sà*
α A β in the grammar. Thus eliminating the non-reachable symbols, the
resulting grammar is:
Sà bX
Xà ad
Exercise
Remove useless symbol from the following
grammar:
Sà xyZ | XyzZ
Xà Xz | xYZ
Yà yYy | XZ
Zà Zy | z

Remove useless symbol from the following grammar


Sà aC | SB
Aà bSCa
Bà aSB | bBC
Cà aBc | ad
Example Eliminating Є-productions:
Consider the grammar:
SàABC
Aà BB | Є
Bà CC | a
Cà AA | b
Here,
Aà Є A is nullable.
CàAAà* Є, C is nullable
Bà CCà* Є, B is nullable
SàABCà* Є, S is nullable
Now for removal of Є –production:
In production SàABC, all A, B and C are nullable. So, striking out subset of
each the possible combination of production gives new productions as:
Sà ABC | AB | BC | AC | A | B | C
Similarly for other can be done and the resulting grammar after removal of e-
production is:
SàABC | AB | BC | AC | A | B | C AàBB | B
BàCC | C | a CàAA | A | b
Exercise:
Remove Є-productions for each of grammar;
1)
Sà AB
Aà aAA | Є
B à bBB | Є
Example Eliminating Unit Production:
Remove the unit production for grammar G defined by productions:
P = {Sà S + T | T
Tà T* F | F
Fà (S) | a 3) Now, add each non-unit productions
}; of the form B àα for each pair (A, B);
1) Initialize P’ = {
P’ = {Sà S + T | T SàS + T |T * F| (S) | a
Tà t*F | F TàT * F | (S) | a | F
Fà (S) | a } Fà (S) | a
2) Now, find unit pairs; }
Here, Sà T So, (S, T) is unit pair. 4) Delete the unit productions from the
Tà F So, (T, F) is unit pair. grammar;
Also, SàT and Tà F P’ = {
So, (S, F) is unit pair. Sà S + T | T * F | (S) | a
Tà T * F | (S) | a
Fà (S) | a
}
Exercise
1) Simply the grammar G = (V, T, P, S) defined by following
productions.
Sà ASB | Є
Aà aAS | a
Bà SbS | A | bb | Є
Note: Here simplify means you have to remove all the useless symbol, Unit
production and Є- productions.
2) Simplify the grammar defined by following production:
Sà 0A0 | 1B1 | BB
Aà C
Bà S | A
CàS | Є
Chomsky Normal Form

A context free grammar G = (V, T, P, S) is said to be in


Chomsky’s Normal Form (CNF) if every production in G
are in following forms;
Aà a
Aà BC
S→ε
where A, B, C ε V and a ε T

Thus a grammar in CNF is one which should not have;


Є-production
Unit production
Useless symbols.
Algorithm to Convert into Chomsky Normal Form −
Step 1 − If the start symbol S occurs on some right side,
create a new start symbol S’ and a new production S’→ S.
Step 2 − Remove Null productions. (Using the Null production
removal algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production
removal algorithm discussed earlier)
Step 4 − Replace each production A → B1…Bn where n >
2 with A → B1C where C → B2 …Bn. Repeat this step for all
productions having two or more symbols in the right side.
Step 5 − If the right side of any production is in the form A →
aB where a is a terminal and A, B are non-terminal, then the
production is replaced by A → XB and X → a. Repeat this
step for every production which is in the form A → aB.
Example: Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution::
(1) Since S appears in R.H.S, we add a new
state S0 and S0→S is added to the production set and it
becomes −
S0→S, S→ ASA | aB, A → B | S, B → b | ∈

(2) Now we will remove the null productions −


B → ∈ and A → ∈
After removing B → ε, the production set becomes −
S0→S, S→ ASA | aB | a, A → B | S | ∈, B → b
After removing A → ∈, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b

(3) Now we will remove the unit productions.


S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b

(3) Now we will remove the unit productions.


After removing S → S, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0→ S, the production set becomes −
S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → B | S, B → b
After removing A→ B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A→S|b
B→b
After removing A→ S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA, B → b

(4) Now we will find out more than two variables in the R.H.S
(4) Now we will find out more than two variables in the R.H.S
Here, S0→ ASA, S → ASA, A→ ASA violates two Non-terminals in
R.H.S.
Hence we will apply step 4 and step 5 to get the following final
production set which is in CNF −
S0→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X → SA

(5) We have to change the productions S0→ aB, S→ aB, A→ aB


And the final production set becomes −
S0→ AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b A → b |AX | YB | a | AS | SA
B→b
X → SA
Y→a
COMPLETED.
Greibach Normal Form
A CFG is in Greibach Normal Form if the Productions are
in the following forms −
A→b
A → bD1…Dn
S→ε
where A, D1,....,Dn are non-terminals and b is a terminal.
Algorithm to Convert a CFG into Greibach Normal Form
Step 1 − If the start symbol S occurs on some right side, create a new start
symbol S’ and a new production S’ → S.
Step 2 − Remove Null productions. (Using the Null production removal
algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production removal
algorithm discussed earlier)
Step 4 − Remove all direct and indirect left-recursion.
Step 5 − Do proper substitutions of productions to convert it into the proper
form of GNF.
Example: Convert the following CFG into GNF
S → XY | Xo | p
X → mX | m
Y → Xn | o
Solution::
Here, S does not appear on the right side of any production and there are
no unit or null productions in the production rule set. So, we can skip
Step 1 to Step 3.
Step 4
Now after replacing
X in S → XY | Xo | p with X → mX | m we obtain,
S → mXY | mY | mXo | mo | p.
And after replacing X in Y → Xn | o with the right side of X → mX | m
Y → mXn | mn | o
Two new productions O → o and N → n are added to the production set
and then we came to the final GNF as the following −
S → mXY | mY | mXO | mO | p
X → mX | m
Y → mXN | mN | o
O→o
Consider an example:
Sà AAC Aà aAb | Є C à aC | a
1) Now, removing Є- productions;
Here, A is nullable symbol as AàЄ
So, eliminating such Є-productions, we have;
Sà AAC | AC | C Aà aAb | ab Cà aC | a
2) Removing unit-productions;
Here, the unit pair we have is (S, C) as Sà C So, removing unit-production, we have;
Sà AAC | AC | aC| a Aà aAb | ab Cà aC | a
Here we do not have any useless symbol. Now, we can convert the grammar to CNF. For this;
àFirst replace the terminal by a variable and introduce new productions for those which are not as the
productions in CNF.
i.e. SàAAC | AC |C1C | a C1à a Aà C1AB1 | C1B1 B1à b Cà C1C | a
Now, replace the sequence of non-terminals by a variable and introduce new productions.
Here, replace Sà AAC by SàAC2, C2à AC
Similarly, replace Aà C1AB1 by Aà C1C3, C3à AB1
Thus the final grammar in CNF form will be as:
Thus the final grammar in CNF form will be as;
Sà AC2 | AC | C1C | a
Aà C1C3 | C1b1
C1à a
B1à b
C2à AC
C3à AB1
Cà C1C | a
Exercise::Simplify following grammars and convert to CNF;
1)
SàASB | Є
Aà aAS | a
Bà SbS | A | bb

2) Sà AACD
Aà aAb | Є
Cà aC | a
DàaDa | bDa | Є

3)
Sà aaaaS | aaaa
Bakus-Naur Form
This is another notation used to specify the CFG. It is named
so after John Bakus, who invented it, and Peter Naur, who
refined it. The Bakus-Naur form is used to specify the
syntactic rule of many computer languages, including Java.
Here, concept is similar to CFG, only the difference is instead
of using symbol “à’’ in production, we use symbol ::=. We
enclose all non-terminals in angle brackets, < >.

For example:

The BNF for identifiers is as;


<identifier> ::= <letter or underscore> | <identifier> | <symbol>
<letter or underscore> ::= <letter> | <_>
<symbol> ::= <letter or underscore> | <digit>
<letter> ::= a | b | ……………….. | z
<digit> ::= 0 | 1| 2 | ……………….|3
Left recursive Grammar
A grammar is said to be left recursive if it has a non-terminal
A such that there is a derivation (A à* A α) for some string x.
The top down parsing methods can not handle left recursive
grammars, so a transformation that eliminates left recursion
is needed.

Removal of immediate left recursion:


In general,
If Aà A α1 |A α2 | ….|A αm | β1 | β2 | …| βn|; With βi does not start with A.

Then
we can remove left recursion as:
Aà β1A’ | β2A’| …. | βnA’
A’à α1A’ | α2A’ | ... | αmA’ | Є

Equivalently, these productions can be rewritten as:


Aà β1A’ | β2A’| …. | βnA’ | β1 | β2 |….| βn
A’à α1A’ | α2A’ |.... | αmA’ | α1 | α2|....| αm
Example Removal of immediate left recursion:
Given grammar,
SàAA | 0
Aà AAS | 0S | 1 (Where AS is of the form = α1, 0S of the form
=β1 and 1 of the form=β2)
Solution:
No left recursion on SàAA | 0
But, the production Aà AAS is immediate left recursive. So
removing left recursion,
we have;
SàAA | 0
Aà0SA’ | 1A’ (NOTE: Aà β1A’ | β2A’| …. | βnA’)
A’ à ASA’ | Є (NOTE: α1A’ | α2A’ | ... | αmA’ | Є)
Equivalently, we can write it as:
Sà AA | 0
Aà 0SA’ | 1A’ | 0S | 1A’
A’à ASA’ | AS
Exercise
For the following grammar
EàE+T |T
Tà T*F |F
Fà (E) | a
Remove the Left recursion.
Greibach Normal Form (GNF)
A grammar G = (V, T, P and S) is said to be in Greibach
Normal Form, if all the productions of the grammar are of the
form:
Aà a α, where a is a terminal, i.e. a ε T* and α is a string of
zero or more variables. i.e. α ε V*
So we can rewrite as:
AàaV* with a ε T*
Or,
Aà aV+
Aà a with a ε T

This form called Greibach Normal Form, after Sheila


Greibach, Who first gave a way to construct such grammars.
Converting a grammar to this form is complex, even if we
simplify the task by, say, starting with a CNF grammar.
To convert a grammar into GNF;
• Convert the grammar into CNF at first.
• Remove any left recursions.
• Let the left recursion tree ordering is A1, A2, A3,
……………… Ap
• Let Ap is in GNF
• Substitute Ap in first symbol of Ap-1, if Ap-1 contains
Ap. Then Ap-1 is also in GNF.
• Similarly substitute first symbol of Ap-2 by Ap-1
production and Ap production and so on…………
Convert following grammar to GNF:
SàAA | 0
AàSS | 1
This Grammar is already in CNF (useless,null,unit production already removed)
Now to remove left recursion, first replace symbol of A-production by S-
production (since we do not have immediate left recursion) as:
Sà AA | 0
Aà AAS | 0S | 1 (Where AS is of the form = α1, 0S of the form =β1 and 1of the
form=β2)
Now, removing the immediate left recursion;
SàAA | 0
Aà0SA’ | 1A’
A’à ASA’ | Є
Equivalently, we can write it as:
Sà AA | 0
Aà 0SA’ | 1A’ | 0S | 1
A’àASA’ | AS
Now we replace first symbol of S-production by A-production as;
Sà 0SA’A | 1A’A | 0SA | 1A | 0
Aà 0SA’ | 1A’ | 0S | 1
A’à ASA’ | AS
Similarly replacing first symbol of A’-production by A-production, we get the
grammar in GNF as;
Sà0SA’A | 1A’A | 0SA | 1A | 0
Aà 0SA’ | 1A’ | 0S | 1
A’à 0SA’SA’ | 1A’SA’ | 0SSA’ | 1SA’. | 0SA’S | 1A’S | 0SS | 1
This is in GNF.
Closure Property of Context Free Languages
A. The context free language are closed under union
B. The CFLs are closed under concatenation
C. The CFLs are closed under kleen closure

Context-free languages are closed under −


Union
Concatenation
Kleene Star operation

However context free languages are not closed under some


cases like, intersection and complementation.
The context free languages are not closed under intersection.
Context-free languages are closed under −
Union
Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also
context free.
Example
Let L1 = { anbn , n > 0}. Corresponding grammar G1 will have P: S1 →
aAb|ab
Let L2 = { cmdm , m ≥ 0}. Corresponding grammar G2 will have P: S2 →
cBb| ε
Union of L1 and L2, L = L1 ∪ L2 = { anbn } ∪ { cmdm }
The corresponding grammar G will have the additional production S →
S1 | S2

Concatenation
If L1 and L2 are context free languages, then L1L2 is also context free.
Example
Union of the languages L1 and L2, L = L1L2 = { anbncmdm }
The corresponding grammar G will have the additional production S →
S1 S2
Kleene Star
If L is a context free language, then L* is also context free.

Example
Let L = { anbn , n ≥ 0}. Corresponding grammar G will have P: S → aAb|
ε
Kleene Star L1 = { anbn }*
The corresponding grammar G1 will have additional productions S1 →
SS1 | ε
Context-free languages are not closed under −
Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is
not necessarily context free.
Intersection with Regular Language − If L1 is a regular language and
L2 is a context free language, then L1 ∩ L2 is a context free language.
Complement − If L1 is a context free language, then L1’ may not be
context free.
Regular Grammar:
A regular grammar represents a language that is accepted by some
finite automata called regular language. A regular grammar is a CFG
which may be either left or right linear. A grammar in which all of the
productions are of the form Aà wB (wB is of the form α) or Aà w for
A, B ε V and w ε T* is called left linear.
Equivalence of Regular Grammar and Finite Automata
1. Let G = ( V, T, P and S) be a right linear grammar of a language
L(G). we can construct a finite automata M accepting L(G) as;
M = ( Q, T, δ, [S]. {[Є]})

Where,
Q – consists of symbols [α] such that α is either S or a suffix from right
hand side of a production in P.
T - is the set of input symbols which are terminal symbols of G.
[S] – is a start symbol of G which is start state of finite automata.
{[Є]} – is the set of final states in finite automata.
And δ is defined as:
If A is a variable then, δ([A], Є) = [α] such that Aà α is a production.
If ‘a’ is a terminal and α is in T* U T*V then δ([aα], a) = [α].
For example;
Sà 00B | 11C | Є
Bà 11C | 11
Cà00B | 00
Now the finite automata can be configured as;
Exercise
Write the equivalent finite automata

1.Sà abA | bB | aba Aà b | aB | bA BàaB | aA


2.Sà0B | 111C | Є Bà 00C | 11 Cà 00B | 0D
Dà0B | 00
3. Sà 0A | 1B | 0 | 1 Aà 0S | 1B | 1 Bà0A | 1S
4. Aà 0B | 1D | 0 Bà0D | 1C | Є Cà0B | 1D | 0
Dà0D | 1D
5. Aà0B | E | 1C Bà0A | F | Є Cà0C | 1A | 0
Eà0C | 1A | 1 Fà0A | 1B | Є
Pumping Lemma for CFG::
If L is a context-free language, there is a pumping length p such that any
string w ∈ L of length ≥ p can be written as w = uvxyz, where vy ≠
ε, |vxy| ≤ p, and for all i ≥ 0, uvixyiz ∈ L.

Applications of Pumping Lemma


Pumping lemma is used to check whether a grammar is context free or
not.

The “pumping lemma for context free languages” says that in any
sufficiently long string in a CFL, it is possible to find at most two short,
nearby substrings that we can “pump” in tandem (one behind another).
i.e. we may repeat both of the strings I times, for any integer I, and the
resulting string will still be in the language. We can use this lemma as a
tool for showing that certain languages are not context free.
Pumping lemma for context free Language:
The “pumping lemma for context free languages” says that
in any sufficiently long string in a CFL, it is possible to find
at most two short, nearby substrings that we can “pump” in
tandem (one behind another). i.e. we may repeat both of
the strings I times, for any integer I, and the resulting string
will still be in the language. We can use this lemma as a
tool for showing that certain languages are not context free.

Size of Parse tree: Our first step in pumping lemma for


CFL’s is to examine the size of parse trees, During the
lemma, we will use the grammar in CNF form. One of the
uses of CNF is to turn parses into binary trees. Any
grammar in CNF produces a parse tree for any string that is
binary tree. These trees have some convenient properties,
one of which can be exploited by following theorem.
Theorem: Let a terminal string w be a yield of parse tree generated by a grammar G = (V, T, P and S) in
CNF. If length path is n, then |w| ≤2n-1

Proof: Let us prove this theorem by simple induction on n.

Basis Step: Let n = 1, then the tree consists of only the root and a leaf labeled with a terminal.
So, string w is a simple terminal.
Thus |w| = 1 =21-1 = 20 which is true.

Inductive Step: Suppose n is the length of longest path and n>1. Then the root of the tree uses a production
of the form; AàBC, since n>1

No path is the sub-tree rooted at B and C can have length greater than n-1 since B&C are child of A. Thus,
by inductive hypothesis, the yield of these sub-trees are of length almost 2n-2.
The yield of the entire tree is the concatenation of these two yields and therefore has length at most,
2n-2 + 2n-2 = 2n-1
|w|<=2n-1. Hence proved.
Pumping Lemma
It states that every CFL has a special value called pumping length such that all
longer strings in the language can be pumped. This time the meaning of pumped
is a bit more complex. It means the string can be divided into five parts so that
the second and the fourth parts may be repeated together any number of times
and the resulting string still remains in the language.

Theorem: Let L be a CFL. Then there exists a constant n such that if z is any
string in L such that |z| is at least n then we can write z=uvwxy, satisfying
following conditions.
I) | vx| >0
II) |vwx| ≤ n
III) For all i ≥ 0, uviwxiy ε L. i.e. the two strings v & x can be “pumped” nay
number of times, including ‘0’ and the resulting string will be still a member of
L.
Example Show that L = {anbncn:n≥0} is not a CFL
To prove L is not CFL, we can make use of pumping lemma for CFL. Let L be
CFL. Let any string in the language is ambmcm
z = uvwxy, |vwx| ≤ m, |vx| > 0
Case I:
vwx is in am

v = ak1, x = ak2; k1+k2 >0 i.e. k1+k2 ≥ 1


Now pump v and x then

But by pumping lemma, uv2wx2y = am+k1+k2bmcm does not belong to L as,


k1+k2. Hence, it shows contradiction that L is CFL. So, L is not CFL.
Note: other cases can be done similarly.
CFG vs. automata

CFG ⇔ non-deterministic pushdown automata (PDA)

A PDA is simply an NFA with a stack.

q x, y → z q
1 2

This means: “read x from the input;


pop y off the stack;
push z onto the stack”
Any of x,y,z may be ε.
Example: Context-free grammar G, Σ = {0,1}
S→0S1
S→ε

Two substitution rules (a.k.a. productions) →


Variables = {S}, Terminals = {0,1}

Derivation of 0011 in grammar:


S ⇒ 0S1 ⇒ 00S11 ⇒ 0011

L(G) = {0n 1n : n ≥ 0}
Example: Context-free grammar G, Σ = {0,1}
S →A
S→B
A → 0 A1
A→ε
B→1B0
B→ε
L(G) = L(A) U L(B)
= {0n 1n : n ≥ 0} U {1n 0n : n ≥ 0}

Next: A convention to write this more compactly


Example: Context-free grammar G, Σ = {0,1}
S → A |B
A → 0 A 1 |ε
B→1B0|ε

Convention: Write A → w|w' for


A → w and A→ w'
Definition: A context-free grammar (CFG) G is
a 4 tuple (V, Σ, R, S) where
● V is a finite set of variables

Σ is a finite set of terminals (V ∩ Σ = ∅)
● R is a finite set of rules, where each rule
is A → w A ∈ V, w ∈ (V U Σ)*
● S ∈ V is the start variable
Example
The language L = {ambn : m > n}
is described by the CFG G = (V, Σ, R, S)
where:
V = {S, T} Derive aaab:
Σ = {a, b} S→?
R = { S → aS | aT
T → aTb | ε }
Example
The language L = {ambn : m > n}
is described by the CFG G = (V, Σ, R, S)
where:
V = {S, T} Derive aaab:
Σ = {a, b} S → aS
R = { S → aS | aT →?
T → aTb | ε }
Example
The language L = {ambn : m > n}
is described by the CFG G = (V, Σ, R, S)
where:
V = {S, T} Derive aaab:
Σ = {a, b} S → aS
R = { S → aS | aT → aaT
T → aTb | ε } →?
Example
The language L = {ambn : m > n}
is described by the CFG G = (V, Σ, R, S)
where:
V = {S, T} Derive aaab:
Σ = {a, b} S → aS
R = { S → aS | aT → aaT
T → aTb | ε } → aaaTb
→?
Example
The language L = {ambn : m > n}
is described by the CFG G = (V, Σ, R, S)
where:
V = {S, T} Derive aaab:
Σ = {a, b} S → aS
R = { S → aS | aT → aaT
T → aTb | ε } → aaaTb
→ aaab
Definition: Let G = (V, Σ, R, S) be a CFG
we write uAv ⇒ uwv and say uAv yields uwv
if A → w is a rule
We say u derives v, written u ⇒ * v, if
● u = v, or
● ∃ u , u , … , u
1 2 k k≥1
u: ⇒ u1 ⇒ u2 ⇒ … ⇒ uk = v

The language of the grammar is L(G) = {w : S ⇒ * w}


Example: CFG for expressions in programming
languages

Task: recognize strings like 0 + 0 + 1 x (1 + 0)

S → S+S | S x S | ( S ) | 0 | 1

S→ S+S → 0+S → 0+S+S → 0+0+S


→0+0+SxS → 0+0+1xS
→ 0 + 0 + 1 x (S) → 0 + 0 + 1 x (S + S)
→ 0 + 0 + 1 x (1 + S) → 0 + 0 + 1 x (1 + 0)
We have seen: CFG, definition, and examples

Next: Ambiguity
● Ambiguity: Some string may have multiple
derivations in a CFG

● Ambiguity is a problem for compilers:

Compilers use derivation to give meaning to strings.

Example: meaning of 1+0x0 ∈ ∑* is its value, 1 ∈ ℕ

If there are two different derivations,

the value may not be well defined.


Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S→?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S → SxS → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S → SxS → Sx0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S → SxS → Sx0 → S+Sx0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S → SxS → Sx0 → S+Sx0 → S+0x0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1

One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another derivation:
S → SxS → Sx0 → S+Sx0 → S+0x0 → 1+0x0
We now want to define CFG with no ambiguity

Definition: A derivation is leftmost if at every step


the leftmost variable is expanded

Example: the 1st previous derivation was leftmost


S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Definition: A CFG G is un-ambiguous if no string


has two different leftmost derivations.
Example
The CFG S → S+S | S x S | ( S ) | 0 | 1
is ambiguous because 1+0x0 has two distinct
leftmost derivations

One leftmost derivation:


S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0

Another leftmost derivation:


S → SxS → S+SxS → 1+SxS → 1+0xS → 1+0x0
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S→?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S→S+T→?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S → S + T → T + T →?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S → S + T → T + T → F + T →?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S→S+T→T+T→F+T→1+T
→?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S→S+T→T+T→F+T→1+T
→1+TxF→?
Example Instead of using CFG
S → S+S | S x S | ( S ) | 0 | 1

we may use un-ambiguous grammar


S → S + T |T
T→TxF|F
F → 0 | 1 | (S)

Unique leftmost derivation of 1+0x0:


S→S+T→T+T→F+T→1+T
→ 1 + T x F → 1 + 0 x F → 1 + 0 x0

You might also like