Handout-13 Context Free Grammars

Context Free Grammars

Total language Tree

• A tree with the start symbol at its root, whose nodes are working strings of terminals and nonterminals
• The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time. A string of all terminals is a terminal node (leaf) of the tree

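• S → aa | bX | aXX
• X → ab | b

Level 1 (children of S): aa, bX, aXX
Level 2: bX gives bab, bb; aXX gives aabX, abX, aXab, aXb
Level 3: aabX gives aabab, aabb; abX gives abab, abb; aXab gives aabab, abab; aXb gives aabb, abb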
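The tree above can be generated mechanically. The following is a minimal Python sketch, assuming single-character symbols with uppercase letters as nonterminals and lowercase letters as terminals; the names `children` and `total_language_tree` are illustrative and not part of the handout.

```python
# Grammar from the slide: S → aa | bX | aXX, X → ab | b
GRAMMAR = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}

def children(working, grammar):
    """All strings obtained by applying one production at one position."""
    out = []
    for i, sym in enumerate(working):
        for rhs in grammar.get(sym, []):      # terminals have no productions
            out.append(working[:i] + rhs + working[i + 1:])
    return out

def total_language_tree(grammar, start="S", max_depth=3):
    """Return the tree level by level as lists of (parent, child) pairs."""
    levels, frontier = [], [start]
    for _ in range(max_depth):
        level = [(p, c) for p in frontier for c in children(p, grammar)]
        if not level:                         # every node is a terminal node
            break
        levels.append(level)
        frontier = [c for _, c in level]
    return levels

levels = total_language_tree(GRAMMAR)
for depth, level in enumerate(levels, start=1):
    print(depth, [child for _, child in level])
```

The `max_depth` bound is only a safeguard: for a grammar with recursive productions the total language tree is infinite, so a sketch like this has to stop somewhere.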

Language Span of CFGs

• All possible languages can be generated by CFGs
• All regular languages and some of the non-regular languages can be generated by CFGs
• Some regular (not all) and some non-regular languages can be generated by CFGs
• Which statement is true?

Regular Languages and CFGs
• A semiword is a string of terminals (maybe none) concatenated with exactly one nonterminal on the right
• It is of the form (terminal)(terminal)…(terminal)(Nonterminal)

Regular Languages and CFGs
• All regular languages are also Context Free
• Therefore CFGs can be written for all RLs
• Theorem
– Given any FA, there is a CFG that generates
exactly the same language accepted by the FA.
– All regular languages are Context Free
– We will prove this with a constructive proof, i.e.
• Reduction of an FA into a CFG describing the same language
Regular languages and CFGs
• Conversion Algorithm
– The nonterminals in the CFG are the names of the states in the FA, with the start state renamed S.
– For every a-edge from a state X to a state Y
• Create the production X → aY, and do the same for b-edges
• For a loop at X, add the production X → aX (or X → bX)
– For every final state X, create the production X → Λ
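The conversion steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the FA is given as a dictionary from (state, letter) to the next state; the function name `fa_to_cfg` and the sample transition table (a three-state FA for words containing aa) are assumptions made for the example.

```python
def fa_to_cfg(transitions, start, finals):
    """Build productions from an FA.

    transitions: dict mapping (state, letter) -> next state.
    Returns a dict nonterminal -> list of right-hand sides; each right-hand
    side is a tuple (letter, nonterminal), and the empty tuple stands for Λ.
    """
    def name(state):
        return "S" if state == start else state   # start state is renamed S

    grammar = {}
    for (state, letter), target in transitions.items():
        # edge X --letter--> Y becomes X → letter Y (a loop gives X → letter X)
        grammar.setdefault(name(state), []).append((letter, name(target)))
    for state in finals:
        grammar.setdefault(name(state), []).append(())   # X → Λ for final X
    return grammar

# Assumed sample FA: S is the start state, F the only final state.
FA = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
cfg = fa_to_cfg(FA, start="S", finals=["F"])
for head, bodies in cfg.items():
    print(head, "→", " | ".join("".join(body) or "Λ" for body in bodies))
```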


Regular Languages and CFG
• The CFG produced by this procedure generates the same language as is accepted by the FA
• Proof
– (i) Every word accepted by the FA can be generated by the CFG
– (ii) Every word generated by the CFG is accepted by the FA
Regular Languages and CFG
• Example
– FA with states S (start, -), M and F (final, +)
– S → aM
– S → bS
– M → aF
– M → bS
– F → aF
– F → bF
– F → Λ
• Derivation of babbaaba through the CFG and traversal through the FA
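The derivation of babbaaba and the matching walk through the FA can be traced side by side. The sketch below assumes the transition table read off the productions S → aM | bS, M → aF | bS, F → aF | bF, with F as the final state; the last derivation step uses F → Λ, which the conversion algorithm adds for a final state. The name `derive` is illustrative.

```python
# Assumed transition table, read off the productions of the example.
DELTA = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
FINALS = {"F"}

def derive(word, start="S"):
    """Return (state path, working strings), or None if the FA rejects."""
    state, prefix = start, ""
    path, steps = [state], [state]
    for letter in word:
        state = DELTA[(state, letter)]     # follow the edge in the FA
        prefix += letter
        path.append(state)
        steps.append(prefix + state)       # apply old_state → letter new_state
    if state not in FINALS:
        return None                        # no Λ-production available here
    steps.append(prefix)                   # apply final_state → Λ
    return path, steps

path, steps = derive("babbaaba")
print(" -> ".join(path))
print(" => ".join(steps))
```

Each working string is a semiword whose nonterminal is the state the FA is currently in, which is the idea behind both directions of the proof.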
Regular Languages and CFG

• FA to CFG
– Words that contain a double a (aa)
– All words having different first and last letters
Regular Languages and CFG

• Can a CFG be converted back to an FA, RE or TG?
• Need a constructive algorithm, if possible
• Would this algorithm be applicable to all CFGs?
• What about CFGs defining non-RLs: Failure !!!! FAs cannot be built for non-RLs
• Solution
– Differentiate CFGs defining RLs from those defining non-RLs
Regular Languages and CFGs

• Theorem
– If all the productions in a given CFG fit one of the two forms
• Nonterminal → semiword
• Nonterminal → word
– where the word may be Λ, then the language generated by this CFG is regular
Regular languages and CFGs
• Proof
– Consider a general CFG of this form
• N1 → w1N2
• N2 → w2N3
• N3 → w3N4
• N4 → w4 (there can be many more productions)
– The Ns are nonterminals while the ws are strings of terminals. Together they form a familiar pattern: the semiword
– Draw and label circles for all Ns and one extra circle labelled with a +. Mark the S circle with a -.
– For every production of the form Nx → wyNz, draw a directed edge from state Nx to Nz labelled with the word wy
– If Nx = Nz then the path is a loop
– For every production of the form Np → wq, draw a directed edge from Np to + and label it with the word wq, even if wq is Λ
Regular Languages and CFGs

• The resultant figure is a transition graph
• Each path in this TG from – to + corresponds to a word generated by the CFG
• Conversely, the derivation of a word from this CFG corresponds to a path in the TG from – to +
• The language of this CFG is therefore regular
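The construction in this proof can be sketched in Python. This is a minimal illustration, assuming single-character symbols (uppercase for nonterminals, lowercase for terminals) and the empty string for Λ; the names `regular_grammar_to_tg` and `tg_accepts` are hypothetical. The handout's regular grammar S → aA | bB, A → aS | a, B → bS | b serves as sample input.

```python
def regular_grammar_to_tg(productions):
    """productions: list of (nonterminal, rhs), rhs a word or a semiword.

    Returns TG edges (source, label, target). The start state is S and the
    single extra final state is named "+".
    """
    edges = []
    for head, rhs in productions:
        if rhs and rhs[-1].isupper():            # Nx → wy Nz (semiword)
            edges.append((head, rhs[:-1], rhs[-1]))
        else:                                    # Np → wq (word, maybe Λ)
            edges.append((head, rhs, "+"))
    return edges

def tg_accepts(edges, word, start="S"):
    """Search for a path from the start state to "+" that spells the word."""
    stack, seen = [(start, 0)], set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue                             # guards against Λ-loops
        seen.add((state, pos))
        if state == "+" and pos == len(word):
            return True
        for source, label, target in edges:
            if source == state and word.startswith(label, pos):
                stack.append((target, pos + len(label)))
    return False

PRODUCTIONS = [("S", "aA"), ("S", "bB"), ("A", "aS"),
               ("A", "a"), ("B", "bS"), ("B", "b")]
edges = regular_grammar_to_tg(PRODUCTIONS)
print(edges)
print(tg_accepts(edges, "aabb"), tg_accepts(edges, "ab"))
```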

Regular Grammars
• Regular Grammars
– A CFG is called a regular grammar if each of its productions is of one of the two forms
• Nonterminal → semiword
• Nonterminal → word

• Example
– S → aA | bB
– A → aS | a
– B → bS | b
Λ Productions
• Productions of the form
– N→Λ
– are called null (Λ) productions
• All grammars that generate the string Λ include at least one null production
• Some grammars that do not generate Λ might still contain null productions
– S → aX
– X → Λ
Λ Productions

• Hazards of Λ Productions
– Create ambiguity in word derivation
• Solution
– Kill Them !!!
Killing Null Productions
• Theorem
– If L is a context-free language generated by a CFG that includes Λ-productions, then there is a different CFG with no Λ-productions that generates either the whole language L (if L does not include Λ) or all the words of L except Λ.
Killing Λ Productions

• Constructive Algorithm
– Identify Null Productions
– Remove each of them one by one
– For each NT having a null production, add
productions where the NT has been replaced
by null
• Example
– S → aSa | bSb | Λ becomes
– S → aSa | bSb | aa | bb
Killing Λ Productions
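• Problem Identified !!!
– S → a | Xb | aYa
– X → Y | Λ
– Y → b | X
– Removing X → Λ turns Y → X into a new null production Y → Λ, and removing that one in turn recreates X → Λ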
Killing Λ Productions

• Nullable Nonterminal
– In a CFG, a nonterminal N is called nullable if
• There is a production N → Λ, or
• There is a derivation that starts at N and leads to Λ (N ⇒ … ⇒ Λ)
Killing Λ Productions
• Problem Solved !!!
• Modified Replacement Rule
– Delete all Λ-productions
– Add the following productions: for every production X → old string, add new productions of the form X → …, where the right side accounts for every modification of the old string that can be formed by deleting any possible subset of nullable nonterminals, without introducing a new Λ-production in the process
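The modified replacement rule can be sketched as follows. This is a minimal Python illustration, assuming single-character symbols (uppercase for nonterminals) and the empty string for Λ; the helper names `nullable_set` and `kill_null_productions` are hypothetical. The sample input is the grammar S → a | Xb | aYa, X → Y | Λ, Y → b | X.

```python
from itertools import combinations

def nullable_set(grammar):
    """Nonterminals N with N → Λ or a derivation N ⇒ … ⇒ Λ."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)            # "" counts: all() of nothing
                changed = True
    return nullable

def kill_null_productions(grammar):
    """Delete Λ-productions and add every nullable-deleted variant."""
    nullable = nullable_set(grammar)
    new_grammar = {}
    for head, bodies in grammar.items():
        out = []
        for body in bodies:
            spots = [i for i, sym in enumerate(body) if sym in nullable]
            for k in range(len(spots) + 1):
                for drop in combinations(spots, k):
                    variant = "".join(
                        sym for i, sym in enumerate(body) if i not in drop)
                    if variant and variant not in out:   # never add X → Λ
                        out.append(variant)
        new_grammar[head] = out
    return new_grammar

G = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}
print(kill_null_productions(G))
```

Because the nullable set is computed before anything is deleted, the loop described under "Problem Identified" cannot occur: no new Λ-production is ever created.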
Killing Null Productions
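• Not So Fast !!!!!!!!!!
– S → Xay | YY | aX | ZYX
– X → Za | bZ | ZZ | Yb
– Y → Ya | XY | Λ
– Z → aX | YYY
– How could one identify the nullable NTs in such a complex grammar?
• Solution
– A bucket of Blue Paint: paint blue every NT that has a Λ-production, then keep painting any NT that has a production whose right side is entirely blue; the blue NTs are the nullable ones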
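The blue paint procedure amounts to repeated rounds of marking. Below is a minimal Python sketch applied to the grammar on this slide, assuming uppercase letters are nonterminals, lowercase letters (including the y in Xay) are terminals, and the empty string stands for Λ; the name `paint_blue` is illustrative.

```python
def paint_blue(grammar):
    """Return the rounds of painting; each round lists the newly blue NTs."""
    blue, rounds = set(), []
    while True:
        # An NT turns blue if some right side consists only of blue symbols.
        # In the first round only NTs with a Λ-production qualify.
        newly = sorted(
            head for head, bodies in grammar.items()
            if head not in blue
            and any(all(sym in blue for sym in body) for body in bodies))
        if not newly:
            return rounds
        blue.update(newly)
        rounds.append(newly)

G = {
    "S": ["Xay", "YY", "aX", "ZYX"],
    "X": ["Za", "bZ", "ZZ", "Yb"],
    "Y": ["Ya", "XY", ""],
    "Z": ["aX", "YYY"],
}
rounds = paint_blue(G)
for number, painted in enumerate(rounds, start=1):
    print("round", number, painted)
```

For this grammar every nonterminal ends up blue: Y first, then S and Z, and finally X through X → ZZ.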
Consider the CFG
S → a | Xb | aYa
X → Y | Λ
Y → b | X
• Nullable nonterminals are X and Y

Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa

So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X

Consider the CFG
S → Xa
X → aX | bX | Λ

Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b

So the new CFG is
S → a | Xa
X → aX | bX | a | b


S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
• Nullable nonterminals are?
• A, B, Z and W

Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b

So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
Unit Productions
• A production of the form
– Nonterminal → one Nonterminal
– is called a unit production
• Unit productions are sometimes required to change the form of a working string, e.g. A → B turns
– (arbitrary)A(arbitrary) into
– (arbitrary)B(arbitrary)
• Unit productions are also problematic and thus need to be exterminated
Killing Unit Productions
• Theorem
– If there is a CFG for the language L that has no Λ-productions, then there is also a CFG for L with no Λ-productions and no unit productions
Killing Unit Productions
• Naïve Elimination Rule
– Eliminate unit productions one by one and replace them with new productions, without changing the language generated by the CFG
– Can lead to an infinite loop with no benefit
• Example
– S → A | bb
– A → B | b
– B → S | a
• Modified Elimination Rule
– Eliminate all unit productions simultaneously
– Look for every sequence of unit productions that leads from a nonterminal N to a nonterminal M. For each production M → s that is not a unit production, add N → s, then delete all unit productions.
Killing Unit Productions
• Example
– S → A | bb
– A→B|b
– B→S|a
• Unit Productions
– S→A
– A→B
– B→S
• Derived Unit Production
– S → A → B gives S → B
– A → B → S gives A → S
– B → S → A gives B → A
Killing Unit Productions
• New CFG
– S → bb|b|a
– A → b|a|bb
– B → a|bb|b
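The modified elimination rule can be sketched in Python. This is a minimal illustration, assuming single-character symbols, no Λ-productions, and that a unit production is any right side consisting of exactly one nonterminal; the name `kill_unit_productions` is hypothetical.

```python
def kill_unit_productions(grammar):
    """Replace all unit productions simultaneously."""
    def is_unit(body):
        return len(body) == 1 and body in grammar

    new_grammar = {}
    for head in grammar:
        # Nonterminals reachable from head by chains of unit productions,
        # kept in order of discovery (head itself comes first).
        reach, todo = [head], [head]
        while todo:
            current = todo.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reach:
                    reach.append(body)
                    todo.append(body)
        # head inherits every non-unit production of every reachable NT.
        out = []
        for nonterminal in reach:
            for body in grammar[nonterminal]:
                if not is_unit(body) and body not in out:
                    out.append(body)
        new_grammar[head] = out
    return new_grammar

G = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(kill_unit_productions(G))
```

Tracking the reachable set rather than rewriting one production at a time is what avoids the infinite loop of the naive rule on the cycle S → A → B → S.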
New Format for CFG
• Theorem
– If L is a language generated by some CFG, then there is another CFG that generates all the non-Λ words of L, all of whose productions are of one of the two basic forms
• Nonterminal → string of only Nonterminals
• Nonterminal → one terminal
New Format for CFG
• Proof
– Suppose a CFG contains nonterminals S, X1, X2, X3, … and two terminals a and b
– Add two new nonterminals A and B and two productions
• A → a
• B → b
– In every previous production involving terminals, replace each a with the nonterminal A and each b with the nonterminal B
– Any production that is already in the desired form should be left untouched, to avoid introducing unit productions
– All the productions are now of the form
• Nonterminal → string of only nonterminals
• Nonterminal → one terminal
New format for CFG
• Example
– S → X1 | X2aX2 | aSb | b
– X1 → X2X2 | b
– X2 → aX2 | aaX1
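Applying the proof's construction to this example can be sketched as below. Because X1 and X2 are multi-character names, each right side is written as a list of symbols. This is a minimal illustration, assuming the names A and B are not already used by the grammar; the function name `to_nonterminal_form` is hypothetical.

```python
def to_nonterminal_form(grammar, terminals=("a", "b")):
    """Rewrite productions so each is NT → NTs only, or NT → one terminal."""
    stand_in = {t: t.upper() for t in terminals}      # a ↦ A, b ↦ B
    new_grammar = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if len(body) == 1 and body[0] in terminals:
                new_bodies.append(list(body))         # already NT → terminal
            else:
                new_bodies.append([stand_in.get(sym, sym) for sym in body])
        new_grammar[head] = new_bodies
    for terminal, nonterminal in stand_in.items():
        new_grammar[nonterminal] = [[terminal]]       # A → a, B → b
    return new_grammar

G = {
    "S": [["X1"], ["X2", "a", "X2"], ["a", "S", "b"], ["b"]],
    "X1": [["X2", "X2"], ["b"]],
    "X2": [["a", "X2"], ["a", "a", "X1"]],
}
converted = to_nonterminal_form(G)
for head, bodies in converted.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
```

Leaving S → b and X1 → b untouched is the point of the "already in the desired form" rule: rewriting them to S → B and X1 → B would create unit productions.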
Chomsky Normal Form: The Ultimate Target !
• If a CFG has only productions of the form
– Nonterminal → string of exactly two Nonterminals
– Nonterminal → one terminal
• It is said to be in Chomsky Normal Form, or CNF
• Theorem
– For any context-free language L, the non-Λ words of the language can be generated by a CFG in CNF
• Proof
– Any CFG can be converted to the following format
• Nonterminal → string of Nonterminals, or
• Nonterminal → one terminal
– Modify the productions of this new CFG so that they are in CNF
– This conversion requires the addition of new nonterminals
• S → X1X2X3X4 will be converted to
– S → X1R1
– R1 → X2R2
– R2 → X3X4
• Example
– S → aSa | bSb | a | b | aa | bb becomes
– S → AR1
– R1 → SA
– S → BR2
– R2 → SB
– S → AA
– S → BB
– S → b
– S → a
– A → a
– B → b
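The whole conversion to CNF can be sketched in Python. This is a minimal illustration, assuming the input grammar already has no Λ-productions and no unit productions, that right sides are lists of symbols, and that the names A, B, R1, R2, … are not already in use; the function name `to_cnf` is hypothetical.

```python
def to_cnf(grammar, terminals=("a", "b")):
    """Convert a Λ-free, unit-free CFG into Chomsky Normal Form."""
    stand_in = {t: t.upper() for t in terminals}       # a ↦ A, b ↦ B
    cnf, counter = {}, 0

    def add(head, body):
        cnf.setdefault(head, []).append(body)

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                add(head, list(body))                  # NT → one terminal
                continue
            symbols = [stand_in.get(sym, sym) for sym in body]
            current = head
            while len(symbols) > 2:                    # peel off one symbol
                counter += 1
                fresh = f"R{counter}"
                add(current, [symbols[0], fresh])
                current, symbols = fresh, symbols[1:]
            add(current, symbols)                      # exactly two NTs left
    for terminal, nonterminal in stand_in.items():
        add(nonterminal, [terminal])                   # A → a, B → b
    return cnf

G = {"S": [["a", "S", "a"], ["b", "S", "b"], ["a"], ["b"],
           ["a", "a"], ["b", "b"]]}
cnf = to_cnf(G)
for head, bodies in cnf.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
```

Every right side of length n greater than 2 costs n − 2 fresh nonterminals, which matches the S → X1X2X3X4 pattern in the proof.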

