Handout-13 Context Free Grammars


Context Free Grammars

Total Language Tree

• A tree with the start symbol at its root, whose nodes are working strings of terminals and nonterminals
• The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time. A string of all terminals is a terminal node in the tree



Total Language Tree
• S → aa | bX | aXX
• X → ab | b

S
  aa
  bX
    bab
    bb
  aXX
    aabX
      aabab
      aabb
    abX
      abab
      abb
    aXab
      aabab
      abab
    aXb
      aabb
      abb
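The tree above can be reproduced mechanically. The following is a minimal sketch, assuming that uppercase letters are nonterminals and lowercase letters are terminals; the names `GRAMMAR`, `children` and `language` are made up for this illustration. It only terminates for grammars whose total language tree is finite, as is the case here.

```python
from collections import deque

# Grammar from the slide; uppercase = nonterminal (an assumption of this sketch).
GRAMMAR = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}

def children(working, grammar):
    """All strings obtained by applying one production to one nonterminal occurrence."""
    result = []
    for i, symbol in enumerate(working):
        for rhs in grammar.get(symbol, []):
            result.append(working[:i] + rhs + working[i + 1:])
    return result

def language(grammar, start="S"):
    """Breadth-first walk of the total language tree; returns the set of leaves."""
    words, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        kids = children(node, grammar)
        if not kids:          # no production applies: a string of all terminals
            words.add(node)
        queue.extend(kids)
    return words

print(children("aXX", GRAMMAR))
print(sorted(language(GRAMMAR)))
```

The same word can appear at several leaves (for example aabab), which is why the leaves are collected in a set.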


Language Span of CFGs

• All possible languages can be generated by CFGs
• All regular languages and some of the non-regular languages can be generated by CFGs
• Some regular (not all) and some non-regular languages can be generated by CFGs
• Which statement is true?


Regular Languages and CFG
• A semiword is a string of terminals (possibly none) concatenated with exactly one nonterminal on the right.
• It is of the form (terminal)(terminal)…(terminal)Nonterminal
Regular Languages and CFGs
• All regular languages are also Context Free
• Therefore CFGs can be written for all RLs
• Theorem
– Given any FA, there is a CFG that generates
exactly the same language accepted by the FA.
– All regular languages are Context Free
– We will prove this using a constructive proof of the theorem, i.e.
• Conversion of an FA into a CFG describing the same language
Regular languages and CFGs
• Conversion Algorithm
– The nonterminals in the CFG will be the names of all the states in the FA, with the start state renamed S.
– For every a-edge leading from a state X to a state Y
• Create the production X → aY, and do the same for b-edges
• For a loop at X, add the production X → aX
– For every final state X, create the production X → Λ

[Figure: an a-edge from state X to state Y, and an a-loop at state X]
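The conversion steps above can be sketched in a few lines. This is only an illustration: the function name `fa_to_cfg`, the dictionary encoding of the FA, and the two-state example machine (words ending in a) are assumptions, not part of the handout. The sketch also assumes no state other than the start state is already called S.

```python
def fa_to_cfg(transitions, start, finals):
    """Build CFG productions from a deterministic FA.

    transitions: dict mapping (state, letter) -> state
    Returns a list of (left side, right side) pairs; the right side is a
    tuple (letter, nonterminal), or () standing for Λ.
    """
    def rename(state):
        # The start state is renamed S, as in the conversion algorithm.
        return "S" if state == start else state

    productions = []
    for (state, letter), target in sorted(transitions.items()):
        productions.append((rename(state), (letter, rename(target))))
    for state in sorted(finals):
        productions.append((rename(state), ()))   # final state X gives X → Λ
    return productions

# Hypothetical FA accepting all words that end in a.
fa = {("q0", "a"): "q1", ("q0", "b"): "q0",
      ("q1", "a"): "q1", ("q1", "b"): "q0"}

for left, right in fa_to_cfg(fa, start="q0", finals={"q1"}):
    print(left, "→", "".join(right) or "Λ")
```

Loops need no special case in the code: a loop is simply an edge whose source and target coincide, so it produces X → aX.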
Regular Languages and CFG
• The CFG generated through this procedure
generates the same language as accepted
by the FA
• Proof
– (i) Every word accepted by FA can be generated
by CFG
– (ii) Every word generated by CFG is accepted by
FA
Regular Languages and CFG
• Example
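[Figure: FA with start state S–, middle state M and final state F+. Edges: S to M on a, S to S on b (loop), M to F on a, M to S on b, F to F on a,b (loop)]

S → aM
S → bS
M → aF
M → bS
F → aF
F → bF
F → Λ

• Derivation of babbaaba through the CFG and traversal through the FA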
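The derivation of babbaaba can be traced step by step. In the sketch below (the names `STEP`, `FINAL` and `derive` are illustrative, not from the handout), the nonterminal at the end of each working string is exactly the FA state reached after reading the letters before it, which is the idea behind both halves of the proof.

```python
# Productions of the example grammar, indexed by (nonterminal, terminal).
STEP = {("S", "a"): "M", ("S", "b"): "S",
        ("M", "a"): "F", ("M", "b"): "S",
        ("F", "a"): "F", ("F", "b"): "F"}
FINAL = {"F"}   # F → Λ, i.e. F is the final state of the FA

def derive(word):
    """Return the list of working strings, or None if the word is not generated."""
    state, prefix, steps = "S", "", ["S"]
    for letter in word:
        state = STEP[(state, letter)]   # same move as the FA edge
        prefix += letter
        steps.append(prefix + state)
    if state not in FINAL:
        return None
    steps.append(prefix)                # apply F → Λ
    return steps

print(" ⇒ ".join(derive("babbaaba")))
```

A word without a double a, such as abab, ends in a non-final state, so `derive` returns None for it.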
Regular Languages and CFG

• FA to CFG
– Words that contain a double a (aa)
– All words having different first and last letters
Regular Languages and CFG

• Can a CFG be converted back to an FA, RE or TG?
• We need a constructive algorithm, if possible
• Would this algorithm be applicable to all CFGs?
• What about CFGs defining non-RLs? Failure !!!! FAs can't be built for non-RLs
• Solution
– Differentiate between CFGs defining RLs and those defining non-RLs
Regular Languages and CFGs

• Theorem
– If all the productions in a given CFG fit one of the two forms
• Nonterminal → semiword
• Nonterminal → word
– where the word may be Λ, then the language generated by this CFG is regular
Regular languages and CFGs
• Proof
– Consider a general CFG of this form
• N1 → w1 N2
• N2 → w2 N3
• N3 → w3 N4
• N4 → w5 (Can have many more productions)

– The Ns are nonterminals while the ws are strings of terminals. Together they form a familiar pattern: the semiword
– Draw and label circles for all the Ns and one extra circle labeled with a +. Mark the S circle with –.
– For every production of the form Nx → wyNz draw a directed edge from state Nx to Nz labeled with the word wy
– If Nx = Nz then the path is a loop
– For every production of the form Np → wq draw a directed edge from Np to + and label it with the word wq, even if wq is Λ
Regular Languages and CFGs

• The resultant figure is a transition graph
• Each path in this TG from – to + corresponds to a word generated by the CFG
• Conversely, the derivation of a word from this CFG corresponds to a path in the TG from – to +
• The language of this CFG is regular
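The construction in the proof can be sketched as follows. The encoding of productions as triples, the function names `grammar_to_tg` and `accepts`, and the small grammar used to try it out are assumptions made for this illustration.

```python
def grammar_to_tg(productions):
    """productions: list of (nonterminal, word, next nonterminal or None).

    Nonterminal → word Nonterminal becomes an edge between two states;
    Nonterminal → word becomes an edge into the extra final state '+'.
    """
    return [(left, word, right if right is not None else "+")
            for left, word, right in productions]

def accepts(edges, word, start="S"):
    """Search for a path from the start state to '+' whose labels spell the word."""
    stack, seen = [(start, 0)], set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if state == "+" and pos == len(word):
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, pos):
                stack.append((dst, pos + len(label)))
    return False

# Illustrative grammar: S → aA | bB, A → aS | a, B → bS | b
productions = [("S", "a", "A"), ("S", "b", "B"),
               ("A", "a", "S"), ("A", "a", None),
               ("B", "b", "S"), ("B", "b", None)]
tg = grammar_to_tg(productions)
print(accepts(tg, "aabb"), accepts(tg, "ab"))
```

Edge labels are whole words rather than single letters, which is why the result is a TG and not necessarily an FA.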


Regular Grammars
• Regular Grammars
– A CFG is called a regular grammar if each of its productions is of one of the two forms
• Nonterminal → semiword
• Nonterminal → word

• Example
– S → aA | bB
– A → aS | a
– B → bS | b
Λ Productions
• Productions of the form
– N → Λ
– are called null (Λ) productions
• All grammars that generate the string Λ include at least one null production
• Some grammars that do not generate the string Λ might still contain null productions
– S → aX
– X→Λ
Λ Productions

• Hazards of Λ Productions
– Create ambiguity in word derivation
• Solution
– Kill Them !!!
Killing Null Productions
• Theorem
– If L is a context-free language generated by a CFG that includes Λ-productions, then there is a different CFG with no Λ-productions that generates exactly the same language L, with the possible exception of the word Λ.
Killing Λ Productions

• Constructive Algorithm
– Identify the null productions
– Remove each of them one by one
– For each NT having a null production, add productions in which the NT has been replaced by Λ
• Example
– S → aSa | bSb | Λ becomes
– S → aSa | bSb | aa | bb
Killing Λ Productions
• Problem Identified !!!
– S → a | Xb | aYa
– X → Y | Λ
– Y → b | X
Killing Λ Productions

• Nullable Nonterminal
– In a CFG, a nonterminal N is called nullable if
• There is a production N → Λ, or
• There is a derivation that starts at N and leads to Λ (N ⇒ … ⇒ Λ)
Killing Λ Productions
• Problem Solved !!!
• Modified Replacement Rule
– Delete all Λ-productions
– Add the following productions: for every production X → old string, add new productions of the form X → …, where the right side accounts for every modification of the old string that can be formed by deleting any subset of its nullable nonterminals, without introducing a new Λ-production in the process
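The modified replacement rule can be sketched as below. The function name `kill_null_productions` and the string encoding (uppercase letters are nonterminals, the empty string stands for Λ) are assumptions of this illustration; the set of nullable nonterminals is taken as given, here X and Y for the problem grammar S → a | Xb | aYa, X → Y | Λ, Y → b | X.

```python
from itertools import combinations

def kill_null_productions(grammar, nullable):
    """Drop Λ-productions and add every variant obtained by deleting
    a subset of the nullable nonterminals on a right side."""
    new = {}
    for left, right_sides in grammar.items():
        out = []
        for rhs in right_sides:
            positions = [i for i, s in enumerate(rhs) if s in nullable]
            for k in range(len(positions) + 1):
                for drop in combinations(positions, k):
                    candidate = "".join(s for i, s in enumerate(rhs)
                                        if i not in drop)
                    # Skip Λ itself and avoid duplicates.
                    if candidate and candidate not in out:
                        out.append(candidate)
        new[left] = out
    return new

grammar = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}
for left, right_sides in kill_null_productions(grammar, {"X", "Y"}).items():
    print(left, "→", " | ".join(right_sides))
```

Deleting the only symbol of X → Y would reintroduce X → Λ, so that candidate is skipped, matching the "nothing" entries one expects for such productions.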
Killing Null Productions
• Not So Fast !!!!!!!!!!
– S → Xay | YY | aX | ZYX
– X → Za | bZ | ZZ | Yb
– Y → Ya| XY | Λ
– Z → aX | YYY
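– How could one identify the nullable NTs in such a complex grammar?
• Solution
– A bucket of Blue Paint: paint blue every NT that has a Λ-production, then keep painting any NT that has a production whose right side consists only of blue NTs, until nothing new can be painted. The blue NTs are the nullable ones.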
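The blue paint idea amounts to a fixed-point computation: keep marking nonterminals until a full pass marks nothing new. A minimal sketch follows, with the function name `nullable_nonterminals` and the string encoding (empty string for Λ, uppercase for nonterminals) chosen for illustration.

```python
def nullable_nonterminals(grammar):
    """Return the set of nonterminals that can derive Λ."""
    blue = set()
    changed = True
    while changed:
        changed = False
        for left, right_sides in grammar.items():
            if left in blue:
                continue
            # A right side that is empty (Λ) or made only of blue
            # nonterminals makes the left side nullable.
            if any(all(s in blue for s in rhs) for rhs in right_sides):
                blue.add(left)
                changed = True
    return blue

# The grammar from the slide, copied as written.
grammar = {
    "S": ["Xay", "YY", "aX", "ZYX"],
    "X": ["Za", "bZ", "ZZ", "Yb"],
    "Y": ["Ya", "XY", ""],
    "Z": ["aX", "YYY"],
}
print(sorted(nullable_nonterminals(grammar)))
```

Here Y is painted first (Y → Λ), then Z through YYY and S through YY, and finally X through ZZ.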
Example
Consider the CFG
S → a | Xb | aYa
X → Y | Λ
Y → b | X

Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa

So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X
Example
Consider the CFG
S → Xa
X → aX | bX | Λ

Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b

So the new CFG is
S → a | Xa
X → aX | bX | a | b
Example

S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ

• Nullable nonterminals are?
• A, B, Z and W

Example Contd.

Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b

So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
Unit Productions
• A production of the form
– Nonterminal → one Nonterminal
– Is called a unit production

• Unit productions are sometimes required to change the form of a working string, e.g. A → B changes
– (arbitrary)A(arbitrary) into
– (arbitrary)B(arbitrary)
• Unit productions are also problematic and thus need to be exterminated
Killing Unit Productions
• Theorem
– If there is a CFG for the language L that has no Λ-productions, then there is also a CFG for L with no Λ-productions and no unit productions
Killing Unit Productions
• Naïve Elimination Rule
– Eliminate unit productions one by one, replacing each with new productions without changing the language generated by the CFG
– Problem: on a cycle of unit productions this leads to an infinite loop and no benefit
• Example
– S → A | bb
– A → B | b
– B → S | a
• Modified Elimination Rule
– Eliminate all unit productions simultaneously
– If a sequence of unit productions leads from a nonterminal A to a nonterminal B, and B → s is a non-unit production, add the production A → s. Do this for all such sequences, then delete the unit productions.
Killing Unit Productions
• Example
– S → A | bb
– A → B | b
– B → S | a
• Unit Productions
– S → A
– A → B
– B → S
• Derived Unit Productions
– S → A → B gives S → B
– A → B → S gives A → S
– B → S → A gives B → A
Killing Unit Productions
• New CFG
– S → bb|b|a
– A → b|a|bb
– B → a|bb|b
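The simultaneous elimination can be sketched as a reachability computation over unit productions. The function name `kill_unit_productions` and the string encoding of right sides are assumptions of this illustration; a right side counts as a unit production when it is exactly the name of one nonterminal.

```python
def kill_unit_productions(grammar):
    """For each nonterminal, collect the non-unit right sides of every
    nonterminal reachable from it through unit productions alone."""
    nonterminals = set(grammar)
    new = {}
    for start in grammar:
        reachable, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for rhs in grammar[current]:
                if rhs in nonterminals and rhs not in reachable:
                    reachable.add(rhs)
                    stack.append(rhs)
        new[start] = sorted({rhs for n in reachable for rhs in grammar[n]
                             if rhs not in nonterminals})
    return new

grammar = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
for left, right_sides in kill_unit_productions(grammar).items():
    print(left, "→", " | ".join(right_sides))
```

Because S, A and B lie on one cycle of unit productions, all three end up with the same right sides, as in the new CFG above. The visited set is what prevents the infinite loop of the naïve rule.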
New Format for CFG
• Theorem
– If L is a language generated by some CFG, then there is another CFG that generates all the non-Λ words of L, all of whose productions are of one of the two basic forms
• Nonterminal → string of only Nonterminals
• Nonterminal → one terminal
New Format for CFG
• Proof
– Suppose a CFG contains nonterminals S, X1, X2, X3 … and two terminals a and b
– Add two new nonterminals A and B and two productions
• A → a
• B → b
– For every previous production involving terminals, replace each a with the nonterminal A and each b with the nonterminal B
– Any production which is already in the desired form should be left untouched, to avoid the introduction of unit productions
– All the productions are now of the form
• Nonterminal → string of only nonterminals
• Nonterminal → one terminal
New format for CFG
• Example
– S → X1 | X2aX2 | aSb | b
– X1 → X2X2 | b
– X2 → aX2 | aaX1
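Applying the proof's procedure to this example can be sketched as follows. Right sides are encoded as lists of symbols because X1 and X2 are multi-character names; the function name `lift_terminals` is illustrative, and the sketch assumes A and B are not already used as nonterminals.

```python
def lift_terminals(grammar, terminals=("a", "b")):
    """Replace terminals inside longer right sides by new nonterminals
    (a by A, b by B). A right side that is a single symbol is already
    in the desired form and is left untouched."""
    fresh = {t: t.upper() for t in terminals}
    new = {}
    for left, right_sides in grammar.items():
        new[left] = [rhs if len(rhs) == 1
                     else [fresh.get(s, s) for s in rhs]
                     for rhs in right_sides]
    for terminal, nonterminal in fresh.items():
        new[nonterminal] = [[terminal]]       # A → a, B → b
    return new

grammar = {
    "S": [["X1"], ["X2", "a", "X2"], ["a", "S", "b"], ["b"]],
    "X1": [["X2", "X2"], ["b"]],
    "X2": [["a", "X2"], ["a", "a", "X1"]],
}
for left, right_sides in lift_terminals(grammar).items():
    print(left, "→", " | ".join(" ".join(rhs) for rhs in right_sides))
```

Note that S → b stays as it is: rewriting it to S → B would create exactly the kind of unit production the proof warns about.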
Chomsky Normal Form: The Ultimate Target !
• If a CFG has only productions of the form
– Nonterminal → string of exactly two Nonterminals
– Nonterminal → one terminal
• It is said to be in Chomsky Normal Form, or CNF
• Theorem
– For any context-free language L, the non-Λ words of the language can be generated by a CFG in CNF
CNF
• Proof
– Any CFG can be converted to the following format
• Nonterminal → string of Nonterminals or
• Nonterminal → one terminal
– Modify the productions of this new CFG so that they are in CNF
– This conversion requires the addition of new nonterminals
• S → X1X2X3X4 will be converted to
– S → X1R1
– R1 → X2R2
– R2 → X3X4
CNF
• Example
– S → aSa | bSb | a | b | aa | bb
• CNF
– S → AR1
– R1 → SA
– S → BR2
– R2 → SB
– S → AA
– S → BB
– S → b
– S → a
– A → a
– B → b
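The last step of the conversion, splitting long right sides with new nonterminals, can be sketched as below. The function name `to_cnf` and the helper names R1, R2, … are assumptions of this illustration; the input is expected to have terminals already replaced, so every right side is either one terminal or a list of nonterminals, with no Λ or unit productions.

```python
def to_cnf(grammar):
    """Split every right side longer than two symbols into a chain of
    productions with exactly two nonterminals each."""
    new, counter = {}, 0
    for left, right_sides in grammar.items():
        new.setdefault(left, [])
        for rhs in right_sides:
            head = left
            while len(rhs) > 2:
                counter += 1
                helper = f"R{counter}"          # new nonterminal
                new.setdefault(head, []).append([rhs[0], helper])
                head, rhs = helper, rhs[1:]
            new.setdefault(head, []).append(list(rhs))
    return new

# S → aSa | bSb | a | b | aa | bb, with a, b inside longer strings
# already replaced by A, B.
grammar = {
    "S": [["A", "S", "A"], ["B", "S", "B"], ["a"], ["b"], ["A", "A"], ["B", "B"]],
    "A": [["a"]],
    "B": [["b"]],
}
for left, right_sides in to_cnf(grammar).items():
    print(left, "→", " | ".join(" ".join(rhs) for rhs in right_sides))
```

Applied to S → X1X2X3X4, the same loop yields S → X1R1, R1 → X2R2 and R2 → X3X4, the chain shown in the proof.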
