Professional Documents
Culture Documents
Properties of Context-Free Languages: Reading: Chapter 7
Properties of Context-Free Languages: Reading: Chapter 7
Languages
Reading: Chapter 7
1
Topics
1) Simplifying CFGs, Normal forms
2) Pumping lemma for CFLs
3) Closure and decision properties of CFLs
2
Three ways to simplify/clean a CFG
3
Eliminating useless symbols
A symbol X is reachable if there exists:
– S 🡺* α X β
reachable generating
4
Algorithm to detect useless symbols
or can we switch? 5
Example: Useless symbols
• S🡺AB | a
• A🡺 b
1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B… (i.e., remove all productions that involve B)
1. S🡺 a
2. A 🡺 b
4. Now, A is not reachable and therefore is useless
5. Simplified G:
1. S 🡺 a What would happen if you reverse the order:
i.e., test reachability before generating?
Will fail to remove:
A🡺b 6
X 🡺* w
Algorithm to find all generating symbols
• Given: G=(V,T,P,S)
• Basis:
– Every symbol in T is obviously generating.
• Induction:
– Suppose for a production A🡺 α, where α is
generating
– Then, A is also generating
7
S 🡺* α X β
Algorithm to find all reachable symbols
• Given: G=(V,T,P,S)
• Basis:
– S is obviously reachable (from itself)
• Induction:
– Suppose for a production A🡺 α1 α2… αk, where
A is reachable
– Then, all symbols on the right hand side, {α1, α2
,… αk} are also reachable.
8
What’s the point of removing ε-productions?
A🡺ε
Eliminating ε-productions
Caveat: It is not possible to eliminate ε-productions for
languages which include ε in their word set
So we will target the grammar for the rest of the language
10
Eliminating ε-productions
Given: G=(V,T,P,S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1=(V,T,P1,S) as follows:
i. For each production of the form: A🡺X1X2…Xk, where k≥1,
suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2m versions for this production
i. i.e, all combinations where each Xi is either present or absent
iii. Alternatively, if a production is of the form: A🡺ε, then
remove it
11
Example: Eliminating ε-productions
• Let L be the language represented by the following CFG G:
i. S🡺AB
ii. A🡺aAA | ε
iii. B🡺bBB | ε Simplified
grammar
Goal: To construct G1, which is the grammar for L-{ε}
12
A🡺B
B has to be a variable
before after
13
A🡺B
Eliminating unit productions
• Unit production is one which is of the form A🡺 B, where
both A & B are variables
• E.g.,
1. E 🡺 T | E+T
2. T 🡺 F | T*F
3. F 🡺 I | (E)
4. I 🡺 a | b | Ia | Ib | I0 | I1
– How to eliminate unit productions?
– Induction: If (A,B) and (B,C) are unit pairs, and A🡺C is also a unit pair.
15
The Unit Pair Algorithm:
to remove unit productions
Input: G=(V,T,P,S)
Goal: to build G1=(V,T,P1,S) devoid of unit
productions
Algorithm:
1. Find all unit pairs in G
2. For each unit pair (A,B) in G:
1. Add to P1 a new production A🡺α, for every B🡺α
which is a non-unit production
2. If a resulting production is already there in P, then
there is no need to add it.
16
Example: eliminating unit
productions
Unit pairs Only non-unit
productions to be
added to P1
G:
1. E 🡺 T | E+T (E,E) E 🡺 E+T
2. T 🡺 F | T*F
(E,T) E 🡺 T*F
3. F 🡺 I | (E)
4. I 🡺 a | b | Ia | Ib | I0 | I1 (E,F) E 🡺 (E)
(E,I) E 🡺 a|b|Ia | Ib | I0 | I1
(T,T) T 🡺 T*F
(T,F) T 🡺 (E)
(T,I) T 🡺 a|b| Ia | Ib | I0 | I1
G1:
1. E 🡺 E+T | T*F | (E) | a| b | Ia | Ib | I0 | I1 (F,F) F 🡺 (E)
2. T 🡺 T*F | (E) | a| b | Ia | Ib | I0 | I1 (F,I) F 🡺 a| b| Ia | Ib | I0 | I1
3. F 🡺 (E) | a| b | Ia | Ib | I0 | I1
4. I 🡺 a | b | Ia | Ib | I0 | I1
(I,I) I 🡺 a| b | Ia | Ib | I0 | I1
17
Putting all this together…
• Theorem: If G is a CFG for a language that contains at
least one string other than ε, then there is another CFG
G1, such that L(G1)=L(G) - ε, and G1 has:
– no ε -productions
– no unit productions
– no useless symbols
• Algorithm:
Step 1) eliminate ε -productions
Step 2) eliminate unit productions Again,
Step 3) eliminate useless symbols the order is
important!
Why?
18
Normal Forms
19
Why normal forms?
• If all productions of the grammar could be
expressed in the same form(s), then:
20
Chomsky Normal Form (CNF)
Let G be a CFG for some L-{ε}
Definition:
G is said to be in Chomsky Normal Form if all its
productions are in one of the following two
forms:
i. A 🡺 BC where A,B,C are variables, or
ii. A🡺a where a is a terminal
– G has no useless symbols
– G has no unit productions
– G has no ε-productions
21
How to convert a G into CNF?
• Assumption: G has no ε-productions, unit productions or useless
symbols
• Satisfy A 🡺 a:
For every terminal a :
i. create a unique variable, say Xa, with a production Xa 🡺 a
ii. replace all other instances of a in G by Xa
A🡺B1C1 C1🡺B2C2 … 22
Ck-3🡺Bk-2Ck-2 Ck-2🡺Bk-1Bk
Example #1
G in CNF:
G:
S => AS | BABC X0 => 0
X1 => 1
A => A1 | 0A1 | 01
S => AS | BY1
B => 0B | 0
Y1 => AY2
C => 1C | 1 Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
23
1. E 🡺 EC1 | TC2 | X(C3 | IXa | IXb |
Example #2 2.
IX0 | IX1 | a | b
C1 🡺 X+T
3. C2 🡺 X*F
4. C3 🡺 EX)
G: 5. T 🡺 TC2 | X(C3 | IXa | IXb | IX0 |
1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1 IX1 IX1 | a | b
2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1 6. F 🡺 X(C3 | IXa | IXb | IX0 | IX1 IX1
3. F 🡺 (E) | Ia | Ib | I0 | I1
4. I 🡺 a | b | Ia | Ib | I0 | I1 ) |a|b
(2 7. I 🡺 IXa | IXb | IX0 | IX1 IX1 | a | b
t ep
Step (1) S 8. X+ 🡺 +
9. X* 🡺 *
1. E 🡺 EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1 | a | b
10. X( 🡺 (
2. T 🡺 TX*F | X(EX) | IXa | IXb | IX0 | IX1 | a | b
11. X) 🡺 )
3. F 🡺 X(EX) | IXa | IXb | IX0 | IX1 | a | b
12. X0 🡺 0
4. I 🡺 Xa | Xb | IXa | IXb | IX0 | IX1 | a | b
13. X1 🡺 1
5. X+ 🡺 +
14. Xa 🡺 a
6. X* 🡺 *
15. Xb 🡺 b
7. X( 🡺 (
8. X) 🡺 )
9. X0 🡺 0
10. X1 🡺 1
11. Xa 🡺 a 24
12. Xb 🡺 b
Languages with ε
• For languages that include ε,
– Write down the rest of grammar in CNF
– Then add production “S => ε” at the end
E.g., consider: G in CNF:
G: X0 => 0
S => AS | BABC X1 => 1
A => A1 | 0A1 | 01 | ε S => AS | BY1 |
B => 0B | 0 | ε Y1 => AY2 ε
C => 1C | 1 | ε Y2 => BC
26
Pumping lemma
• A result that will be useful in proving
languages that are not CFLs
– (just like we did for regular languages)
27
The “parse tree theorem”
• Any parse tree generated by
a CNF will be a binary tree
• All internal nodes have
S
exactly two children (except A1
those nodes connected to the
A2
leaves). =
w 28
Proof: Size of parse trees |w| ≤ 2h-1
Proof: (using induction on h) Parse tree for
Basis: h = 1 w
🡺 Derivation will have to be “S🡺a”
🡺 |w|= 1 = 21-1 . S
A B
Ind. Hyp: h = k-1 🡺 |w|≤ 2k-2 =
h
A = height
Ind. Step: h = k 0
S will have exactly two children: S🡺AB
Implication:
– If |w| ≥ 2h, then
• Its parse tree’s height is at least h+1
30
The Pumping Lemma for CFLs
Let L be a CFL.
Then there exists a constant n, s.t.,
– if z ∈L s.t. |z|≥n, then we can write z=uvwxy, such
that:
1. |vwx| ≤ n
2. vx≠ε
3. For all i≥0: uviwxkiy ∈ L
Note:
1. we are pumping in two places (v & x)
2. u or y could be of any length, may be ε
3. vwx is in the middle, of size >0
31
Proof
Since |z| ≥ n = 2k, where k = |V|, it follows from the
corollary that any derivation tree for z has height
at least k+1.
33
• Generic Description:
S
u v w x y
• Example:
S
E F
C D A
F
c d G G
f
F A
g
34
In this case u = cd and y =f f a
• Cut out the subtree rooted at A:
S
u
y S =>* uAy (1)
• Example:
S
E F
C D A
F
c d
f S =>* cdAf
35
• Consider the subtree rooted at A:
A
A
G G
A
F A g
v x
f a
• Cut out the subtree rooted at the first occurrence of A:
A
A
G G
A
36
F A g
• Consider the smallest subtree rooted at A:
A
Lowest appearance in parse
A tree 🡺
w
A =>* w (3)
A =>* a
37
• In addition, (2) also tells us:
• More generally:
•
v x
S =>* uviwxiy for all i ≥1,
• And also:
v w x
39
More Examples
• L = { ww | w is in {0,1}*}
• L = {aibjck | i<j<k }
40
CFL Closure Properties
41
Closure Property Results
• CFLs are closed under:
– Union
– Concatenation
– Kleene closure operator
– Substitution
– Homomorphism, inverse homomorphism
– reversal
• CFLs are not closed under: Note: Reg languages
– Intersection are closed
– Difference under
these
– Complementation operators42
Strategy for Closure Property
Proofs
• First prove “closure under substitution”
• Using the above result, prove other closure properties
• CFLs are closed under:
– Union
– Concatenation
– Kleene closure operator
Prove – Substitution
this first – Homomorphism, inverse homomorphism
– Reversal
43
The Substitution operation
For each a ∈ ∑, then let s(a) be a language
If w=a1a2…an ∈ L, then:
• s(w) = { x1x2 … } ∈ s(L), s.t., xi ∈ s(ai)
Example: S
– Let ∑={0,1}
– Let: s(0) = {anbn | n ≥1}, San
Sa1 Sa2
– s(1) = {aa,bb}
…
– If w=01, s(w)=s(0).s(1)
x1 x2 xn
• E.g., s(w) contains a1 b1 aa, a1 b1bb,
a2 b2 aa, a2 b2bb,
… and so on.
44
Substitution of a CFL: example
– Let L = language of binary palindromes s.t., substitutions for 0 and 1
are defined as follows:
• s(0) = {anbn | n ≥1}, s(1) = {xx,yy}
– Prove that s(L) is also a CFL.
S=> S0SS0 | S1 S S1 |ε
S0=> aS0b | ab 45
S1=> xx | yy
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L2 U L2 is also a CFL
using Substitution:
• Alternative proof
– Let S1 and S2 be the starting variables of the grammars
for L1 and L2
• Then, Snew => S1 | S2 46
• G1 = { S1 => aA
A=> a }
• G2 = { S2 => bB
B=> b }
S => S1 | S2
S1 => aA
A=> a
S2 => bB
B=> b }
47
CFLs are closed under
concatenation
• Let L1 and L2 be CFLs
using Substitution:
• Alternative proof
– Let S1 and S2 be the starting variables of the grammars
for L1 and L2
• Then, Snew => S1S2 48
• G1 = { S1 => aA
A=> a }
• G2 = { S2 => bB
B=> b }
S => S1.S2
S1 => aA S => S1.S2
aA.S2
A=> a
a2.S2
S2 => bB a2.bB
B=> b } a2b2
49
CFLs are closed under Kleene
Closure
• Let L be a CFL
– Let Lnew = {a}* and s(a) = L1
– Then, L* = s(Lnew)
50
CFLs are closed under Reversal
51
w = uXv in G, and G has a production X→x, w⇒uxv
wR = vRXuR in G′
52
Closure of CFL’s Under
Homomorphism
• Construct a grammar for h(L) by replacing
each terminal symbol a by h(a).
• G has productions S -> 0S1 | 01.
• h is defined by h(0) = ab, h(1) = ε.
• h(L(G)) has the grammar with productions S ->
abS | ab.
53
Closure of CFL’s Under Inverse
Homomorphism
54
CFLs are not closed under
Intersection
• Existential proof:
– L1 = {0n1n2i | n≥1,i≥1}
– L2 = {0i1n2n | n≥1,i≥1}
• Both L1 and L2 are CFLs
– Grammars?
• But L1 ∩ L2 cannot be a CFL
– Why?
55
• We have an example, where intersection is
not closed.
For, 0n1n2n we have proved through pumping
lemma it does not belong to CFL
56
CFLs are not closed under
complementation
• Follows from the fact that CFLs are not
closed under intersection
• L 1 ∩ L 2 = L 1 U L2
58
Decision Properties
• Emptiness test
– Generating test
– Reachability test
• Membership test
– PDA acceptance
– CYK algorithm
59
“Undecidable” problems for CFL
• Is a given CFG G ambiguous?
• Is a given CFL inherently ambiguous?
• Is the intersection of two CFLs empty?
• Are two CFLs the same?
• Is a given L(G) equal to ∑*?
60
Summary
• Normal Forms
– Chomsky Normal Form
– Griebach Normal Form
– Useful in proroving P/L
• Pumping Lemma for CFLs
– Main difference: z=uviwxiy
• Closure properties
– Closed under: union, concatentation, reversal, Kleen closure,
homomorphism, substitution
– Not closed under: intersection, complementation, difference
61