CS340 Theory of Computation VI
Lecture 19
Some examples of conversion to Chomsky Normal form
Eg.: The set of balanced parentheses: $S \to [S] \mid SS \mid \epsilon$
$S \to [S] \mid SS \mid [\,]$ (removing $\epsilon$)
$S \to ASB \mid SS \mid AB$, $A \to [$, $B \to ]$ (creating non-terminals for the terminals)
$S \to AB \mid AC \mid SS$, $C \to SB$, $A \to [$, $B \to ]$
(replacing $S \to ASB$ with $S \to AC$ and $C \to SB$)
The set $\{a^n b^n \mid n \ge 0\}$ is not a regular set, but it is a CFL. What about $\{a^n b^n c^n \mid n \ge 0\}$? It is not a CFL.
Linear Grammar
A CFG $G$ is right linear if all productions are of the following form:
$A \to xB$, $A \to x$ for $A, B \in N$, $x \in \Sigma^*$
That is, at most one non-terminal appears on the RHS, and that non-terminal must be the rightmost symbol.
A CFG $G$ is left linear if all productions are of the following form:
$A \to Bx$, $A \to x$ for $A, B \in N$, $x \in \Sigma^*$
That is, at most one non-terminal appears on the RHS, and that non-terminal must be the leftmost symbol.
A linear grammar is a grammar in which at most one non-terminal can occur on the RHS
of any production irrespective of the position.
Note: A regular grammar is a linear grammar, however not all linear grammars are regular.
Theorem 1: Let $G$ be a right linear grammar. Then $L(G)$ is regular.
Theorem 2: Let $A \subseteq \Sigma^*$ be a regular set. Then there exists a right linear grammar $G$ s.t. $A = L(G)$.
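The idea behind Theorem 1 can be sketched in code: while deriving a string with a right linear grammar, there is only ever one non-terminal left, so the pairs (non-terminal, position in the input) behave like the states of an NFA. The sketch below is a minimal illustration under that reading; the dictionary encoding of productions and the sample grammar `G` for $(ab)^*$ are assumptions made for the example and do not come from the lecture.

```python
from collections import deque

def accepts(productions, start, w):
    """Decide whether w is generated by a right linear grammar.

    productions maps a non-terminal A to a list of pairs (x, B), where x is
    a terminal string and B is a non-terminal, or None for A -> x.
    (This encoding is an assumption made for this sketch.)
    """
    # A search state (A, pos) means: w[:pos] has been generated and A is the
    # single remaining non-terminal. There are finitely many such pairs.
    seen = {(start, 0)}
    queue = deque(seen)
    while queue:
        A, pos = queue.popleft()
        for x, B in productions.get(A, []):
            if not w.startswith(x, pos):
                continue
            end = pos + len(x)
            if B is None:
                if end == len(w):       # A -> x finishes the derivation
                    return True
            elif (B, end) not in seen:  # A -> xB moves to "state" B
                seen.add((B, end))
                queue.append((B, end))
    return False

# Hypothetical right linear grammar for (ab)*:  S -> abS | epsilon
G = {"S": [("ab", "S"), ("", None)]}
```

With this grammar, `accepts(G, "S", "abab")` succeeds and `accepts(G, "S", "aba")` fails, because no production sequence consumes exactly the whole input in the second case.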
Stack
Eg.: $((p, a, A), (q, B_1 \cdots B_k)) \in \delta$, where $B_k$ is pushed first and $B_1$ is pushed last (so $B_1$ ends up on top of the stack).
Lecture 20
Configurations
Let $M = (Q, \Sigma, \Gamma, \delta, s, \bot, F)$. A configuration of $M$ is an element of $Q \times \Sigma^* \times \Gamma^*$, which denotes the current state of the machine, the part of the input that is unread, and the current content of the stack.
Define the 1-step next configuration relation $\xrightarrow[M]{1}$:
If $((p, a, A), (q, \gamma)) \in \delta$ then for any $y \in \Sigma^*$ and $\beta \in \Gamma^*$, we have $(p, ay, A\beta) \xrightarrow[M]{1} (q, y, \gamma\beta)$.
If $((p, \epsilon, A), (q, \gamma)) \in \delta$ then for any $y \in \Sigma^*$ and $\beta \in \Gamma^*$, we have $(p, y, A\beta) \xrightarrow[M]{1} (q, y, \gamma\beta)$.
Let $\xrightarrow[M]{*}$ denote the reflexive-transitive closure of $\xrightarrow[M]{1}$.
Acceptance
Acceptance in an NPDA is of two types: by final state and by empty stack.
By final state: $M$ accepts $x$ by final state if $(s, x, \bot) \xrightarrow[M]{*} (q, \epsilon, \gamma)$ for some $q \in F$, $\gamma \in \Gamma^*$.
By empty stack: $M$ accepts $x$ by empty stack if $(s, x, \bot) \xrightarrow[M]{*} (q, \epsilon, \epsilon)$ for some $q \in Q$.
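The configuration relation and the two acceptance modes can be tried out with a small simulator. The following is only a sketch under stated assumptions: configurations are triples (state, unread input, stack) with the top of the stack at index 0, the bottom marker $\bot$ is written as the character `Z`, and `DELTA` is a hypothetical NPDA for $\{a^n b^n \mid n \ge 0\}$ that is not taken from the lecture. The `max_configs` cut-off is an added safeguard, so a `False` answer is only reliable when the search space stays below it.

```python
from collections import deque

def step(delta, config):
    """Yield every configuration reachable from config in one step."""
    p, unread, stack = config
    if not stack:                 # every transition pops a stack symbol A
        return
    A, beta = stack[0], stack[1:]
    # ((p, eps, A), (q, gamma)):  (p, y, A beta) -> (q, y, gamma beta)
    for q, gamma in delta.get((p, "", A), ()):
        yield (q, unread, gamma + beta)
    # ((p, a, A), (q, gamma)):    (p, ay, A beta) -> (q, y, gamma beta)
    if unread:
        for q, gamma in delta.get((p, unread[0], A), ()):
            yield (q, unread[1:], gamma + beta)

def accepts(delta, s, bottom, F, x, mode, max_configs=10_000):
    """Breadth-first search of the closure of step from (s, x, bottom)."""
    start = (s, x, bottom)
    seen, queue = {start}, deque([start])
    while queue and len(seen) <= max_configs:
        config = queue.popleft()
        q, unread, stack = config
        if unread == "" and (q in F if mode == "final" else stack == ""):
            return True
        for nxt in step(delta, config):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical NPDA for {a^n b^n | n >= 0}; "Z" stands for the bottom marker.
DELTA = {
    ("p", "a", "Z"): {("p", "AZ")},
    ("p", "a", "A"): {("p", "AA")},
    ("p", "b", "A"): {("q", "")},
    ("q", "b", "A"): {("q", "")},
    ("p", "", "Z"): {("q", ""), ("f", "Z")},
    ("q", "", "Z"): {("q", ""), ("f", "Z")},
}
```

Calling `accepts(DELTA, "p", "Z", {"f"}, "aabb", "empty")` explores the same machine as `accepts(DELTA, "p", "Z", {"f"}, "aabb", "final")`; only the test applied to configurations with no unread input differs.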
Concatenation
If $A$ and $B$ are CFLs, with $L(G_1) = A$ and $L(G_2) = B$, with start symbols $S_1$, $S_2$ respectively, construct a CFG $G$ s.t. $L(G) = AB = \{xy \mid x \in A, y \in B\}$:
Combine the grammars $G_1$, $G_2$.
Add a new start symbol $S$ with production $S \to S_1 S_2$.
Kleene star
If $A$ is a CFL s.t. $L(G) = A$ and the start symbol of $G$ is $S_1$, construct a grammar $G'$ s.t. $L(G') = A^*$ as follows:
Take $G$ along with a new start symbol $S$, and add the productions
$S \to S_1 S \mid \epsilon$
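Both constructions are purely syntactic, which a short sketch makes visible. The representation below is an assumption for illustration: a grammar is a dictionary from non-terminals to lists of right-hand sides, each right-hand side is a tuple of symbols, and the empty tuple stands for $\epsilon$. The sketch also assumes the two grammars use disjoint non-terminal names and that the new start symbol is fresh, and it checks both.

```python
def concat(g1, s1, g2, s2, new_start="S"):
    """Grammar for L(g1)L(g2): all productions of both, plus S -> S1 S2."""
    assert not (set(g1) & set(g2)), "non-terminals are assumed disjoint"
    assert new_start not in g1 and new_start not in g2
    g = {**g1, **g2}
    g[new_start] = [(s1, s2)]
    return g, new_start

def star(g1, s1, new_start="S"):
    """Grammar for L(g1)*: all productions of g1, plus S -> S1 S | epsilon."""
    assert new_start not in g1
    g = dict(g1)
    g[new_start] = [(s1, new_start), ()]   # () encodes epsilon
    return g, new_start

# Hypothetical one-production grammars used only to exercise the functions.
G1 = {"S1": [("a",)]}
G2 = {"S2": [("b",)]}
```

For instance, `concat(G1, "S1", G2, "S2")` returns a grammar whose only new production is `S -> S1 S2`, and `star(G1, "S1")` adds `S -> S1 S` and `S -> epsilon` while leaving the productions of `G1` untouched.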
Intersection
CFLs are not closed under intersection.
Eg.: $\{a^m b^m c^n \mid m, n \ge 0\} \cap \{a^n b^m c^m \mid m, n \ge 0\} = \{a^n b^n c^n \mid n \ge 0\}$, which is not a CFL.
Intersection of a CFL and a regular set
Theorem: CFLs are closed under intersection with regular sets: if $A \subseteq \Sigma^*$ is a CFL and $B \subseteq \Sigma^*$ is a regular set then $A \cap B$ is a CFL.
Brief proof idea: Consider an NPDA $M_1$ and a DFA $M_2$ s.t. $L(M_1) = A$ and $L(M_2) = B$.
Construct an NPDA $N$ by applying the product construction to $M_1$ and $M_2$. That is, the states of $N$ are pairs of states of $M_1$ and $M_2$, and the stack of $N$ simulates the stack of $M_1$.
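The product construction can be sketched as a function on transition tables. This is a minimal sketch, not a full proof: it assumes NPDA transitions are stored as a dictionary from $(p, a, A)$ to a set of pairs $(q, \gamma)$ with the empty string for $\epsilon$, and that the DFA transition function is a total dictionary from (state, symbol) to state. The small tables `DELTA1` and `DFA` are hypothetical and serve only to exercise the function.

```python
def product(delta1, dfa_delta):
    """Transitions of N, whose states are pairs (NPDA state, DFA state)."""
    dfa_states = {r for (r, _) in dfa_delta}   # assumes a total DFA
    delta = {}
    for (p, a, A), moves in delta1.items():
        for r in dfa_states:
            # On an epsilon-move the DFA component does not advance;
            # otherwise it follows its own transition on the symbol a.
            r_next = r if a == "" else dfa_delta[(r, a)]
            delta[((p, r), a, A)] = {
                ((q, r_next), gamma) for q, gamma in moves
            }
    return delta

# Hypothetical fragment of an NPDA and a DFA tracking the parity of a's.
DELTA1 = {
    ("p", "a", "Z"): {("p", "AZ")},
    ("p", "", "Z"): {("q", "")},
}
DFA = {("even", "a"): "odd", ("odd", "a"): "even"}
N_DELTA = product(DELTA1, DFA)
```

Under the same assumptions, the start state of $N$ would be the pair of start states, and when $M_1$ accepts by final state the final states of $N$ would be the pairs in which both components are final; the stack operations are copied unchanged from $M_1$.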
Lecture 21
Pumping Lemma for CFLs
If $A \subseteq \Sigma^*$ is a CFL, then $\exists k \ge 0$ s.t. every $z \in A$ with $|z| \ge k$ can be split into five substrings, $z = uvwxy$, s.t. $vx \ne \epsilon$, $|vwx| \le k$ and $\forall i \ge 0$, $uv^iwx^iy \in A$.
To prove the pumping lemma, let us introduce the notion of parse or derivation trees.
Parse tree is a tree satisfying the following:
(Figure: derivation tree for $a^4 b^4$.)
Observation: Parse trees of grammars in Chomsky normal form for long strings must have long paths, because the number of symbols can at most double when you go down a level. In order to have $2^n$ symbols at the bottom level, the tree must be of depth at least $n$, i.e. it must have at least $n + 1$ levels.
Proof: Let $G$ be a grammar for $A$ in Chomsky normal form. Take $k = 2^{n+1}$, where $n$ is the number of non-terminals in $G$. Let $z \in A$ s.t. $|z| \ge k$; then the parse tree of $z$ must be of depth at least $n + 1$. The longest path of the tree therefore must contain at least $n + 1$ occurrences of non-terminals, so by the pigeonhole principle some non-terminal occurs at least twice on that path.
Take the first pair of occurrences of the same non-terminal along the path, reading bottom-up. For example, say $X$ is that non-terminal. Now break $z$ into substrings $uvwxy$ s.t. $w$ is the string generated by the lower occurrence of $X$ and $vwx$ is generated by the upper occurrence of $X$. Thus in this example $w = ab$ and $vwx = aabb$.
Further let T be the subtree rooted at the upper occurrence of X, and t be the subtree rooted
at the lower occurrence of X.
(Figure: the subtrees $T$ and $t$.)
By removing $t$ and replacing it with $T$ we get the parse tree for $uv^2wx^2y$. This can be repeated, and hence the parse tree for $uv^iwx^iy$, $i \ge 1$, can be generated. We can also cut out $T$ and replace it with $t$ to get the parse tree for $uv^0wx^0y = uwy$.
Eg.: Use the pumping lemma to show that $A = \{a^n b^n a^n \mid n \ge 0\}$ is not context-free.
For a given $k$, let $z = a^k b^k a^k$; we have $|z| = 3k \ge k$. Let $z = uvwxy$ s.t. $vx \ne \epsilon$ and $|vwx| \le k$. Let us pick $i = 2$.
If either $v$ or $x$ contains at least one $a$ and at least one $b$, then $uv^2wx^2y$ is not of the form $a^* b^* a^*$, hence not in $A$.
If $v$ and $x$ contain only $a$'s then $uv^2wx^2y$ has more than twice as many $a$'s as $b$'s, hence not in $A$. Similarly, if $v$ and $x$ contain only $b$'s then $uv^2wx^2y$ has fewer than twice as many $a$'s as $b$'s, hence not in $A$.
Finally, if one of $v$ and $x$ contains only $a$'s and the other only $b$'s then $uv^2wx^2y$ cannot be of the form $a^m b^m a^m$, hence not in $A$. Thus by the pumping lemma, $A$ is not context-free.
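The case analysis above can be checked by brute force for small $k$. The sketch below enumerates every split $z = uvwxy$ with $vx \ne \epsilon$ and $|vwx| \le k$ and tests whether any of them keeps the pumped strings inside the language. It is an illustration for a few small values of $k$ only, not a substitute for the proof; the function names and the choice of testing the exponents $i = 0$ and $i = 2$ are assumptions of this sketch.

```python
def decompositions(z, k):
    """Yield all (u, v, w, x, y) with z = uvwxy, vx nonempty, |vwx| <= k."""
    n = len(z)
    for i in range(n + 1):                  # v starts at position i
        hi = min(n, i + k)                  # vwx must end by position hi
        for j in range(i, hi + 1):          # v = z[i:j]
            for l in range(j, hi + 1):      # w = z[j:l]
                for m in range(l, hi + 1):  # x = z[l:m]
                    if j > i or m > l:      # vx is not empty
                        yield z[:i], z[i:j], z[j:l], z[l:m], z[m:]

def survives(z, k, member, exponents=(0, 2)):
    """True if some split keeps u v^i w x^i y in the language for each i."""
    return any(
        all(member(u + v * i + w + x * i + y) for i in exponents)
        for u, v, w, x, y in decompositions(z, k)
    )

def in_A(s):
    """Membership in {a^n b^n a^n | n >= 0}."""
    n, r = divmod(len(s), 3)
    return r == 0 and s == "a" * n + "b" * n + "a" * n

def in_anbn(s):
    """Membership in {a^n b^n | n >= 0}, a CFL used for contrast."""
    h, r = divmod(len(s), 2)
    return r == 0 and s == "a" * h + "b" * h
```

For $z = a^k b^k a^k$ no split survives, matching the argument, whereas for the context-free language $\{a^n b^n\}$ the string `aabb` with $k = 2$ does have a surviving split ($v = a$, $w = \epsilon$, $x = b$).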
Eg.: Use the pumping lemma to show that $A = \{ww \mid w \in \{a, b\}^*\}$ is not context-free.
The family of CFLs is closed under intersection with regular sets. Hence it suffices to show that $A' = A \cap L(a^* b^* a^* b^*) = \{a^n b^m a^n b^m \mid m, n \ge 0\}$ is not context-free.
For a given $k$, let $z = a^k b^k a^k b^k$. Call each of the four substrings of the form $a^k$ or $b^k$ a block.
Let $z = uvwxy$ s.t. $vx \ne \epsilon$ and $|vwx| \le k$. Let us pick $i = 2$.
If one of $v$ or $x$ contains both $a$ and $b$, then $uv^2wx^2y$ is not of the form $a^* b^* a^* b^*$, thus not in $A'$.
If $v$ and $x$ are both from the same block, then $uv^2wx^2y$ has one block longer than the other three, thus it is not in $A'$.
If $v$ and $x$ are in different blocks, then the blocks must be adjacent, otherwise $|vwx|$ would be greater than $k$. Pumping then lengthens a block without changing its matching block, so $uv^2wx^2y$ is again not of the form $a^n b^m a^n b^m$.
Hence $A'$ is not a CFL, and therefore neither is $A$.
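The identity $A \cap L(a^* b^* a^* b^*) = \{a^n b^m a^n b^m \mid m, n \ge 0\}$ used in this reduction can be sanity-checked exhaustively on short strings. The sketch below does this for all strings over $\{a, b\}$ up to a length bound; the helper names and the bound of 8 are arbitrary choices for the illustration, and the check is evidence for the identity rather than a proof of it.

```python
import re
from itertools import product

def is_ww(s):
    """Membership in A = {ww | w in {a, b}*}."""
    h, r = divmod(len(s), 2)
    return r == 0 and s[:h] == s[h:]

def in_regular(s):
    """Membership in L(a* b* a* b*)."""
    return re.fullmatch(r"a*b*a*b*", s) is not None

def in_A_prime(s):
    """Membership in {a^n b^m a^n b^m | m, n >= 0}."""
    h, r = divmod(len(s), 2)
    # Try every way of writing half the length as n + m.
    return r == 0 and any(
        s == ("a" * n + "b" * (h - n)) * 2 for n in range(h + 1)
    )

def identity_holds(max_len=8):
    """Compare both sides of the identity on all strings up to max_len."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            s = "".join(letters)
            if (is_ww(s) and in_regular(s)) != in_A_prime(s):
                return False
    return True
```

A string such as `baba` is in $A$ but not in $L(a^* b^* a^* b^*)$, which is why the intersection leaves exactly the strings $ww$ with $w \in L(a^* b^*)$.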
Lecture 22
We can use the contrapositive of the pumping lemma to prove that a language is not context-free.
Eg.: $A = \{w \in \{a, b, c\}^* \mid \#a(w) = \#b(w) = \#c(w)\}$
We know that CFLs are closed under intersection with regular sets. Consider $B = A \cap L(a^* b^* c^*) = \{a^n b^n c^n \mid n \ge 0\}$. We know that $B$ is not a CFL. Thus $A$ is not a CFL.
Deterministic PDA
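$M = (Q, \Sigma, \Gamma, \delta, \bot, \dashv, s, F)$ (1)
where $\dashv$ is the right endmarker of the input, and
$\delta \subseteq (Q \times (\Sigma \cup \{\dashv\} \cup \{\epsilon\}) \times \Gamma) \times (Q \times \Gamma^*)$
- For any $p \in Q$, $a \in \Sigma \cup \{\dashv\}$, $A \in \Gamma$, $\delta$ contains exactly one transition of the form $((p, a, A), (q, \beta))$ OR $((p, \epsilon, A), (q, \beta))$.
- $\bot$ is always at the bottom of the stack. Thus all transitions involving $\bot$ would be of the form $((p, a, \bot), (q, \beta\bot))$.
- Acceptance is by final state: $(s, x\dashv, \bot) \xrightarrow[M]{*} (q, \epsilon, \beta)$ for some $q \in F$, $\beta \in \Gamma^*$.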
Properties of DCFLs: DCFLs are closed under complementation, but are not closed under union or intersection. DCFLs $\subsetneq$ CFLs.
CFG $\equiv$ PDA
Consider the following theorems, which show the equivalence between NPDAs and CFGs: CFGs and NPDAs have equal expressive power.
Theorem 1: Given a CFG $G$, we can construct an NPDA $M$ s.t. $L(M) = L(G)$.
Theorem 2: Given an NPDA $M$, we can construct a CFG $G$ s.t. $L(G) = L(M)$.
CFG $G \to M$
Suppose we have a CFG $G = (N, \Sigma, P, S)$. We can assume without loss of generality that all productions in $P$ are of the form $A \to cB_1 B_2 \cdots B_k$, where $c \in \Sigma \cup \{\epsilon\}$, $k \ge 0$.
Greibach normal form: another normal form, like the Chomsky normal form. All productions in $P$ are of the form $A \to cB_1 B_2 \cdots B_k$, for $c \in \Sigma \cup \{\epsilon\}$, $k \ge 0$.
The Greibach normal form does not reduce expressive power. It is just a conventional notation.
Consider the following NPDA $M = (\{q\}, \Sigma, N, \delta, q, S, \emptyset)$, where $q$ is the only state of the NPDA, $\Sigma$ is the set of terminals of $G$, which is the input alphabet of $M$, and $N$ is the set of non-terminals of $G$, which is the stack alphabet of $M$. $S$ is the start symbol of $G$, and the initial stack symbol of $M$. The set of final states is $\emptyset$, which is irrelevant as the NPDA accepts by empty stack.
For each production $A \to cB_1 B_2 \cdots B_k$ in $P$, $\delta$ contains the transition $((q, c, A), (q, B_1 B_2 \cdots B_k))$.
We shall show that indeed $L(M) = L(G)$. Let's first consider an example.
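The construction of $\delta$ from $P$ is mechanical, as the following sketch illustrates. The encoding is an assumption of the sketch: a production $A \to cB_1 \cdots B_k$ is a triple `(A, c, (B1, ..., Bk))` with the empty string for $c = \epsilon$. The grammar `P` for balanced brackets, $S \to [SBS \mid \epsilon$, $B \to ]$, is a hypothetical example in the required form and is not necessarily the grammar used in the lecture. Acceptance by empty stack is decided by a bounded search over pairs (input position, stack); the `max_configs` cut-off is an added safeguard.

```python
from collections import deque

def cfg_to_npda(productions):
    """Build delta of the one-state NPDA; the single state is named 'q'."""
    delta = {}
    for A, c, Bs in productions:
        # A -> c B1...Bk becomes the transition ((q, c, A), (q, B1...Bk)).
        delta.setdefault(("q", c, A), set()).add(("q", Bs))
    return delta

def accepts_by_empty_stack(delta, start_symbol, x, max_configs=100_000):
    """Search configurations (position in x, stack) with the top first."""
    start = (0, (start_symbol,))
    seen, queue = {start}, deque([start])
    while queue and len(seen) <= max_configs:
        pos, stack = queue.popleft()
        if pos == len(x) and not stack:
            return True                    # all input read, stack empty
        if not stack:
            continue
        A, beta = stack[0], stack[1:]
        moves = [(pos, mv) for mv in delta.get(("q", "", A), ())]
        if pos < len(x):
            moves += [(pos + 1, mv) for mv in delta.get(("q", x[pos], A), ())]
        for new_pos, (_, Bs) in moves:
            nxt = (new_pos, Bs + beta)     # pop A, push B1...Bk
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical grammar for balanced brackets: S -> [SBS | epsilon, B -> ]
P = [("S", "[", ("S", "B", "S")), ("S", "", ()), ("B", "]", ())]
M_DELTA = cfg_to_npda(P)
```

With these definitions, `accepts_by_empty_stack(M_DELTA, "S", "[[[]][]]")` succeeds, and each accepting run pops and pushes stack symbols in the same order in which a leftmost derivation rewrites non-terminals.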
Leftmost derivation: a derivation in which productions are always applied to the leftmost non-terminal.
The leftmost derivations in $G$ of a terminal string correspond exactly to the accepting computations of $M$. The sequence of sentential forms in the leftmost derivation corresponds to the sequence of configurations of $M$ in the computation.
Let $x = [[[\,]][\,]]$.
In the sentential forms the terminal string is generated from left to right, one terminal at a time, just as the input string $x$ is read from left to right, one symbol at a time. Thus in each row, the terminals generated so far in the sentential form and the unread input in the configuration concatenate to give $x$.
Lemma: For any $z, y \in \Sigma^*$, $\gamma \in N^*$ and $A \in N$, $A \xrightarrow[G]{n} z\gamma$ by a leftmost derivation iff $(q, zy, A) \xrightarrow[M]{n} (q, y, \gamma)$.
Proof : By induction on n.
Base case: $n = 0$, we have
$A \xrightarrow[G]{0} z\gamma \iff A = z\gamma$
$\iff z = \epsilon$ and $A = \gamma$
$\iff (q, zy, A) = (q, y, \gamma)$
$\iff (q, zy, A) \xrightarrow[M]{0} (q, y, \gamma)$
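Induction step: First let $A \xrightarrow[G]{n+1} z\gamma$ by a leftmost derivation, and let $B \to c\beta$ be the last production applied, where $c \in \Sigma \cup \{\epsilon\}$ and $\beta \in N^*$. Then
$A \xrightarrow[G]{n} uB\alpha \xrightarrow[G]{1} uc\beta\alpha$
where $z = uc$ and $\gamma = \beta\alpha$. By the induction hypothesis,
$(q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha)$
By definition of $M$,
$((q, c, B), (q, \beta)) \in \delta$
Thus
$(q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha)$
and combining the two gives $(q, zy, A) \xrightarrow[M]{n+1} (q, y, \gamma)$.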
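Conversely, suppose:
$(q, zy, A) \xrightarrow[M]{n+1} (q, y, \gamma)$
and let $((q, c, B), (q, \beta))$ be the last transition taken. Then $z = uc$ for some $u \in \Sigma^*$, $\gamma = \beta\alpha$ for some $\alpha \in \Gamma^*$, and we have:
$(q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha)$
By the induction hypothesis,
$A \xrightarrow[G]{n} uB\alpha$
by a leftmost derivation. By construction of $M$, $B \to c\beta$ is a production of $G$; applying it to $B$ gives $A \xrightarrow[G]{n+1} uc\beta\alpha = z\gamma$ by a leftmost derivation.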
NPDA $\to$ CFG
We will have to establish two things:
Step 1: Every PDA can be simulated by a PDA with only one state.
Step 2: Every PDA with one state has an equivalent CFG (invert the construction in CFG $\to$ NPDA).
Lecture 23
Set membership question
Question: Given a set $A \subseteq \Sigma^*$ and $x \in \Sigma^*$, is $x \in A$?
The answer depends on the way $A$ is presented. If $A$ is given as the language of a DFA, then we have a linear time algorithm, $O(k)$, where $k$ is the length of the input string. However, if $A$ is represented as the language of an NFA, the naive solution would be to convert the NFA into a DFA and run it. This gives an exponential bound on the time complexity.
A better algorithm would be to maintain the ‘active’ set. Start with the set of start states. For each state in the set, make the transitions over the next input symbol, and collect the resulting states into the new set of possible states. This gives a bound of $O(n^2 k)$, where $n$ is the number of states of the NFA.
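The active-set idea fits in a few lines. The sketch below assumes an NFA without $\epsilon$-transitions whose transition relation is a dictionary from (state, symbol) to a set of states; the example automaton `NFA`, which accepts the strings over $\{a, b\}$ ending in $ab$, is hypothetical and only there to exercise the function.

```python
def nfa_accepts(delta, starts, finals, x):
    """Simulate an NFA (no epsilon-moves) by tracking the active set."""
    active = set(starts)
    for c in x:
        # Each of the at most n active states contributes at most n
        # successors, so one input symbol costs O(n^2) set operations.
        active = {q for p in active for q in delta.get((p, c), ())}
    return bool(active & set(finals))

# Hypothetical NFA for strings over {a, b} that end in "ab".
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
```

For the input `aab` the active sets are $\{0\}$, $\{0, 1\}$, $\{0, 1\}$, $\{0, 2\}$, so `nfa_accepts(NFA, {0}, {2}, "aab")` accepts without ever building the equivalent DFA.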
0
   1
      2
         3
            4
               5
                  6
Iteration 0
Each entry $T_{ij}$ stores the set of non-terminals that generate the substring $x_{ij}$. Thus we are interested in whether or not $S$ is one of the non-terminals in $T_{0,6}$.
We start with length 1, i.e. substrings of the form $x_{i,i+1}$, which correspond to the table entries on the top diagonal. For each substring $c = x_{i,i+1}$, if there is a production $X \to c$ in $G$, we add $X$ to $T_{i,i+1}$. Thus,
0
A  1
   A  2
      B  3
         B  4
            A  5
               B  6
Iteration 1
Now we proceed to strings of length 2. For each substring $x_{i,i+2}$, we break it into two non-null substrings $x_{i,i+1}$, $x_{i+1,i+2}$ of length 1, and check the table entries $T_{i,i+1}$, $T_{i+1,i+2}$. We take a non-terminal from each of these entries (say $X$ from $T_{i,i+1}$ and $Y$ from $T_{i+1,i+2}$), and see if there exists a production $Z \to XY$; if so, $Z$ is added to $T_{i,i+2}$. For example, for $x_{0,2} = aa$, we have $A \in T_{0,1}$ and $A \in T_{1,2}$, so we look for a production with $AA$ on the right hand side, and since there is none, we have $T_{0,2}$ as the empty set $\emptyset$.
0
A  1
∅  A  2
   S  B  3
      ∅  B  4
         S  A  5
            S  B  6
Iteration 2
Now for strings of length 3, we have $x_{0,3} = x_{0,1} x_{1,3} = x_{0,2} x_{2,3}$. We need to check both possibilities. First we find $A \in T_{0,1}$ and $S \in T_{1,3}$, so we look for productions with $AS$ on the right, and since there are none, we check $T_{0,2}$, $T_{2,3}$. Since $T_{0,2}$ is $\emptyset$, we ultimately find that $T_{0,3}$ is $\emptyset$. We continue this way to complete the table:
0
A  1
∅  A  2
∅  S  B  3
S  C  ∅  B  4
D  S  ∅  S  A  5
S  C  ∅  C  S  B  6
Last iteration
CKY Algorithm
for $i = 0$ to $n - 1$ do
    $T_{i,i+1} = \emptyset$
    for each production $A \to a$ in $G$ do
        if $a = x_{i,i+1}$ then
            $T_{i,i+1} = T_{i,i+1} \cup \{A\}$
        end if
    end for
end for
for $m = 2$ to $n$ do
    for $i = 0$ to $n - m$ do
        $T_{i,i+m} = \emptyset$
        for $j = i + 1$ to $i + m - 1$ do
            for each production $A \to BC$ in $P$ of $G$ do
                if $B \in T_{i,j}$ and $C \in T_{j,i+m}$ then
                    $T_{i,i+m} = T_{i,i+m} \cup \{A\}$
                end if
            end for
        end for
    end for
end for
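The pseudocode translates almost line by line into Python. The sketch below keeps the same loop bounds and stores $T_{i,j}$ under the key `(i, j)`. The grammar is not shown in these notes, so the productions in `UNIT` and `BINARY` are an assumption: they are a Chomsky normal form grammar chosen because, on the input `aabbab`, it reproduces the table entries of the worked example above.

```python
def cky(unit, binary, x):
    """Fill the CKY table for x; T[i, j] holds non-terminals deriving x[i:j].

    unit   -- productions A -> a as pairs (A, a)
    binary -- productions A -> BC as triples (A, B, C)
    """
    n = len(x)
    T = {}
    for i in range(n):                      # substrings of length 1
        T[i, i + 1] = {A for A, a in unit if a == x[i]}
    for m in range(2, n + 1):               # substring length
        for i in range(0, n - m + 1):       # start position
            T[i, i + m] = set()
            for j in range(i + 1, i + m):   # split point
                for A, B, C in binary:
                    if B in T[i, j] and C in T[j, i + m]:
                        T[i, i + m].add(A)
    return T

# Assumed grammar (not given in the notes), consistent with the tables above:
# S -> AB | BA | SS | AC | BD,  C -> SB,  D -> SA,  A -> a,  B -> b
UNIT = [("A", "a"), ("B", "b")]
BINARY = [
    ("S", "A", "B"), ("S", "B", "A"), ("S", "S", "S"),
    ("S", "A", "C"), ("S", "B", "D"),
    ("C", "S", "B"), ("D", "S", "A"),
]
TABLE = cky(UNIT, BINARY, "aabbab")
```

The string is in the language exactly when the start symbol appears in the entry for the whole string, here `"S" in TABLE[0, 6]`. With $n$ the length of the input, the three nested loops over $m$, $i$ and $j$ give a running time cubic in $n$ for a fixed grammar.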
DCFLs: CFLs that can be accepted by a DPDA. DCFLs always admit an unambiguous grammar.
DCFLs $\subsetneq$ unambiguous CFLs.