CS340 Theory of Computation VI

CS340: Theory of Computation

Lecture 19
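Some examples of conversion to Chomsky Normal form
Eg.: The set of Balanced Parentheses: $S \to [S] \mid SS \mid \epsilon$

$S \to [S] \mid SS \mid [\,]$ (removing $\epsilon$; the grammar now generates only the non-empty balanced strings)
$S \to ASB \mid SS \mid AB$, $A \to [$, $B \to ]$ (creating non-terminals for the terminals)
$S \to AB \mid AC \mid SS$, $C \to SB$, $A \to [$, $B \to ]$
(replacing $S \to ASB$ with $S \to AC$ and $C \to SB$)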
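The conversion above can be sanity-checked by brute force. The following is a minimal sketch, not part of the lecture: the helper `language` and the dictionary encoding of the grammars are assumptions made for illustration. It enumerates all strings up to a length bound from the $\epsilon$-free grammar and from the Chomsky normal form grammar and compares them.

```python
def language(grammar, start, max_len):
    """All terminal strings of length <= max_len generated by `grammar`.

    Assumes the grammar has no epsilon-productions, so sentential forms
    never shrink and the search space is finite.
    """
    seen, todo, out = {(start,)}, [(start,)], set()
    while todo:
        form = todo.pop()
        # Index of the leftmost non-terminal, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            out.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return out

# Grammar after removing epsilon, and the final Chomsky normal form grammar.
no_eps = {"S": ["[S]", "SS", "[]"]}
cnf = {"S": ["AB", "AC", "SS"], "C": ["SB"], "A": ["["], "B": ["]"]}

same = language(no_eps, "S", 6) == language(cnf, "S", 6)
```

Such a bounded comparison is only evidence, not a proof, that the two grammars are equivalent.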

The set $\{a^n b^n \mid n \ge 0\}$ is not a regular set, but it is a CFL. What about $\{a^n b^n c^n \mid n \ge 0\}$? It is not a CFL.

Linear Grammar
A CFG $G$ is right linear if all productions are of the following form:

$A \to xB$, $A \to x$ for $A, B \in N$, $x \in \Sigma^*$

That is, at most one non-terminal appears on the RHS, and that non-terminal must be the rightmost symbol.

A CFG $G$ is left linear if all productions are of the following form:

$A \to Bx$, $A \to x$ for $A, B \in N$, $x \in \Sigma^*$

That is, at most one non-terminal appears on the RHS, and that non-terminal must be the leftmost symbol.

Eg.: $G_1 = (\{S\}, \{a, b\}, P_1, S)$ with $P_1$: $S \to abS \mid a$ (right linear)
$L(G_1) = L((ab)^* a)$

Eg.: $G_2 = (\{S, S_1, S_2\}, \{a, b\}, P_2, S)$ with $P_2$: $S \to S_1 ab$, $S_1 \to S_1 ab \mid S_2$, $S_2 \to a$ (left linear)
$L(G_2) = L(aab(ab)^*)$, since $S_1$ generates $a(ab)^*$ and $S \to S_1 ab$ appends one more $ab$.

A linear grammar is a grammar in which at most one non-terminal can occur on the RHS of any production, irrespective of its position.
Note: A regular grammar is a linear grammar; however, not all linear grammars are regular.
Theorem 1: Let $G$ be a right linear grammar; then $L(G)$ is regular.
Theorem 2: Let $A \subseteq \Sigma^*$ be a regular set; then there exists a right linear grammar $G$ s.t. $A = L(G)$.

Theorem: $A \subseteq \Sigma^*$ is regular iff $\exists$ a regular grammar $G$ s.t. $L(G) = A$.
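One way to see why right linear grammars stay within the regular languages: the non-terminals behave like the states of an NFA, and each production $A \to xB$ behaves like a move from $A$ to $B$ that consumes $x$. The sketch below follows that idea; the function `accepts` and the grammar encoding are illustrative assumptions, not notation from the lecture.

```python
def accepts(productions, start, w):
    """Decide whether w is generated by a right linear grammar.

    productions maps a non-terminal A to a list of pairs (x, B) meaning
    A -> xB, where x is a terminal string and B is a non-terminal,
    or None for a production A -> x.
    """
    seen = set()
    todo = [(start, 0)]              # (current non-terminal, input position)
    while todo:
        A, i = todo.pop()
        if (A, i) in seen:
            continue
        seen.add((A, i))
        for x, B in productions.get(A, []):
            if not w.startswith(x, i):
                continue
            j = i + len(x)
            if B is None:
                if j == len(w):      # A -> x consumed the rest of the input
                    return True
            else:
                todo.append((B, j))
    return False

# G1 from the example above: S -> abS | a
G1 = {"S": [("ab", "S"), ("a", None)]}
```

Only finitely many pairs (non-terminal, position) exist, which is the finite-state behaviour the theorem relies on.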



Non-deterministic Pushdown Automata (NPDA)


[Figure: a finite-state control $Q$ attached to a read-only input tape, read left to right, and to a stack accessed by push and pop.]

Working of the Machine


• pops the top of the stack
• makes a transition based on the top of stack, the current state, and input symbol
• Transition: push a sequence of symbols onto the stack, change state and move the read
head one cell to the right.
ϵ-transitions are allowed: the machine can pop and push symbols onto the stack, without
reading an input symbol or moving the input head pointer.
Stack can store unbounded information, but access is limited.

Definition of non-deterministic PDA


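$M = (Q, \Sigma, \Gamma, \delta, s, \bot, F)$
$Q$: finite set of states
$\Sigma$: finite set, the input alphabet
$s \in Q$: start state of $M$
$F \subseteq Q$: set of final/accept states of $M$
$\Gamma$: finite set, the stack alphabet
$\bot \in \Gamma$: initial stack symbol
$\delta \subseteq (Q \times (\Sigma \cup \{\epsilon\}) \times \Gamma) \times (Q \times \Gamma^*)$

Eg.: $((p, a, A), (q, B_1 B_2 \cdots B_k)) \in \delta$ means: in state $p$, reading $a$ with $A$ on top of the stack, pop $A$, push $B_1 B_2 \cdots B_k$ ($B_k$ first and $B_1$ last, so $B_1$ ends up on top), and move to state $q$.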

Lecture 20
Configurations
For $M = (Q, \Sigma, \Gamma, \delta, s, \bot, F)$, a configuration of $M$ is an element of $Q \times \Sigma^* \times \Gamma^*$, which denotes the current state of the machine, the part of the input that is unread, and the current content of the stack.

Start configuration: $(s, x, \bot)$

Define the 1-step next configuration relation $\xrightarrow[M]{1}$:
if $((p, a, A), (q, \gamma)) \in \delta$, then for any $y \in \Sigma^*$ and $\beta \in \Gamma^*$ we have $(p, ay, A\beta) \xrightarrow[M]{1} (q, y, \gamma\beta)$;
if $((p, \epsilon, A), (q, \gamma)) \in \delta$, then for any $y \in \Sigma^*$ and $\beta \in \Gamma^*$ we have $(p, y, A\beta) \xrightarrow[M]{1} (q, y, \gamma\beta)$.
Let $\xrightarrow[M]{*}$ denote the reflexive-transitive closure of $\xrightarrow[M]{1}$.

Acceptance
Acceptance is of two types in an NPDA: by final state and by empty stack.
By final state: $M$ accepts $x$ by final state if $(s, x, \bot) \xrightarrow[M]{*} (q, \epsilon, \gamma)$ for some $q \in F$, $\gamma \in \Gamma^*$.
By empty stack: $M$ accepts $x$ by empty stack if $(s, x, \bot) \xrightarrow[M]{*} (q, \epsilon, \epsilon)$ for some $q \in Q$.

$L(M)$: the set of all strings in $\Sigma^*$ accepted by $M$.

Acceptance by final state and acceptance by empty stack have the same expressive power.

Eg.: Set of balanced parentheses. NPDA accepting by empty stack:

$Q = \{q\}$, $\Sigma = \{[, ]\}$, $\Gamma = \{\bot, [\}$, $s = q$ and
$((q, [, \bot), (q, [\bot)) \in \delta$
$((q, [, [), (q, [[)) \in \delta$
$((q, ], [), (q, \epsilon)) \in \delta$
$((q, \epsilon, \bot), (q, \epsilon)) \in \delta$
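To make the configuration relation concrete, here is a small sketch of a simulator that searches breadth-first over configurations and accepts by empty stack. The function name, the dictionary encoding of $\delta$, the letter `"Z"` standing in for $\bot$, and the `max_steps` bound are all assumptions made for this illustration.

```python
from collections import deque

def accepts_by_empty_stack(delta, start, bottom, w, max_steps=10_000):
    """Search over configurations (state, unread input, stack).

    delta maps (state, symbol, top of stack) to a set of pairs
    (next state, string to push); symbol "" denotes an epsilon move and
    the first character of the pushed string ends up on top.
    max_steps is an illustrative safety bound, not part of the model.
    """
    start_cfg = (start, w, bottom)
    queue, seen = deque([start_cfg]), {start_cfg}
    steps = 0
    while queue and steps < max_steps:
        steps += 1
        state, rest, stack = queue.popleft()
        if rest == "" and stack == "":
            return True
        if stack == "":
            continue                 # nothing to pop: the machine is stuck
        top, below = stack[0], stack[1:]
        moves = [("", rest)]         # epsilon move leaves the input untouched
        if rest:
            moves.append((rest[0], rest[1:]))
        for sym, new_rest in moves:
            for nxt, push in delta.get((state, sym, top), ()):
                cfg = (nxt, new_rest, push + below)
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# The balanced-parentheses machine; "Z" stands in for the initial stack symbol.
delta = {
    ("q", "[", "Z"): {("q", "[Z")},
    ("q", "[", "["): {("q", "[[")},
    ("q", "]", "["): {("q", "")},
    ("q", "", "Z"): {("q", "")},
}
```

For example, `accepts_by_empty_stack(delta, "q", "Z", "[[]][]")` explores the same sequence of configurations one would write out by hand.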

Eg.: $L = \{ww^R \mid w \in \{a, b\}^*\}$. NPDA accepting by final state:

Let $M = (\{q_0, q_1, q_2\}, \{a, b\}, \{a, b, \bot\}, \delta, q_0, \bot, F)$, where $F = \{q_2\}$.
It reads $w$:
$((q_0, a, \bot), (q_0, a\bot))$; $((q_0, b, \bot), (q_0, b\bot)) \in \delta$
$((q_0, a, a), (q_0, aa))$; $((q_0, b, a), (q_0, ba))$; $((q_0, a, b), (q_0, ab))$; $((q_0, b, b), (q_0, bb)) \in \delta$
It guesses the midpoint and switches to reading $w^R$:
$((q_0, \epsilon, a), (q_1, a))$; $((q_0, \epsilon, b), (q_1, b))$; $((q_0, \epsilon, \bot), (q_1, \bot)) \in \delta$
It reads $w^R$:
$((q_1, a, a), (q_1, \epsilon))$; $((q_1, b, b), (q_1, \epsilon)) \in \delta$
It reaches the accepting state:
$((q_1, \epsilon, \bot), (q_2, \epsilon)) \in \delta$
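The non-deterministic guess of the midpoint can be mimicked deterministically by trying every position in turn. The following sketch (the function name `accepts_wwr` is an illustrative assumption) mirrors the three phases of the machine: push in $q_0$, match and pop in $q_1$, and accept in $q_2$ once only the bottom marker would remain.

```python
def accepts_wwr(x):
    """Simulate the NPDA's midpoint guess by trying every position."""
    for mid in range(len(x) + 1):
        stack = list(x[:mid])        # state q0: push the first part
        ok = True
        for ch in x[mid:]:           # state q1: match input against the stack
            if stack and stack[-1] == ch:
                stack.pop()
            else:
                ok = False
                break
        if ok and not stack:         # only the bottom marker is left: go to q2
            return True
    return False
```

Trying all midpoints costs quadratic time here; the NPDA itself simply has one computation path per guess.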

Equivalence of NPDAs and CFGs


NPDAs and CFGs have equivalent expressive power.
Theorem 1: Given a CFG $G$, we can construct an NPDA $M$ s.t. $L(M) = L(G)$.

Theorem 2: Given an NPDA $M$, we can construct a CFG $G$ s.t. $L(G) = L(M)$.

Closure properties of CFLs


Union
Suppose $A$ and $B$ are CFLs, where $L(G_1) = A$ and $L(G_2) = B$ for some CFGs $G_1$, $G_2$. Also let $S_1$ be the start symbol of $G_1$, and $S_2$ be the start symbol of $G_2$.
Construct a grammar $G$ s.t. $L(G) = A \cup B$ as follows:
Ensure that $G_1$ and $G_2$ have disjoint sets of non-terminals (rename the non-terminals if required).
Combine the productions of $G_1$ and $G_2$.
Add a new start symbol $S$ with the production $S \to S_1 \mid S_2$.

Concatenation
If $A$ and $B$ are CFLs, with $L(G_1) = A$ and $L(G_2) = B$ and start symbols $S_1$, $S_2$ respectively, construct a CFG $G$ s.t. $L(G) = AB = \{xy \mid x \in A, y \in B\}$:
Combine the grammars $G_1$, $G_2$ (again with disjoint non-terminals).
Add a new start symbol $S$ with the production $S \to S_1 S_2$.

Kleene star
If $A$ is a CFL s.t. $L(G) = A$, and the start symbol of $G$ is $S_1$, construct a grammar $G'$ s.t. $L(G') = A^*$ as follows:
Take $G$ along with a new start symbol $S$ and the production
$S \to S_1 S \mid \epsilon$
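The three constructions are mechanical enough to write down directly. In this sketch a grammar is a dictionary from non-terminals to lists of right-hand sides (each a list of symbols, with `[]` for $\epsilon$); this encoding, the helper `rename`, and the assumption that no terminal is spelled `"S"` are illustrative choices, not part of the lecture.

```python
def rename(grammar, start, tag):
    """Tag every non-terminal so that two grammars share no names."""
    ren = {A: f"{A}_{tag}" for A in grammar}
    new = {ren[A]: [[ren.get(s, s) for s in rhs] for rhs in rules]
           for A, rules in grammar.items()}
    return new, ren[start]

def union(g1, s1, g2, s2):
    (g1, s1), (g2, s2) = rename(g1, s1, 1), rename(g2, s2, 2)
    return {**g1, **g2, "S": [[s1], [s2]]}, "S"      # S -> S1 | S2

def concat(g1, s1, g2, s2):
    (g1, s1), (g2, s2) = rename(g1, s1, 1), rename(g2, s2, 2)
    return {**g1, **g2, "S": [[s1, s2]]}, "S"        # S -> S1 S2

def star(g, s):
    g, s = rename(g, s, 1)
    return {**g, "S": [[s, "S"], []]}, "S"           # S -> S1 S | epsilon

# Two small example grammars: {a^n b^n} and {c^n}.
g_ab = {"S": [["a", "S", "b"], []]}
g_c = {"S": [["c", "S"], []]}
```

Renaming first is what guarantees the disjointness condition stated above.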

Intersection
CFLs are not closed under intersection.
Eg.: $\{a^m b^m c^n \mid m, n \ge 0\} \cap \{a^n b^m c^m \mid m, n \ge 0\} = \{a^n b^n c^n \mid n \ge 0\}$, which is not a CFL.

Intersection of a CFL and a regular set

Theorem: CFLs are closed under intersection with regular sets: if $A \subseteq \Sigma^*$ is a CFL and $B \subseteq \Sigma^*$ is a regular set, then $A \cap B$ is a CFL.

Brief proof idea: Consider an NPDA $M_1$ and a DFA $M_2$ s.t. $L(M_1) = A$ and $L(M_2) = B$.
NPDA $N$: apply the product construction to $M_1$ and $M_2$. That is, the states of $N$ are pairs of states of $M_1$ and $M_2$, and the stack of $N$ simulates the stack of $M_1$.

Lecture 21
Pumping Lemma for CFLs
If $A \subseteq \Sigma^*$ is a CFL, then $\exists k \ge 0$ s.t. every $z \in A$ with $|z| \ge k$ can be split into five substrings, $z = uvwxy$, s.t. $vx \ne \epsilon$, $|vwx| \le k$ and $\forall i \ge 0$, $uv^i w x^i y \in A$.

To prove the pumping lemma, let us introduce the notion of parse trees (derivation trees).
A parse tree is a tree satisfying the following:

1. Each interior node is labelled with an element of $N$.
2. Each leaf node is labelled with an element of $\Sigma$.
3. If an interior node is labelled $A$ and its children are labelled $B_1, B_2, ..., B_k$, then $A \to B_1 B_2 ... B_k$ is a production of the grammar.

Eg.: Consider the derivation of the string $a^4 b^4$ from the CFG $G$: $S \to AC \mid AB$, $A \to a$, $B \to b$, $C \to SB$

[Figure: derivation tree of $a^4 b^4$]

Observation: Parse trees of grammars in Chomsky normal form for long strings must have long paths, because the number of symbols can at most double when you go down a level. In order to have $2^n$ symbols at the bottom level, the tree must be of depth at least $n$, i.e. it must have at least $n + 1$ levels.
Proof: Let $G$ be a grammar for $A$ in Chomsky normal form. Take $k = 2^{n+1}$, where $n$ is the number of non-terminals in $G$. Let $z \in A$ s.t. $|z| \ge k$; then the parse tree of $z$ must be of depth at least $n + 1$. Consider the longest path in the tree. That path must contain at least $n + 1$ occurrences of non-terminals, so by the pigeonhole principle some non-terminal occurs at least twice on it. Take the first pair of occurrences of the same non-terminal along the path, reading bottom-up. For example:

[Figure: parse tree with the two chosen occurrences of the repeated non-terminal marked]

Say $X$ is that non-terminal. Now break $z$ into substrings $uvwxy$ s.t. $w$ is the string generated by the lower occurrence of $X$ and $vwx$ is the string generated by the upper occurrence of $X$. Thus in this example $w = ab$ and $vwx = aabb$.
Further, let $T$ be the subtree rooted at the upper occurrence of $X$, and $t$ be the subtree rooted at the lower occurrence of $X$.

[Figure: the subtrees $T$ and $t$]

By removing $t$ and replacing it with a copy of $T$ we get a parse tree for $uv^2 w x^2 y$. This can be repeated, and hence a parse tree for $uv^i w x^i y$, $i \ge 1$, can be generated. We can also cut out $T$ and replace it with $t$ to get a parse tree for $uv^0 w x^0 y = uwy$.

Contrapositive of the pumping lemma (useful for proving that a language is not context-free): $A$ is not context-free if the following property holds: $\forall k \ge 0$, $\exists z \in A$ with $|z| \ge k$ s.t. for all ways of splitting $z = uvwxy$ with $vx \ne \epsilon$ and $|vwx| \le k$, $\exists i \ge 0$ s.t. $uv^i w x^i y \notin A$.

Eg.: Use the pumping lemma to show that $A = \{a^n b^n a^n \mid n \ge 0\}$ is not context-free.
For a given $k$, let $z = a^k b^k a^k$; we have $|z| = 3k \ge k$. Let $z = uvwxy$ s.t. $vx \ne \epsilon$ and $|vwx| \le k$. Let us pick $i = 2$.
If either $v$ or $x$ contains at least one $a$ and at least one $b$, then $uv^2 w x^2 y$ is not of the form $a^* b^* a^*$, hence not in $A$.
If $v$ and $x$ contain only $a$'s, then $uv^2 w x^2 y$ has more than twice as many $a$'s as $b$'s, hence is not in $A$. Similarly, if $v$ and $x$ contain only $b$'s, then $uv^2 w x^2 y$ has fewer than twice as many $a$'s as $b$'s, hence is not in $A$.
Finally, if one of $v$ and $x$ contains only $a$'s and the other only $b$'s, then $uv^2 w x^2 y$ cannot be of the form $a^m b^m a^m$, hence is not in $A$. Thus, by the pumping lemma, $A$ is not context-free.
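The case analysis can be cross-checked by brute force for small $k$: enumerate every legal split of $z = a^k b^k a^k$ and confirm that pumping with $i = 2$ always leaves $A$. This is only a sketch; the helper names `in_A`, `splits` and `pumped_out` are assumptions made for illustration, and a finite check is of course no substitute for the proof.

```python
def in_A(s):
    """Membership in A = { a^n b^n a^n : n >= 0 }."""
    n, r = divmod(len(s), 3)
    return r == 0 and s == "a" * n + "b" * n + "a" * n

def splits(z, k):
    """Yield every split z = u v w x y with vx non-empty and |vwx| <= k."""
    n = len(z)
    for i in range(n + 1):                           # u = z[:i]
        for end in range(i, min(i + k, n) + 1):      # vwx = z[i:end]
            for j in range(i, end + 1):              # v = z[i:j]
                for m in range(j, end + 1):          # w = z[j:m], x = z[m:end]
                    if (j - i) + (end - m) > 0:
                        yield z[:i], z[i:j], z[j:m], z[m:end], z[end:]

def pumped_out(k, i=2):
    """True if u v^i w x^i y is outside A for every legal split of a^k b^k a^k."""
    z = "a" * k + "b" * k + "a" * k
    return all(not in_A(u + v * i + w + x * i + y)
               for u, v, w, x, y in splits(z, k))
```

Calling `pumped_out(k)` for a few small values of `k` exercises every case of the argument above.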

Eg.: Use the pumping lemma to show that $A = \{ww \mid w \in \{a, b\}^*\}$ is not context-free.
The family of CFLs is closed under intersection with regular sets. Hence it suffices to show that $A' = A \cap L(a^* b^* a^* b^*) = \{a^n b^m a^n b^m \mid m, n \ge 0\}$ is not context-free.
For a given $k$, let $z = a^k b^k a^k b^k$. Call each of the four substrings of the form $a^k$ or $b^k$ a block.
Let $z = uvwxy$ s.t. $vx \ne \epsilon$ and $|vwx| \le k$. Let us pick $i = 2$.
If one of $v$ or $x$ contains both $a$ and $b$, then $uv^2 w x^2 y$ is not of the form $a^* b^* a^* b^*$, thus not in $A'$.
If $v$ and $x$ are both from the same block, then $uv^2 w x^2 y$ has one block longer than the other three.
If $v$ and $x$ are in different blocks, then the blocks must be adjacent, otherwise $|vwx|$ would be greater than $k$. This again means that $uv^2 w x^2 y$ is not of the form $a^n b^m a^n b^m$.
Hence $A'$ is not a CFL, and therefore neither is $A$.

Lecture 22
We can use the contrapositive of the pumping lemma, together with the closure properties, to prove that a language is not context-free.
Eg.: $A = \{w \in \{a, b, c\}^* \mid \#a(w) = \#b(w) = \#c(w)\}$
We know that CFLs are closed under intersection with regular sets. Consider $B = A \cap L(a^* b^* c^*) = \{a^n b^n c^n \mid n \ge 0\}$. We know that $B$ is not a CFL. Thus $A$ is not a CFL.

Deterministic PDA

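$M = (Q, \Sigma, \Gamma, \delta, \bot, \dashv, s, F)$

$\dashv$ is a special symbol, not in $\Sigma$: the right end marker.

$\delta \subseteq (Q \times (\Sigma \cup \{\dashv\} \cup \{\epsilon\}) \times \Gamma) \times (Q \times \Gamma^*)$
- For any $p \in Q$, $a \in \Sigma \cup \{\dashv\}$, $A \in \Gamma$, $\delta$ contains exactly one transition of the form $((p, a, A), (q, \beta))$ or $((p, \epsilon, A), (q, \beta))$.

- $\bot$ is always at the bottom of the stack. Thus all transitions involving $\bot$ are of the form $((p, a, \bot), (q, \beta\bot))$.
- Acceptance is by final state: $(s, x\dashv, \bot) \xrightarrow[M]{*} (q, \epsilon, \beta)$ for some $q \in F$, $\beta \in \Gamma^*$.
Properties of DCFLs: DCFLs are closed under complementation, but are not closed under union or intersection. DCFLs $\subsetneq$ CFLs.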

CFG $\equiv$ PDA
Consider the following theorems, which show the equivalence between NPDAs and CFGs: CFGs and NPDAs have equal expressive power.
Theorem 1: Given a CFG $G$, we can construct an NPDA $M$ s.t. $L(M) = L(G)$.
Theorem 2: Given an NPDA $M$, we can construct a CFG $G$ s.t. $L(G) = L(M)$.

CFG $G \to M$
Suppose we have a CFG $G = (N, \Sigma, P, S)$. We can assume without loss of generality that all productions in $P$ are of the form $A \to cB_1 B_2 ... B_k$, where $c \in \Sigma \cup \{\epsilon\}$, $B_1, ..., B_k \in N$ and $k \ge 0$.

Greibach Normal form: another normal form, like the Chomsky normal form. All productions in $P$ are of the form $A \to cB_1 B_2 ... B_k$, for $c \in \Sigma \cup \{\epsilon\}$, $k \ge 0$.
Restricting to the Greibach normal form does not reduce expressive power; it only fixes a convenient shape for the productions.

Consider the following NPDA $M = (\{q\}, \Sigma, N, \delta, q, S, \emptyset)$, where $q$ is the only state of the NPDA; $\Sigma$, the set of terminals of $G$, is the input alphabet of $M$; and $N$, the set of non-terminals of $G$, is the stack alphabet of $M$. $S$ is the start symbol of $G$ and the initial stack symbol of $M$. The set of final states is $\emptyset$, which is irrelevant as the NPDA accepts by empty stack.
For each production $A \to cB_1 B_2 ... B_k$ in $P$, $\delta$ contains the transition $((q, c, A), (q, B_1 B_2 ... B_k))$.
We shall show that indeed $L(M) = L(G)$. Let's first consider an example.

Example of Balanced Parentheses


We have the CFG $G = (\{S, B\}, \{[, ]\}, P, S)$, where the productions $P$ are:

S.No.   Productions (Greibach NF)   Corresponding transitions in NPDA $M$
(i)     $S \to [BS$                 $((q, [, S), (q, BS))$
(ii)    $S \to [B$                  $((q, [, S), (q, B))$
(iii)   $S \to [SB$                 $((q, [, S), (q, SB))$
(iv)    $S \to [SBS$                $((q, [, S), (q, SBS))$
(v)     $B \to ]$                   $((q, ], B), (q, \epsilon))$

Leftmost derivation: a derivation in which productions are always applied to the leftmost non-terminal.
A leftmost derivation in $G$ of a terminal string corresponds exactly to an accepting computation of $M$. The sequence of sentential forms in the leftmost derivation corresponds to the sequence of configurations of $M$ in the computation.
Let $x = [[[]][]]$.

Rule    Derivation steps    Configs of $M$ in an accepting computation of $x$
        $S$                 $(q, [[[]][]], S)$
(iii)   $[SB$               $(q, [[]][]], SB)$
(iv)    $[[SBSB$            $(q, []][]], SBSB)$
(ii)    $[[[BBSB$           $(q, ]][]], BBSB)$
(v)     $[[[]BSB$           $(q, ][]], BSB)$
(v)     $[[[]]SB$           $(q, []], SB)$
(ii)    $[[[]][BB$          $(q, ]], BB)$
(v)     $[[[]][]B$          $(q, ], B)$
(v)     $[[[]][]]$          $(q, \epsilon, \epsilon)$

In the sentential forms the terminal string is generated from left to right, one terminal at a time, just as the input string $x$ is read from left to right, one symbol at a time. Thus, in each row, the string of terminals generated so far and the unread input concatenate to give $x$.
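The construction of $M$ from $G$ and the run on $x$ can be reproduced with a few lines of code. This is a sketch under stated assumptions: productions are encoded as triples `(A, c, gamma)` for $A \to c\gamma$, every production is assumed to start with a terminal (as in the example grammar, which guarantees the search terminates), and the names `to_npda` and `accepts` are made up for illustration.

```python
def to_npda(productions):
    """Build delta of the one-state NPDA from triples (A, c, gamma)."""
    delta = {}
    for A, c, gamma in productions:
        # Production A -> c gamma becomes transition ((q, c, A), (q, gamma)).
        delta.setdefault((c, A), []).append(gamma)
    return delta

def accepts(delta, start, x):
    """Accept by empty stack; depth-first search over (unread input, stack)."""
    todo, seen = [(x, start)], set()
    while todo:
        rest, stack = todo.pop()
        if (rest, stack) in seen:
            continue
        seen.add((rest, stack))
        if not rest and not stack:
            return True
        if not rest or not stack:
            continue                 # stuck: input or stack ran out first
        A, below = stack[0], stack[1:]
        for gamma in delta.get((rest[0], A), []):
            todo.append((rest[1:], gamma + below))
    return False

# Productions (i) to (v) of the balanced-parentheses grammar above.
P = [("S", "[", "BS"), ("S", "[", "B"), ("S", "[", "SB"),
     ("S", "[", "SBS"), ("B", "]", "")]
delta = to_npda(P)
```

The stack contents visited on the accepting path for `"[[[]][]]"` are the non-terminal suffixes of the sentential forms in the table.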
Lemma: For any $z, y \in \Sigma^*$, $\gamma \in N^*$ and $A \in N$, $A \xrightarrow[G]{n} z\gamma$ by a leftmost derivation iff $(q, zy, A) \xrightarrow[M]{n} (q, y, \gamma)$.

Proof: By induction on $n$.
Base case: $n = 0$. We have

$A \xrightarrow[G]{0} z\gamma \iff A = z\gamma$
$\iff z = \epsilon$ and $\gamma = A$
$\iff (q, zy, A) = (q, y, \gamma)$
$\iff (q, zy, A) \xrightarrow[M]{0} (q, y, \gamma)$

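Induction: We prove the two implications $\Rightarrow$ and $\Leftarrow$ separately.

First let $A \xrightarrow[G]{n+1} z\gamma$ by a leftmost derivation. Let $B \to c\beta$ be the last production applied, where $c \in \Sigma \cup \{\epsilon\}$ and $\beta \in N^*$. Then

$A \xrightarrow[G]{n} uB\alpha \xrightarrow[G]{1} uc\beta\alpha = z\gamma$

where $z = uc$, $\gamma = \beta\alpha$. By the induction hypothesis,

$(q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha)$

By the definition of $M$,
$((q, c, B), (q, \beta)) \in \delta$
Thus
$(q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha)$

Hence, combining, we have:

$(q, zy, A) = (q, ucy, A) \xrightarrow[M]{n+1} (q, y, \beta\alpha) = (q, y, \gamma)$

Conversely, suppose
$(q, zy, A) \xrightarrow[M]{n+1} (q, y, \gamma)$

and let $((q, c, B), (q, \beta))$ be the last transition taken. Then $z = uc$ for some $u \in \Sigma^*$ and $\gamma = \beta\alpha$ for some $\alpha \in \Gamma^*$. Thus we have:
$(q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha)$

By the induction hypothesis,
$A \xrightarrow[G]{n} uB\alpha$

by a leftmost derivation in $G$, and by the construction of $M$, $B \to c\beta \in P$; hence:

$A \xrightarrow[G]{n} uB\alpha \xrightarrow[G]{1} uc\beta\alpha = z\gamma$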

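Finally, we establish the claim: $L(G) = L(M)$

Proof: Let $x \in \Sigma^*$. Then
$x \in L(G) \iff S \xrightarrow[G]{*} x$ by some leftmost derivation (Defn. of $L(G)$)
$\iff (q, x, S) \xrightarrow[M]{*} (q, \epsilon, \epsilon)$ (Lemma, with $z = x$, $y = \epsilon$, $\gamma = \epsilon$)
$\iff x \in L(M)$ (Defn. of $L(M)$, acceptance by empty stack)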

NPDA $\to$ CFG
We have to establish two things:
Step 1: Every PDA can be simulated by a PDA with only one state.
Step 2: Every PDA with one state has an equivalent CFG (invert the construction used for CFG $\to$ NPDA).

Lecture 23
Set membership question
Question: Given a set $A \subseteq \Sigma^*$ and $x \in \Sigma^*$, is $x \in A$?
The answer depends on the way $A$ is presented. If $A$ is given as the language of a DFA, then we have a linear-time algorithm, $O(k)$, where $k$ is the length of the input string. However, if $A$ is represented as the language of an NFA, the naive solution would be to convert the NFA into a DFA and run it. This gives an exponential bound on the time complexity.
A better algorithm would be to maintain the 'active' set of states. Start with the set of start states. For each state in the set, make the transitions over the current input symbol, and collect the resulting states into the new active set. This gives a bound of $O(n^2 k)$, where $n$ is the number of states of the NFA.
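The active-set idea fits in a few lines. In this sketch the NFA has no $\epsilon$-moves and its transition function is a dictionary; the function name `nfa_accepts` and the example automaton (strings whose second-to-last symbol is $a$) are assumptions chosen for illustration.

```python
def nfa_accepts(delta, starts, finals, x):
    """Simulate an NFA by tracking the set of currently active states.

    delta maps (state, symbol) to a set of successor states.
    Each input symbol costs at most O(n^2) work for n states.
    """
    active = set(starts)
    for c in x:
        active = {r for q in active for r in delta.get((q, c), ())}
    return bool(active & set(finals))

# Example NFA over {a, b}: accepts strings whose second-to-last symbol is a.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
```

No DFA is ever built, so the exponential blow-up of the subset construction is avoided.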

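Question: Given a CFL $A \subseteq \Sigma^*$ and $x \in \Sigma^*$, is $x \in A$?

Simulating an NPDA may be unreasonable, because the stack is unbounded and the machine is non-deterministic.
Instead, consider $A$ as presented by a grammar $G$ (assumed to be in Chomsky normal form). Every derivation of a non-empty string $x$ in such a grammar has exactly $2|x| - 1$ steps, so it suffices to try all derivations of fewer than $2 \times |x|$ steps. This naive search can take exponential time.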

The Cocke-Kasami-Younger Algorithm

It runs in cubic time. Assume the grammar $G$ is given in Chomsky normal form. Let us first illustrate the algorithm with an example.
Eg.: $G$: $S \to AB \mid BA \mid SS \mid AC \mid BD$, $A \to a$, $B \to b$, $C \to SB$, $D \to SA$. Now consider the string $x = aabbab$. Index the positions between symbols: position 0 is before the first symbol, the intermediate positions follow, and position 6 is after the last symbol. For all $0 \le i < j \le 6$, let $x_{i,j}$ denote the substring of $x$ between positions $i$ and $j$.
Build the following table:

    0
       1
          2
             3
                4
                   5
                      6

Iteration 0

Each entry $T_{i,j}$ stores the set of non-terminals that generate $x_{i,j}$; it sits in the row ending at index $j$, in the column below index $i$. Thus we are interested in whether or not $S$ is one of the non-terminals in $T_{0,6}$.

We start with substrings of length 1. These are of the form $x_{i,i+1}$ and correspond to the table entries next to the diagonal. For each substring $c = x_{i,i+1}$, if there is a production $X \to c$ in $G$, we add $X$ to $T_{i,i+1}$. Thus,

    0
    A  1
       A  2
          B  3
             B  4
                A  5
                   B  6

Iteration 1

Now we proceed to substrings of length 2. For each substring $x_{i,i+2}$, we break it into two non-null substrings $x_{i,i+1}$ and $x_{i+1,i+2}$ of length 1, and check the table entries $T_{i,i+1}$ and $T_{i+1,i+2}$. We take a non-terminal from each of these entries (say $X$ from $T_{i,i+1}$ and $Y$ from $T_{i+1,i+2}$) and see if there exists a production $Z \to XY$; if so, we add $Z$ to $T_{i,i+2}$. For example, for $x_{0,2} = aa$ we have $A \in T_{0,1}$ and $A \in T_{1,2}$, so we look for a production with $AA$ on the right-hand side, and since there aren't any, $T_{0,2}$ is the empty set $\emptyset$.

    0
    A  1
    ∅  A  2
       S  B  3
          ∅  B  4
             S  A  5
                S  B  6

Iteration 2

Now for substrings of length 3, we have $x_{0,3} = x_{0,1} x_{1,3} = x_{0,2} x_{2,3}$. We need to check both possibilities. First we find $A \in T_{0,1}$ and $S \in T_{1,3}$, so we look for productions with $AS$ on the right, and since there are none, we check $T_{0,2}$ and $T_{2,3}$. Since $T_{0,2}$ is $\emptyset$, we ultimately find that $T_{0,3}$ is $\emptyset$. We continue this way to complete the table:

    0
    A  1
    ∅  A  2
    ∅  S  B  3
    S  C  ∅  B  4
    D  S  ∅  S  A  5
    S  C  ∅  C  S  B  6

Last iteration

Since $S \in T_{0,6}$, we conclude that $x \in A$.

CKY Algorithm
for $i = 0$ to $n - 1$ do
    $T_{i,i+1} = \emptyset$
    for each production $A \to a$ in $G$ do
        if $a = x_{i,i+1}$ then
            $T_{i,i+1} = T_{i,i+1} \cup \{A\}$
        end if
    end for
end for
for $m = 2$ to $n$ do
    for $i = 0$ to $n - m$ do
        $T_{i,i+m} = \emptyset$
        for $j = i + 1$ to $i + m - 1$ do
            for each production $A \to BC$ in $P$ of $G$ do
                if $B \in T_{i,j}$ and $C \in T_{j,i+m}$ then
                    $T_{i,i+m} = T_{i,i+m} \cup \{A\}$
                end if
            end for
        end for
    end for
end for

Running time is $O(n^3 p)$, where $n = |x|$ and $p = |P|$.
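The pseudocode translates almost line for line into a runnable sketch. The encoding of the grammar as two lists (`unit` for productions $A \to a$, `binary` for productions $A \to BC$) and the function name `cky` are assumptions made for this illustration.

```python
def cky(unit, binary, x):
    """Fill the CKY table for a grammar in Chomsky normal form.

    unit:   list of pairs (A, a) for productions A -> a
    binary: list of triples (A, B, C) for productions A -> BC
    Returns a dict T with T[i, j] = set of non-terminals generating x[i:j].
    """
    n = len(x)
    T = {}
    for i in range(n):                               # substrings of length 1
        T[i, i + 1] = {A for A, a in unit if a == x[i]}
    for m in range(2, n + 1):                        # substring length
        for i in range(n - m + 1):                   # start position
            T[i, i + m] = set()
            for j in range(i + 1, i + m):            # split point
                for A, B, C in binary:
                    if B in T[i, j] and C in T[j, i + m]:
                        T[i, i + m].add(A)
    return T

# The example grammar and string from this lecture.
unit = [("A", "a"), ("B", "b")]
binary = [("S", "A", "B"), ("S", "B", "A"), ("S", "S", "S"),
          ("S", "A", "C"), ("S", "B", "D"), ("C", "S", "B"), ("D", "S", "A")]
T = cky(unit, binary, "aabbab")
```

Here `T[0, 6]` plays the role of $T_{0,6}$, so membership of $x$ amounts to checking whether `"S"` is in that set.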

Ambiguous and unambiguous grammars


A CFG $G$ is ambiguous if $\exists x \in L(G)$ for which there are two different parse trees.
Defn.: A string is derived ambiguously in a CFG $G$ if it has two different leftmost derivations. A grammar $G$ is ambiguous if it generates some string ambiguously. $G$ is unambiguous if $G$ is not ambiguous.
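For a grammar in Chomsky normal form, the number of parse trees of a string can be counted with the same dynamic programming idea as CKY. The sketch below (the name `count_trees` and the grammar encoding are illustrative assumptions) shows that the Chomsky normal form grammar for balanced parentheses from Lecture 19 is ambiguous, because $S \to SS$ can group three adjacent blocks in two ways.

```python
from functools import lru_cache

def count_trees(unit, binary, start, x):
    """Number of parse trees of x from `start` in a CNF grammar.

    unit:   list of pairs (A, a) for productions A -> a
    binary: list of triples (A, B, C) for productions A -> BC
    """
    @lru_cache(maxsize=None)
    def count(A, i, j):
        if j - i == 1:
            return sum(1 for B, a in unit if B == A and a == x[i])
        return sum(count(B, i, k) * count(C, k, j)
                   for X, B, C in binary if X == A
                   for k in range(i + 1, j))
    return count(start, 0, len(x)) if x else 0

# S -> AB | AC | SS, C -> SB, A -> [, B -> ]
unit = [("A", "["), ("B", "]")]
binary = [("S", "A", "B"), ("S", "A", "C"), ("S", "S", "S"), ("C", "S", "B")]
```

A count of two or more for any string witnesses ambiguity of the grammar; it says nothing about whether the language is inherently ambiguous.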

A CFL $A \subseteq \Sigma^*$ is inherently ambiguous if every CFG $G$ s.t. $L(G) = A$ is ambiguous.

Note: There exist inherently ambiguous CFLs.

DCFLs: CFLs that can be accepted by a DPDA. DCFLs always admit an unambiguous grammar.
DCFLs $\subsetneq$ unambiguous CFLs.
