
CS 373: Theory of Computation

Manoj Prabhakaran Mahesh Viswanathan


Fall 2008
Part I
Lecture 12
Context Free Grammars
1 Introduction
Parenthesis Matching
Problem
Describe the set of arithmetic expressions with correctly matched parentheses.
Arithmetic expressions with correctly matched parentheses cannot be described by a regular
expression.
Let L be the language of correct expressions.
Suppose h maps numbers and variables to ε, opening parentheses to 0, and closing parentheses
to 1. Then h(L) ∩ 0*1* = { 0^n 1^n | n ≥ 0 }, which is not regular. Since regular languages are
closed under homomorphisms and under intersection with regular languages, L cannot be regular.
This is an example of a context-free language, the class of languages that we study next.
Parenthesis Matching
Inductive Definition
Ignoring numbers and variables, and focusing only on parentheses, correctly matched expressions
can be defined as follows:
- The empty string ε is a valid expression.
- A valid string w (≠ ε) must either be
  - the concatenation of two correctly matched expressions, or
  - it must begin with ( and end with ), and moreover, once the first and last symbols are
    removed, the resulting string must correspond to a valid expression.
Parenthesis Matching
Grammar
Taking E to be the set of correct expressions, the inductive definition can be succinctly written as
E → ε
E → EE
E → (E)
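As a quick illustration (my own, not part of the notes): a string of parentheses is generated by this grammar exactly when no prefix closes more parentheses than it opens and the totals match, which a single scan can check.

```python
def balanced(s: str) -> bool:
    """Check membership in the language of E -> eps | EE | (E)."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
        else:
            return False         # only parentheses are allowed here
    return depth == 0            # every '(' was closed
```

The scan is equivalent to the inductive definition: a prefix dipping below depth 0 violates the "(E)" case, and a nonzero final depth leaves an unmatched parenthesis.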
English Sentences
A fragment of English sentences can be described as
⟨S⟩  → ⟨NP⟩⟨VP⟩
⟨NP⟩ → ⟨CN⟩ | ⟨CN⟩⟨PP⟩
⟨VP⟩ → ⟨CV⟩ | ⟨CV⟩⟨PP⟩
⟨PP⟩ → ⟨P⟩⟨CN⟩
⟨CN⟩ → ⟨A⟩⟨N⟩
⟨CV⟩ → ⟨V⟩ | ⟨V⟩⟨NP⟩
⟨A⟩  → a | the
⟨N⟩  → boy | girl | flower
⟨V⟩  → touches | likes | sees
⟨P⟩  → with
English Sentences
Examples
(Parse trees, shown as figures in the original: "a boy sees", with "a" an article, "boy" a noun,
and "sees" a verb; and "the boy sees a flower", parsed as a noun phrase "the boy" followed by a
verb phrase "sees a flower".)
Applications
Such rules (or grammars) play a key role in
- Parsing programming languages
- Markup languages like HTML and XML
- Modelling software
2 Formal Definition
2.1 Grammars
Context-Free Grammars
Definition 2.1. A context-free grammar (CFG) is G = (V, Σ, R, S) where
- V is a finite set of variables, also called nonterminals or syntactic categories. Each variable
  represents a language.
- Σ is a finite set of symbols, disjoint from V, called terminals, that form the strings of the
  language.
- R is a finite set of rules or productions. Each production is of the form A → α where A ∈ V
  and α ∈ (V ∪ Σ)*.
- S ∈ V is the start symbol; it is the variable that represents the language being defined.
  Other variables represent auxiliary languages that are used to define the language of the start
  symbol.
2.2 Examples
Example of a CFG
Example 2.2. Let G_par = (V, Σ, R, S) be given by
V = {E}
Σ = {(, )}
R = {E → ε, E → EE, E → (E)}
S = E
Palindromes
Example 2.3. A string w is a palindrome if w = w^R. For example, madaminedenimadam ("Madam
in Eden, I'm Adam").
G_pal = ({S}, {0, 1}, R, S) defines palindromes over {0, 1}, where R is
S → ε
S → 0
S → 1
S → 0S0
S → 1S1
Or more briefly, R = {S → ε | 0 | 1 | 0S0 | 1S1}
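A hedged sketch (mine, not from the notes): the rules of G_pal translate directly into a recursive membership test, one case per production.

```python
def in_pal(w: str) -> bool:
    """Membership in L(G_pal) for strings over {0, 1}."""
    if any(c not in "01" for c in w):
        return False
    if len(w) <= 1:                       # rules S -> eps | 0 | 1
        return True
    # rules S -> 0S0 | 1S1: matching ends, and the inside is again in L(G_pal)
    return w[0] == w[-1] and in_pal(w[1:-1])
```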
Arithmetic Expressions
Consider the language of all arithmetic expressions (E) built out of integers (N) and identifiers (I),
using only + and *.
G_exp = ({E, I, N}, {a, b, 0, 1, (, ), +, *, −}, R, E) where R is
E → I | N | E + E | E * E | (E)
I → a | b | Ia | Ib
N → 0 | 1 | N0 | N1 | −N | +N
2.3 Context-free Language
Language of a CFG
Recursive Definition
Strings known to be in the languages of the variables occurring in the RHS of a rule, concatenated
with the terminals (in the body) in the right order, belong to the language of the variable in the
LHS; the language of the grammar is the set of strings so inferred for S.

String       Language   Rule
a            I          I → a
1            N          N → 1
11           N          N → N1
a            E          E → I
a + a        E          E → E + E
(a + a)      E          E → (E)
1            E          E → N
(a + a) * 1  E          E → E * E
Language of a CFG
Derivations
Expand the start symbol using one of its rules. Further expand the resulting string by replacing
one of the variables in the string by the RHS of one of its rules. Repeat until you get a string of
terminals.
E ⇒ E * E ⇒ E * N ⇒ E * 1 ⇒ (E) * 1 ⇒ (E + E) * 1 ⇒ (E + I) * 1
  ⇒ (E + a) * 1 ⇒ (I + a) * 1 ⇒ (a + a) * 1
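Such a derivation can be replayed mechanically. The sketch below (my own illustration, not from the notes) rewrites one occurrence of a variable per step, checking at each step that a legal rule of G_exp is applied.

```python
RULES = {
    "E": ["I", "N", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib"],
    "N": ["0", "1", "N0", "N1"],
}

def step(s: str, pos: int, rhs: str) -> str:
    """Rewrite the variable at index pos by rhs, checking it is a legal rule."""
    assert s[pos] in RULES and rhs in RULES[s[pos]]
    return s[:pos] + rhs + s[pos + 1:]

# Replay E => E*E => E*N => E*1 => (E)*1 => ... => (a+a)*1
s = "E"
for pos, rhs in [(0, "E*E"), (2, "N"), (2, "1"), (0, "(E)"),
                 (1, "E+E"), (3, "I"), (3, "a"), (1, "I"), (1, "a")]:
    s = step(s, pos, rhs)
print(s)  # (a+a)*1
```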
Derivations
Formal Definition
Definition 2.4. Let G = (V, Σ, R, S) be a CFG. We say αAβ ⇒_G αγβ, where α, β, γ ∈ (V ∪ Σ)*
and A ∈ V, if A → γ is a rule of G.
We say α ⇒*_G β if either α = β or there are α_0, α_1, . . . , α_n such that
α = α_0 ⇒_G α_1 ⇒_G α_2 ⇒_G · · · ⇒_G α_n = β
Notation
When G is clear from the context, we will write ⇒ and ⇒* instead of ⇒_G and ⇒*_G.
Context-Free Language
Definition 2.5. The language of a CFG G = (V, Σ, R, S), denoted L(G), is the collection of strings
over the terminals derivable from S using the rules in R. In other words,
L(G) = { w ∈ Σ* | S ⇒* w }
Definition 2.6. A language L is said to be context-free if there is a CFG G such that L = L(G).
Palindromes Revisited
Recall, L_pal = { w ∈ {0, 1}* | w = w^R } is the language of palindromes.
Consider G_pal = ({S}, {0, 1}, R, S), where R = {S → ε | 0 | 1 | 0S0 | 1S1}, which defines
palindromes over {0, 1}.
Proposition 2.7. L(G_pal) = L_pal
Proving Correctness of CFG
L_pal ⊆ L(G_pal)
Proof. Let w ∈ L_pal. We prove that S ⇒* w by induction on |w|.
Base Cases: If |w| = 0 or |w| = 1 then w = ε or 0 or 1. And S → ε | 0 | 1.
Induction Step: If |w| ≥ 2 and w = w^R then it must begin and end with the same symbol. Say
w = 0x0. Now, w^R = 0x^R 0 = w = 0x0; thus, x^R = x. By the induction hypothesis, S ⇒* x.
Hence S ⇒ 0S0 ⇒* 0x0. If w = 1x1 the argument is similar.
Proving Correctness of CFG
L(G_pal) ⊆ L_pal
Proof (contd). Let w ∈ L(G_pal), i.e., S ⇒* w. We will show w ∈ L_pal by induction on the number
of derivation steps.
Base Case: If the derivation has only one step then the derivation must be S ⇒ ε, S ⇒ 0 or
S ⇒ 1. Thus w = ε or 0 or 1, and is in L_pal.
Induction Step: Consider an (n + 1)-step derivation of w. It must be of the form S ⇒ 0S0 ⇒*
0x0 = w or S ⇒ 1S1 ⇒* 1x1 = w. In either case S ⇒* x in n steps. Hence x ∈ L_pal, and so
w = w^R.
Parse Trees
For a CFG G = (V, Σ, R, S), a parse tree (or derivation tree) of G is a tree satisfying the following
conditions:
- Each interior node is labeled by a variable in V.
- Each leaf is labeled by either a variable, a terminal or ε; a leaf labeled by ε must be the only
  child of its parent.
- If an interior node is labeled by A with children labeled by X_1, X_2, . . . , X_k (from the left),
  then A → X_1 X_2 · · · X_k must be a rule.

Figure 1: Example parse tree with yield 011110 (rules S → 0S0, S → 1S1, S → 1S1, S → ε)
The yield of a parse tree is the concatenation of the leaf labels (left to right).
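A parse tree and its yield are easy to represent concretely. This sketch (my own illustration, not from the notes) encodes the tree of Figure 1 as nested tuples, with the label "eps" standing in for ε.

```python
# A tree is (label, children); a leaf is (label, ()).
def yield_of(tree):
    label, children = tree
    if not children:
        return "" if label == "eps" else label   # eps contributes nothing
    return "".join(yield_of(c) for c in children)

leaf = lambda x: (x, ())
# Parse tree for 011110 in G_pal: S -> 0S0, S -> 1S1, S -> 1S1, S -> eps
t = ("S", (leaf("0"),
           ("S", (leaf("1"),
                  ("S", (leaf("1"), ("S", (leaf("eps"),)), leaf("1"))),
                  leaf("1"))),
           leaf("0")))
print(yield_of(t))  # 011110
```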
Parse Trees and Derivations
Proposition 2.8. Let G = (V, Σ, R, S) be a CFG. For any A ∈ V and α ∈ (V ∪ Σ)*, A ⇒* α iff
there is a parse tree with root labeled A and whose yield is α.
Proof. (⇒): Proof by induction on the number of steps in the derivation.
Base Case: If A ⇒ α then A → α is a rule in G. There is a tree of height 1, with root A and
leaves the symbols of α.

Figure 2: Parse tree for the base case: root A with children α_1, α_2, . . . , α_n
Parse Trees for Derivations
Proof (contd). (⇒): Proof by induction on the number of steps in the derivation.
Induction Step: Let A ⇒* α in k + 1 steps.
Then A ⇒* β_1 X β_2 ⇒ β_1 γ β_2 = α, where X → X_1 · · · X_n = γ is a rule.
By the induction hypothesis, there is a tree with root A and yield β_1 X β_2.
Add leaves X_1, . . . , X_n and make them children of X. The new tree is a parse tree with the
desired yield.

Figure 3: Parse tree for the induction step
Derivations for Parse Trees
Proof (contd). (⇐): Assume that there is a parse tree with root A and yield α. We need to show
that A ⇒* α. Proof by induction on the number of internal nodes in the tree.
Base Case: If the tree has only one internal node, then it has the form shown in the picture.
Then α = X_1 · · · X_n and A → α is a rule. Thus, A ⇒* α.

Figure 4: Parse tree with one internal node: root A with children α_1, α_2, . . . , α_n
Derivations for Parse Trees
Proof (contd). (⇐) Induction Step: Suppose α is the yield of a tree with k + 1 interior nodes.
Let X_1, X_2, . . . , X_n be the children of the root ordered from the left. Not all X_i are leaves,
and A → X_1 X_2 · · · X_n must be a rule.
Let α_i be the yield of the tree rooted at X_i; if X_i is a leaf then α_i = X_i.
Now if j < i then all the descendants of X_j are to the left of the descendants of X_i. So
α = α_1 α_2 · · · α_n.

Figure 5: Tree with k + 1 internal nodes: root A, children X_1, X_2, . . . , X_n with yields
α_1, α_2, . . . , α_n
Derivations for Parse Trees
Proof (contd). (⇐) Induction Step: Suppose α is the yield of a tree with k + 1 interior nodes.
Each subtree rooted at X_i has at most k internal nodes. So if X_i is a leaf then X_i ⇒* α_i
trivially, and if X_i is not a leaf then X_i ⇒* α_i by the induction hypothesis.
Thus A ⇒ X_1 X_2 · · · X_n ⇒* α_1 X_2 · · · X_n ⇒* α_1 α_2 X_3 · · · X_n ⇒* · · · ⇒* α_1 · · · α_n = α.
Recap . . .
For a CFG G with variable A the following are equivalent:
1. The recursive inference procedure determines that w ∈ Σ* is in the language of A.
2. A ⇒* w
3. There is a parse tree with root A and yield w.
Context-free-ness
CFGs have the property that if X ⇒* γ then αXβ ⇒* αγβ: what can be derived from a variable X
does not depend on the context in which X occurs.
3 Ambiguity
3.1 The Concept
Multiple Parse Trees
Example 1 The sentence "the girl touches the boy with the flower" has two parse trees in the
English grammar above: in one, the prepositional phrase "with the flower" attaches to the noun
phrase "the boy" (the boy who has the flower is touched); in the other, it attaches to the verb
phrase (the girl uses the flower to touch the boy).
Multiple Parse Trees
Example 2 The expression a + b * a has two parse trees in the grammar G_exp: one with
E ⇒ E + E at the root, grouping the expression as a + (b * a), and one with E ⇒ E * E at the
root, grouping it as (a + b) * a.
Ambiguity
Definition 3.1. A grammar G = (V, Σ, R, S) is said to be ambiguous if there is a w ∈ Σ* for
which there are two different parse trees.
Warning!
Existence of two derivations for a string does not mean the grammar is ambiguous!
3.2 Removing Ambiguity
Removing Ambiguity
Ambiguity may be removed either by
- Using the semantics to change the rules. For example, if we knew who had the flower (the
  girl or the boy) from the context, we would know which is the right interpretation.
- Adding precedence to operators. For example, * binds more tightly than +, or "else" binds
  with the innermost "if".
An Example
Recall, G_exp has the following rules:
E → I | N | E + E | E * E | (E)
I → a | b | Ia | Ib
N → 0 | 1 | N0 | N1 | −N | +N
The new CFG G′_exp has the rules
I → a | b | Ia | Ib
N → 0 | 1 | N0 | N1 | −N | +N
F → I | N | (E)
T → F | T * F
E → T | E + T
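Because the levels E, T, F of G′_exp mirror operator precedence, the grammar can be parsed by straightforward recursive descent. The sketch below is my own illustration (not part of the notes); it simplifies by lumping identifier and number characters into one operand token, and returns a fully parenthesized string to make the grouping visible.

```python
def parse(s: str) -> str:
    """Return a fully parenthesized form showing how G'_exp groups s."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def eat(c):
        nonlocal pos
        assert peek() == c, f"expected {c!r} at {pos}"
        pos += 1

    def factor():                 # F -> I | N | (E)
        if peek() == "(":
            eat("("); e = expr(); eat(")")
            return e
        tok = ""
        while peek() and peek() in "ab01":   # I -> a|b|Ia|Ib ; N -> 0|1|N0|N1
            tok += peek(); eat(peek())
        assert tok, f"expected operand at {pos}"
        return tok

    def term():                   # T -> F | T * F  (left associative)
        t = factor()
        while peek() == "*":
            eat("*"); t = f"({t}*{factor()})"
        return t

    def expr():                   # E -> T | E + T  (left associative)
        e = term()
        while peek() == "+":
            eat("+"); e = f"({e}+{term()})"
        return e

    out = expr()
    assert pos == len(s), "trailing input"
    return out

print(parse("a+b*a"))   # (a+(b*a)): * binds more tightly than +
```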
Ambiguity: Computational Problems
Removing Ambiguity
Problem: Given a CFG G, find a CFG G′ such that L(G) = L(G′) and G′ is unambiguous.
There is no algorithm that can solve the above problem!
Deciding Ambiguity
Problem: Given a CFG G, determine if G is ambiguous.
There is no algorithm that can solve this problem!
Inherently Ambiguous Languages
Problem: Is it the case that for every CFG G, there is a grammar G′ such that L(G) = L(G′) and
G′ is unambiguous, even if G′ cannot be constructed algorithmically?
No! There are context-free languages L such that every grammar for L is ambiguous.
Definition 3.2. A context-free language L is said to be inherently ambiguous if every grammar G
for L is ambiguous.
Inherently Ambiguous Languages
An Example
Consider
L = { a^i b^j c^k | i = j or j = k }
One can show that any CFG G for L will have two parse trees on a^n b^n c^n, for all but finitely
many values of n:
- One that checks that the number of a's = the number of b's
- Another that checks that the number of b's = the number of c's
Part II
Lecture 13
Pushdown Automata
4 Computing Using a Stack
Beyond Finite Memory: The Stack
So far we considered automata with finite memory.
Today: automata with access to an infinite stack.
The stack can contain an unlimited number of characters. But the automaton
- can read/erase only the top of the stack: pop
- can add to only the top of the stack: push
On longer inputs, the automaton may have more items in the stack.
Keeping Count Using the Stack
An automaton can use the stack to recognize { 0^n 1^n | n ≥ 0 }:
- On reading a 0, push it onto the stack
- After the 0s, on reading each 1, pop a 0
  (If a 0 comes after a 1, reject)
- If there is an attempt to pop an empty stack, reject
- If the stack is not empty at the end, reject
- Else accept
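The procedure above translates directly into code; this sketch (mine, not from the notes) recognizes { 0^n 1^n | n ≥ 0 } with an explicit stack.

```python
def zeros_then_ones(w: str) -> bool:
    """Recognize { 0^n 1^n : n >= 0 } using a stack."""
    stack = []
    seen_one = False
    for c in w:
        if c == "0":
            if seen_one:          # a 0 after a 1: reject
                return False
            stack.append("0")     # push each 0
        elif c == "1":
            seen_one = True
            if not stack:         # attempt to pop an empty stack: reject
                return False
            stack.pop()           # pop a 0 for each 1
        else:
            return False
    return not stack              # stack must be empty at the end
```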
Matching Parenthesis Using the Stack
An automaton can use the stack to recognize balanced parentheses,
e.g. (())() is balanced, but ())() and (() are not:
- On seeing a ( push it on the stack
- On seeing a ) pop a ( from the stack
- If there is an attempt to pop an empty stack, reject
- If the stack is not empty at the end, reject
- Else accept
5 Definition of Pushdown Automata
Pushdown Automata (PDA)

Figure 6: A pushdown automaton: a finite-state control reading an input tape (here a b b a a b),
with access to a stack (here with contents x y x $, top first)

- Like an NFA with ε-transitions, but with a stack
- Stack depth unlimited: not a finite-state machine
- Non-deterministic: accepts if any thread of execution accepts
Pushdown Automata (PDA)
- Has a non-deterministic finite-state control
- At every step:
  - Consume the next input symbol (or none) and pop the top symbol on the stack (or none)
  - Based on the current state, consumed input symbol and popped stack symbol, do
    (non-deterministically):
    1. push a symbol onto the stack (or push none)
    2. change to a new state
A transition q_1 --a, x → y--> q_2 means: if at q_1, with next input symbol a and top of stack x,
then the PDA can consume a, pop x, push y onto the stack and move to q_2 (any of a, x, y may
be ε).
Pushdown Automata (PDA): Formal Definition
A PDA is P = (Q, Σ, Γ, δ, q_0, F) where
Q = finite set of states
Σ = finite input alphabet
Γ = finite stack alphabet
q_0 ∈ Q = start state
F ⊆ Q = accepting/final states
δ : Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) → P(Q × (Γ ∪ {ε}))
6 Examples of Pushdown Automata
Matching Parenthesis: PDA construction
States q_0, q, q_F, with transitions
q_0 --ε, ε → $--> q,   q --(, ε → (--> q,   q --), ( → ε--> q,   q --ε, $ → ε--> q_F
- First push a bottom-of-the-stack symbol $ and move to q
- On seeing a ( push it onto the stack
- On seeing a ) pop if a ( is on the stack
- Pop $ and move to the final state q_F
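A PDA like this one can be simulated directly from the formal definition. The sketch below is my own (not part of the notes): δ is encoded as a dict from (state, input-or-ε, popped-or-ε) to a set of (state, pushed-or-ε) pairs, with the empty string standing in for ε, and acceptance is checked by exploring every nondeterministic thread.

```python
EPS = ""   # stands for epsilon

def accepts(delta, q0, finals, w):
    """Nondeterministic PDA simulation over configurations
    (state, input position, stack). delta maps (q, x, a) -> set of (q2, b):
    consume x (or EPS), pop a (or EPS), push b (or EPS), move to q2."""
    seen = set()
    frontier = [(q0, 0, ())]                     # stack as a tuple, top last
    while frontier:
        q, i, stack = frontier.pop()
        if (q, i, stack) in seen:
            continue
        seen.add((q, i, stack))
        if q in finals and i == len(w):
            return True
        xs = [EPS] + ([w[i]] if i < len(w) else [])
        pops = [EPS] + ([stack[-1]] if stack else [])
        for x in xs:
            for a in pops:
                for (q2, b) in delta.get((q, x, a), ()):
                    s2 = stack[:-1] if a != EPS else stack
                    if b != EPS:
                        s2 = s2 + (b,)
                    frontier.append((q2, i + (x != EPS), s2))
    return False

# The matching-parentheses PDA described above:
delta = {
    ("q0", EPS, EPS): {("q", "$")},   # push the bottom marker $
    ("q", "(", EPS):  {("q", "(")},   # push each (
    ("q", ")", "("):  {("q", EPS)},   # pop a ( for each )
    ("q", EPS, "$"):  {("qF", EPS)},  # pop $ and accept
}
print(accepts(delta, "q0", {"qF"}, "(())()"))   # True
```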
Matching Parenthesis: PDA execution
(Figure: the run of the PDA on input (())(): after pushing $, the stack grows to ( ( $ on the
two opening parentheses, shrinks as each ) pops a (, grows and shrinks again for the final (),
and the last unmatched ) finds only $ on the stack, so the run dies and the input is rejected.)
Palindrome: PDA construction
States q_0, q_push, q_pop, q_F, with transitions
q_0 --ε, ε → $--> q_push,   q_push --a, ε → a--> q_push (for each input symbol a),
q_push --ε or a, ε → ε--> q_pop,   q_pop --a, a → ε--> q_pop,   q_pop --ε, $ → ε--> q_F
- First push a bottom-of-the-stack symbol $ and move to a pushing state
- Push input symbols onto the stack
- Non-deterministically move to a popping state (with or without consuming a single input
  symbol, to handle odd- and even-length palindromes)
- If the next input symbol is the same as the top of the stack, pop
- If $ is on top of the stack, move to the accept state
Palindrome: PDA execution
(Figure: an accepting run on input madam: m and a are pushed; on d the machine guesses the
middle and moves to the popping state while consuming d; the remaining a and m each match
the top of the stack and are popped; with $ exposed, the PDA moves to q_F and accepts.)
Part III
Lecture 14
Equivalence of CFGs and PDAs
7 Computation of a PDA
Instantaneous Description
In order to describe a machine's execution, we need to capture a snapshot of the machine that
completely determines future behavior:
- In the case of an NFA (or DFA), it is the state
- In the case of a PDA, it is the state + stack contents
Definition 7.1. An instantaneous description of a PDA P = (Q, Σ, Γ, δ, q_0, F) is a pair ⟨q, γ⟩,
where q ∈ Q and γ ∈ Γ*.
Computation
Definition 7.2. For a PDA P = (Q, Σ, Γ, δ, q_0, F), a string w ∈ Σ*, and instantaneous
descriptions ⟨q_1, γ_1⟩ and ⟨q_2, γ_2⟩, we say ⟨q_1, γ_1⟩ --w-->_P ⟨q_2, γ_2⟩ iff there is a sequence
of instantaneous descriptions ⟨r_0, s_0⟩, ⟨r_1, s_1⟩, . . . , ⟨r_k, s_k⟩ and a sequence x_1, x_2, . . . , x_k,
where each x_i ∈ Σ ∪ {ε}, such that
- w = x_1 x_2 · · · x_k,
- r_0 = q_1 and s_0 = γ_1,
- r_k = q_2 and s_k = γ_2,
- for every i, (r_{i+1}, b) ∈ δ(r_i, x_{i+1}, a) such that s_i = a s and s_{i+1} = b s, where
  a, b ∈ Γ ∪ {ε} and s ∈ Γ*.
Example of Computation
Example 7.3. Consider again the matching-parenthesis PDA with states q_0, q, q_F and transitions
q_0 --ε, ε → $--> q,  q --(, ε → (--> q,  q --), ( → ε--> q,  q --ε, $ → ε--> q_F.
Then ⟨q_0, ε⟩ --(()(-->_P ⟨q, (($⟩ because
⟨q_0, ε⟩ --x_1 = ε--> ⟨q, $⟩ --x_2 = (--> ⟨q, ($⟩ --x_3 = (--> ⟨q, (($⟩
         --x_4 = )--> ⟨q, ($⟩ --x_5 = (--> ⟨q, (($⟩
8 Language Recognized
Acceptance/Recognition
Definition 8.1. A PDA P = (Q, Σ, Γ, δ, q_0, F) accepts a string w ∈ Σ* iff for some q ∈ F and
γ ∈ Γ*, ⟨q_0, ε⟩ --w-->_P ⟨q, γ⟩.
Definition 8.2. The language recognized/accepted by a PDA P = (Q, Σ, Γ, δ, q_0, F) is
L(P) = { w ∈ Σ* | P accepts w }. A language L is said to be accepted/recognized by P if
L = L(P).
9 Equivalence of CFGs and PDAs
Expressive Power of CFGs and PDAs
We will show that CFGs and PDAs have equivalent expressive power. More formally, . . .
Theorem 9.1. For every CFG G, there is a PDA P such that L(G) = L(P). In addition, for
every PDA P, there is a CFG G such that L(P) = L(G). Thus, L is context-free iff there is a
PDA P such that L = L(P).
10 CFG to PDA
10.1 Informal Ideas
From CFG to PDA
Problem
Given a grammar G = (V, Σ, R, S), we need to design a PDA P = (Q, Σ, Γ, δ, q_0, F) such that
L(P) = L(G). In other words, given w ∈ Σ*, the PDA P needs to figure out whether S ⇒*_G w.
Intuition
The PDA P will try to construct a derivation of w from S, by applying one rule at a time
starting from S.
Simulating Derivations
Challenge I: Choosing the rule to apply
How do you choose the rule of the grammar to apply to get the next step of the derivation?
Use non-determinism to guess this choice!
Simulating Derivations
Challenge II: Storing intermediate strings
In order to construct the derivation, we need to know what the current intermediate string is, so
that we know what rules/steps can be applied next. How do we store this intermediate string?
- Store the whole intermediate string on the stack?
  This doesn't work! The PDA can only read the top of the stack (which is, say, the leftmost
  symbol of the intermediate string), but it needs to see a variable of the string to know
  which rule to apply.
- Store only part of the string on the stack: the part beginning at the first (leftmost)
  variable of the intermediate string.
  The portion before the first variable in the intermediate string will be matched with the
  input.
Example
(Figure: a PDA snapshot on input 01100, with the first two symbols already read and A1B$ on
the stack, A on top.)
The snapshot depicts the intermediate string 01A1B as follows:
- 01 has been read from the input
- A1B is on the stack; the bottom of the stack is $
PDA Algorithm
1. Push $ and the start symbol onto the stack.
2. Repeat the following steps:
   (a) If the top of the stack is a variable A, pick (nondeterministically) a rule for A, and push
       onto the stack the RHS of the rule; don't read any input symbol.
   (b) If the top of the stack is a terminal a, then read the next input symbol. If the input
       symbol is not a, then this branch dies. Otherwise, continue.
   (c) If the top of the stack is $, then pop the symbol and go to the accepting state. There are
       no transitions from the accept state, and so the input is accepted only if there are no
       more input symbols.
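The algorithm can be prototyped directly. This sketch (my own, not from the notes) runs the nondeterministic stack discipline on G_pal by exploring all rule choices; since each RHS of G_pal contains at most one variable, the stack can legitimately exceed the remaining input length by at most one symbol, which justifies the pruning bound.

```python
RULES = {"S": ["", "0", "1", "0S0", "1S1"]}   # G_pal; "" plays the role of epsilon

def cfg_pda_accepts(w: str, rules=RULES, start="S") -> bool:
    def run(stack, i):                   # stack is a string, top at index 0
        if not stack:
            return i == len(w)           # $ popped: accept iff input exhausted
        top = stack[0]
        if top in rules:                 # step (a): guess a rule, push its RHS
            return any(run(rhs + stack[1:], i) for rhs in rules[top]
                       if len(rhs + stack[1:]) <= len(w) - i + 1)
        # step (b): match a terminal against the next input symbol
        return i < len(w) and w[i] == top and run(stack[1:], i + 1)
    return run(start, 0)
```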
10.2 Formal Construction
Pushing Multiple Symbols
PDA Transitions
Formally, δ(q, x, a) contains pairs (q′, b), which means that the PDA reads at most one symbol
from the input (if x ≠ ε), pops the top of the stack (if a ≠ ε) and pushes at most one symbol
onto the stack (if b ≠ ε).
To simplify the construction, we will assume that we can push as many symbols as we want in
one step. Thus, δ(q, x, a) contains pairs (q′, γ), where γ = b_1 b_2 · · · b_k ∈ Γ*.
This can easily be simulated using additional states that push the symbols of γ one by one.
Formally, we have new states q_1, . . . , q_{k−1} and transitions
(q_1, b_k) ∈ δ(q, x, a),  δ(q_1, ε, ε) = {(q_2, b_{k−1})},
δ(q_2, ε, ε) = {(q_3, b_{k−2})},  . . . ,  δ(q_{k−1}, ε, ε) = {(q′, b_1)}
Pushing Multiple Symbols
An Example
Figure 7: A transition p --a, s → xyz--> q pushing multiple symbols
Figure 8: Multiple pushes implemented by single pushes:
p --a, s → z--> q_1 --ε, ε → y--> q_2 --ε, ε → x--> q
Formal Definition of PDA
Let G = (V, Σ, R, S) be a CFG. Define P_G = (Q, Σ, Γ, δ, q_0, F) such that
- Q = {q_s, q_ℓ, q_a}
- Γ = V ∪ Σ ∪ {$}, where $ ∉ Σ, V
- q_0 = q_s
- F = {q_a}
- Assuming a ∈ Σ and A ∈ V, δ is given by
  δ(q_s, ε, ε) = {(q_ℓ, S$)}                  δ(q_ℓ, a, a) = {(q_ℓ, ε)}
  δ(q_ℓ, ε, A) = {(q_ℓ, w) | A → w ∈ R}      δ(q_ℓ, ε, $) = {(q_a, ε)}
  In all other cases, δ(·) = ∅.
To remove steps that push more than one symbol, we will add more states.
State Diagram of PDA
Figure 9: State diagram of P_G: q_s --ε, ε → S$--> q_ℓ, with self-loops on q_ℓ labeled
ε, A → w (for each A → w ∈ R) and a, a → ε (for each a ∈ Σ), and q_ℓ --ε, $ → ε--> q_a
CFG to PDA
An example
Consider the CFG G = ({S, T}, {a, b}, {S → aTb | b, T → Ta | ε}, S). The PDA P_G has the
transitions
q_s --ε, ε → S$--> q_ℓ,   q_ℓ --ε, $ → ε--> q_a,
and self-loops on q_ℓ labeled
ε, S → aTb    ε, S → b    ε, T → Ta    ε, T → ε    a, a → ε    b, b → ε
10.3 Proof of Correctness
Leftmost Derivations
Derivations
At each step, replace some variable in the intermediate string by the right-hand side of a rule in
the grammar.
Leftmost Derivations
At each step, replace the leftmost variable in the intermediate string by the right-hand side of a
rule in the grammar.
We will say α ⇒_lm β if β is obtained from α by a leftmost derivation step.
Proposition 10.1. For any grammar G = (V, Σ, R, S), A ∈ V, and w ∈ Σ*, A ⇒* w if and
only if A ⇒*_lm w.
Head/Tail of Intermediate Strings
Definition 10.2. Let α = wAβ ∈ (V ∪ Σ)* be such that w ∈ Σ*, A ∈ V and β ∈ (V ∪ Σ)*.
The head of α is w, and the tail is Aβ.
Example 10.3. The head of the string 01A1B is 01 and the tail is A1B.
Correctness of Construction
Proposition 10.4. Let G be a CFG and let P_G be the PDA constructed. Then L(P_G) = L(G).
Proof. [L(G) ⊆ L(P_G)]: We will prove the following lemma.
Lemma 10.5. Suppose S ⇒*_lm xα, where x is the head and α is the tail of xα. Then
⟨q_s, ε⟩ --x-->_{P_G} ⟨q_ℓ, α$⟩.
Before proving this lemma, let us see how it helps us show that L(G) ⊆ L(P_G). Consider
w ∈ L(G); then S ⇒*_lm w. By the lemma, this means that ⟨q_s, ε⟩ --w-->_{P_G} ⟨q_ℓ, $⟩. Thus,
⟨q_s, ε⟩ --w-->_{P_G} ⟨q_ℓ, $⟩ --ε-->_{P_G} ⟨q_a, ε⟩, and w ∈ L(P_G). We now complete this case by
proving the lemma.
Proof. (Of Lemma) By induction on the number of steps in the leftmost derivation S ⇒*_lm xα.
Base Case: Suppose the number of derivation steps is 0. Then x = ε and α = S. Clearly,
⟨q_s, ε⟩ --x = ε-->_{P_G} ⟨q_ℓ, S$⟩ = ⟨q_ℓ, α$⟩, and thus the lemma holds.
Induction Step: Suppose S ⇒*_lm x′α′ ⇒_lm xα, where S ⇒*_lm x′α′ takes n steps. Then,
writing α′ = Aα″, the induction hypothesis gives ⟨q_s, ε⟩ --x′-->_{P_G} ⟨q_ℓ, Aα″$⟩. Further, xα
is obtained by replacing A by the RHS of some rule A → β. Write β = x_1 x_2 · · · x_k γ, where
each x_i ∈ Σ and γ is either ε or begins with a variable. Then x = x′ x_1 x_2 · · · x_k and α = γα″.
Thus,
⟨q_s, ε⟩ --x′--> ⟨q_ℓ, Aα″$⟩ --ε--> ⟨q_ℓ, βα″$⟩ = ⟨q_ℓ, x_1 · · · x_k γα″$⟩
  --x_1--> ⟨q_ℓ, x_2 · · · x_k γα″$⟩ --x_2--> · · · --x_k--> ⟨q_ℓ, γα″$⟩ = ⟨q_ℓ, α$⟩
[L(P_G) ⊆ L(G)]: The proof of the converse will rely on the following lemma.
Lemma 10.6. For every A ∈ V, if ⟨q_ℓ, A⟩ --x-->_{P_G} ⟨q_ℓ, ε⟩ then A ⇒* x.
Once again, before proving the lemma, let us see how it allows us to conclude that
L(P_G) ⊆ L(G). Suppose w ∈ L(P_G). Then w has an accepting computation, and the transitions
ensure that this computation is of the form
⟨q_s, ε⟩ --ε--> ⟨q_ℓ, S$⟩ --w--> ⟨q_ℓ, $⟩ --ε--> ⟨q_a, ε⟩
Now, since $ is only popped by the transition from q_ℓ to q_a, it must be the case that
⟨q_ℓ, S⟩ --w--> ⟨q_ℓ, ε⟩. Thus, by the lemma, S ⇒* w, which means that w ∈ L(G).
Proof. (Of Lemma) By induction on the number of steps in the computation
⟨q_ℓ, A⟩ --x-->_{P_G} ⟨q_ℓ, ε⟩.
Base Case: Suppose ⟨q_ℓ, A⟩ --x-->_{P_G} ⟨q_ℓ, ε⟩ in one step. Then x = ε and A → ε ∈ R.
Thus, the lemma follows.
Induction Step: Consider a computation of n + 1 steps. It must be of the form
⟨q_ℓ, A⟩ --ε--> ⟨q_ℓ, A_1 A_2 · · · A_k⟩ --x--> ⟨q_ℓ, ε⟩
Now, A → A_1 A_2 · · · A_k must be a rule in R, where A_i ∈ (V ∪ Σ). Further, it must be the
case that there are x_1, x_2, . . . , x_k such that x = x_1 x_2 · · · x_k and ⟨q_ℓ, A_i⟩ --x_i--> ⟨q_ℓ, ε⟩.
Thus, the computation ⟨q_ℓ, A_1 A_2 · · · A_k⟩ --x--> ⟨q_ℓ, ε⟩ is of the form
⟨q_ℓ, A_1 A_2 · · · A_k⟩ --x_1--> ⟨q_ℓ, A_2 · · · A_k⟩ --x_2--> · · · --x_k--> ⟨q_ℓ, ε⟩
By the induction hypothesis, we have A_i ⇒* x_i if A_i ∈ V, and A_i = x_i otherwise. Hence,
A ⇒ A_1 · · · A_k ⇒* x_1 · · · x_k = x.
11 PDA to CFG
11.1 Normalized PDAs
From PDA to CFG
Proposition 11.1. For any PDA P, there is a CFG G such that L(P) = L(G).
Proof Outline
1. For every PDA P there is a normalized PDA P_N such that L(P) = L(P_N).
2. For every normalized PDA P_N there is a CFG G such that L(P_N) = L(G).
Normalized PDAs
Definition 11.2. A PDA P = (Q, Σ, Γ, δ, q_0, F) is normalized iff it satisfies the following
conditions:
- It has exactly one accept state, i.e., F = {q_a} for some q_a ∈ Q.
- It empties its stack before accepting, i.e., if ⟨q_0, ε⟩ --w-->_P ⟨q_a, γ⟩ then γ = ε.
- Each transition either pushes one symbol or pops one symbol. There are no transitions that
  both push and pop, nor transitions that leave the stack unaffected.
Normalizing a PDA
Proposition 11.3. For every PDA P, there is a normalized PDA P_N such that L(P) = L(P_N).
Proof Sketch
We will transform P in a series of steps, each time ensuring that the language does not change:
- First, we will ensure that there is only one accept state.
- Next, we will ensure that all symbols are popped before the accept state is reached.
- Finally, we will transform transitions to be either pushes or pops (not both or neither).
Normalizing a PDA
One accept state
To ensure one accept state, add ε-transitions (which do not change the stack) from the old accept
states to a new accept state.
Formally, given P = (Q, Σ, Γ, δ, q_0, F), let P′ = (Q′, Σ, Γ, δ′, q_0, F′) where
- Q′ = Q ∪ {q_a}, where q_a ∉ Q
- F′ = {q_a}
- δ′(q, x, a) = δ(q, x, a) if q ∈ Q \ F or x ≠ ε or a ≠ ε; δ′(q, ε, ε) = δ(q, ε, ε) ∪ {(q_a, ε)}
  for q ∈ F; and δ′(q_a, x, a) = ∅.
Normalizing a PDA
Emptying the stack before acceptance
First push a new symbol $ before starting the computation; from the sole accept state, pop all
symbols before popping $ and moving to a new accept state.
That is, given P = (Q, Σ, Γ, δ, q_0, {q_a}), let P′ = (Q′, Σ, Γ′, δ′, q′_0, F′) where
- Γ′ = Γ ∪ {$}, where $ ∉ Γ
- Q′ = Q ∪ {q_i, q_p, q′_a}, where q_i, q_p, q′_a ∉ Q
- q′_0 = q_i
- F′ = {q′_a}
- δ′(q, x, a) = δ(q, x, a) for q ∈ Q \ {q_a} or x ≠ ε or a ≠ ε. For q_a, we have
  δ′(q_a, ε, ε) = δ(q_a, ε, ε) ∪ {(q_p, ε)}. In addition, we have δ′(q_i, ε, ε) = {(q_0, $)},
  δ′(q_p, ε, a) = {(q_p, ε)} for a ∈ Γ, and δ′(q_p, ε, $) = {(q′_a, ε)}. In all other cases δ′ is ∅.
Normalizing a PDA
Only pushes or pops
There are two kinds of transitions that need fixing:
- Transitions of the form q --x, a → b--> q′, where a, b ∈ Γ, i.e., those that push and pop in
  one step.
  Replace this by two steps, where you first pop a and then push b:
  q --x, a → ε--> q″ --ε, ε → b--> q′. (q″ is a new state involved in only these transitions.)
- Transitions of the form q --x, ε → ε--> q′, i.e., those that neither push nor pop.
  Replace this by two steps, where first a dummy stack symbol # is pushed, and then in the
  second step the dummy symbol is popped: q --x, ε → #--> q″ --ε, # → ε--> q′. (q″ is a new
  state involved in only these transitions.)
(Formal definition skipped.)
11.2 CFGs for Normalized PDAs
CFGs for Normalized PDAs
Intuitions
Let P = (Q, Σ, Γ, δ, q_0, {q_a}) be a normalized PDA. Then w ∈ L(P) iff ⟨q_0, ε⟩ --w-->_P ⟨q_a, ε⟩.
If, for every p, q ∈ Q, we can describe L_{p,q} = { w | ⟨p, ε⟩ --w-->_P ⟨q, ε⟩ } using a CFG, then we
are done, because L(P) is nothing but L_{q_0, q_a}.
So the CFG will have variables A_{p,q} such that A_{p,q} ⇒* w iff w ∈ L_{p,q}.
What are the rules for A_{p,q}?
Rules for the grammar
Consider w ∈ L_{p,q}, and a computation corresponding to ⟨p, ε⟩ --w-->_P ⟨q, ε⟩. Since the
computation starts with an empty stack, the first step must be a push, and since we end with an
empty stack, the last step must be a pop.
- Case I: The first symbol pushed is popped only at the end. So we have
  ⟨p, ε⟩ --a--> ⟨r, A⟩ --u--> ⟨s, A⟩ --b--> ⟨q, ε⟩, with w = aub. And u ∈ L_{r,s}. This is
  captured by the rule A_{p,q} → a A_{r,s} b.
- Case II: The first symbol pushed is popped in the middle of the computation (and then the
  stack is empty). So we have ⟨p, ε⟩ --u_1--> ⟨r, ε⟩ --u_2--> ⟨q, ε⟩. This is captured by the rule
  A_{p,q} → A_{p,r} A_{r,q}.
Formal Construction
Let P = (Q, Σ, Γ, δ, q_0, {q_a}) be a normalized PDA. Define G_P = (V, Σ, R, S) where
V = { A_{p,q} | p, q ∈ Q },  S = A_{q_0, q_a}
And the rules in R are:
- For every p ∈ Q,  A_{p,p} → ε
- For every p, q, r ∈ Q,  A_{p,q} → A_{p,r} A_{r,q}
- For every p, q, r, s ∈ Q, γ ∈ Γ, and a, b ∈ Σ ∪ {ε}: if (r, γ) ∈ δ(p, a, ε) and
  (q, ε) ∈ δ(s, b, γ), then A_{p,q} → a A_{r,s} b
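The rule set can be generated mechanically from δ. The sketch below (my own, not from the notes) emits the three families of rules for a toy normalized PDA of my own devising that recognizes { 0^n 1^n | n ≥ 1 } by empty stack between states p and q: push X on 0, pop X on 1.

```python
def grammar_from_pda(states, delta):
    """Rules of G_P from a normalized PDA. delta maps (p, a, x) -> set of
    (q, y), where exactly one of x (popped) and y (pushed) is a stack
    symbol and the other is "" (push-only or pop-only transitions)."""
    rules = set()
    for p in states:                                   # A_pp -> eps
        rules.add((f"A_{p}{p}", ""))
    for p in states:                                   # A_pq -> A_pr A_rq
        for q in states:
            for r in states:
                rules.add((f"A_{p}{q}", f"A_{p}{r} A_{r}{q}"))
    pushes = [(p, a, r, g) for (p, a, x), moves in delta.items() if x == ""
              for (r, g) in moves if g != ""]
    pops = [(s, b, q, g) for (s, b, g), moves in delta.items() if g != ""
            for (q, y) in moves if y == ""]
    for (p, a, r, g1) in pushes:                       # A_pq -> a A_rs b
        for (s, b, q, g2) in pops:
            if g1 == g2:
                rules.add((f"A_{p}{q}", f"{a} A_{r}{s} {b}".strip()))
    return rules

# Toy normalized PDA for { 0^n 1^n : n >= 1 }: push X on 0, pop X on 1.
delta = {
    ("p", "0", ""): {("p", "X")},    # push X
    ("p", "1", "X"): {("q", "")},    # pop X, guessing the middle
    ("q", "1", "X"): {("q", "")},    # keep popping
}
rules = grammar_from_pda(["p", "q"], delta)
```

Among the generated rules are A_pq → 0 A_pp 1 and A_pq → 0 A_pq 1; together with A_pp → ε, they derive exactly the strings 0^n 1^n with n ≥ 1, as the construction predicts.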
11.3 Proof of Correctness
Correctness of Construction
Proposition 11.4. Let P be a normalized PDA and let G_P be the corresponding CFG. Then
A_{p,q} ⇒* w iff ⟨p, ε⟩ --w-->_P ⟨q, ε⟩.
Proof. The two directions are proved as follows:
(⇒) By induction on the number of steps in the derivation.
(⇐) By induction on the number of steps in the computation.
Proof details are in the textbook.
Tying all the Ends
Proposition 11.5. Let P be a PDA; then L(P) is context-free.
Proof. A normalized PDA P_N can be constructed such that L(P) = L(P_N).
A grammar G_P can be constructed such that L(G_P) = L(P_N) = L(P). This is because
S = A_{q_0, q_a} ⇒* w iff ⟨q_0, ε⟩ --w-->_{P_N} ⟨q_a, ε⟩ (by the previous proposition) iff
w ∈ L(P_N) = L(P).
Correctness of Construction
Proposition 11.6. Let G be a CFG and let P
G
be the PDA constructed. Then L(P
G
) = L(G)
Proof. [L(G) L(P
G
)]: We will prove the following lemma.
Lemma 11.7. Suppose S

lm
x, where x is the head and is the tail of x. Then q
s
, )
x

P
G
q

, $).
Before proving this lemma, let us see how it helps us show that L(G) L(P
G
). Consider
w L(G); then S

lm
w. By the lemma, this means that q
s
, )
w

P
G
q

, $). Thus, q
s
, )
w

P
G
q

, $)

q
a
, ), and w L(P
G
). We now complete this case by proving the lemma.
Proof. (Of Lemma) By induction on the number of steps in the leftmost derivation S

lm
x.
Base Case: Suppose the number of derivation steps is 0. Then x = , and = S. Clearly,
q
s
, )
=x

P
G
q

, S$) = q

, $) and thus, lemma holds.


Induction Step: Suppose S

lm
x


lm
x, where S

lm
x

takes n steps. Then


by induction hypothesis, q
s
, )
x

P
G
q

, A

$). Further, x is obtained by replacing A by


some rule A . Further

= x
1
x
2
x
k
, where x
i
. Observe that x = x

x
1
x
2
x
k
.
Thus,
q
s
, )
x

, A

$)

q

$) = q

, x
1
x
k
$)
x
1
q

, x
2
x
k
$)
x
2

x
k
q

, $)
[L(P
G
) L(G)]: The proof of the converse will rely on the following lemma.
Lemma 11.8. For every A V , if q

, A)
x

P
G
q

, ) then A

x.
Once again before proving the lemma, let us see how it allows us to conclude that L(P
G
) L(G).
Suppose w L(P
G
). Then w has an accepting computation, and the transitions ensure that this
computation is of the form
q
s
, )

q

, S$)
w
q

, $)

q
a
, )
Now since, $ is only popped by the transition fromq

to q
a
, it must be the case that q

, S)
w
q

, ).
Thus, by the lemma, S

w, which means that w L(G).
Proof. (Of Lemma) By induction on the number of steps in the computation q

, A)
x

P
G
q

, ).
Base Case: Suppose q

, A)
x

P
G
q

, ) in one step. Then, x = and A R. Thus,


lemma follows.
30
induction Step: Consider a computation of n + 1 steps. It must be of the form
q

, A)

q

, A
1
A
2
A
k
)
x
q

, )
Now, A A
1
A
2
A
k
must be a rule in R, where A
i
(V ). Further it must be the
case that there are x
1
, x
2
, . . . x
k
such that x = x
1
x
2
x
k
and q

, A
i
)
x
i
q

, ). Thus, the
computation q

, A
1
A
2
A
k
)
x
q

, ) is of the form
q

, A
1
A
2
A
k
)
x
1
q

, A
2
A
k
)
x
2

x
k
q

, )
By induction hypothesis, we have A
i

x
i
, if A
i
V , and A
i
= x
i
otherwise. Hence,
A

x
1
x
k
= x.
12 PDA to CFG
12.1 Normalized PDAs
From PDA to CFG
Proposition 12.1. For any PDA P, there is a CFG G such that L(P) = L(G).
Proof Outline
1. For every PDA P there is a normalized PDA P
N
such that L(P) = L(P
N
).
2. For every normalized PDA P
N
there is a CFG G such that L(P
N
) = L(G).
Normalized PDAs
Denition 12.2. A PDA P = (Q, , , , q
0
, F) is normalized i it satises the following conditions.
Has exactly one accept state, i.e., F = q
a
for some q
a
Q
Empties its stack before accepting, i.e., if q
0
, )
w

P
q
a
, ) then = .
Each transition either pushes one symbol, or it pops one symbol. There are no transitions
that both push and pop, nor transitions that leave the stack unaected.
Normalizing a PDA
Proposition 12.3. For every PDA P, there is a normalized PDA P
N
such that L(P) = L(P
N
)
31
Proof Sketch
We will transform P is a series of steps, each time ensuring that the language does not change.
We will ensure that there is only one accept state
Next, we will ensure that all symbols are popped before accept state is reached.
Finally, we will transform transitions to be either push or pop (not both or neither).
Normalizing a PDA
One accept state
To ensure one accept state, add -transitions (which do not change the stack) from old accept states
to a new accept state.
Formally, given P = (Q, , , , q
0
, F), let P

= (Q

, , ,

, q
0
, F

) where
Q

= Q q
a
, where q
a
, Q
F

= q
a

(q, x, a) = (q, x, a) if q Q F or x ,= or a ,= , and

(q, , ) = (q, , ) (q
a
, ) for
q F, and

(q
a
, x, a) = .
Normalizing a PDA
Emptying stack before acceptance
First push a new symbol $ before starting computation, and from sole accept state, pop all symbols
before popping $ and moving to a new accept state.
i.e., given P = (Q, , , , q
0
, q
a
), let P

= (Q

, ,

, q

0
, F

):

= $, where $ ,
Q

= Q q
i
, q
p
, q
a
where q
i
, q
p
, q
a
, Q
q

0
= q
i
F

= q
a

(q, x, a) = (q, x, a) for q Q q


a
or x ,= or a ,= . For q
a
, we have

(q
a
, , ) =
(q
a
, , ) (q
p
, ). In addition, we have

(q
i
, , ) = (q
0
, $), and

(q
p
, , a) = (q
p
, ) for
a , and

(q
p
, , $) = (q
a
, ). In all other cases

is .
Normalizing a PDA
Only pushes or pops
There are two kinds of transitions that need xing
Transition of the form q
x,ab
q

, where a, b , i.e., those that push and pop in one step


32
Replace this by two steps, where you rst pop a and then push b: q
x,a
q

,b
q

. (q

is a new state involved in only these transitions.)


Transition of the form q
x,
q

, i.e., those that neither push nor pop


Replace this by two steps, where rst a dummy symbol is pushed, and then in the second
step the dummy symbol is popped: q
x,
q

,
q

. (q

is a new state involved in


only these transitions.)
(Formal denition skipped.)
12.2 CFGs for Normalized PDAs
CFGs for Normalized PDAs
Intuitions
Let P = (Q, , , , q
0
, q
a
) be a normalized PDA. w L(P) i q
0
, )
w

P
q
a
, ).
If, for every p, q Q, we can describe L
p,q
= w[ p, )
w

P
q, ) using a CFG, then we are
done because L(P) is nothing but L
q
0
,qa
.
So CFG will have variables A
p,q
such that A
p,q

w i w L
p,q
.
What are the rules for A
p,q
?
Rules for the grammar
Consider w ∈ L_{p,q} and a computation corresponding to ⟨p, ε⟩ ⊢^w_P ⟨q, ε⟩. Since the computation starts with an empty stack, the first step must be a push, and since it ends with an empty stack, the last step must be a pop.

• Case I: The first symbol pushed is popped only at the end. So we have ⟨p, ε⟩ ⊢^a ⟨r, A⟩ ⊢^u ⟨s, A⟩ ⊢^b ⟨q, ε⟩, with w = aub and u ∈ L_{r,s}. This is captured by the rule A_{p,q} → a A_{r,s} b.
• Case II: The first symbol pushed is popped in the middle of the computation (at which point the stack is again empty). So we have ⟨p, ε⟩ ⊢^{u1} ⟨r, ε⟩ ⊢^{u2} ⟨q, ε⟩, with w = u1u2. This is captured by the rule A_{p,q} → A_{p,r} A_{r,q}.
Formal Construction
Let P = (Q, Σ, Γ, δ, q0, {qa}) be a normalized PDA. Define G_P = (V, Σ, R, S) where

• V = {A_{p,q} | p, q ∈ Q}
• S = A_{q0,qa}

And the rules in R are

• For every p ∈ Q, A_{p,p} → ε
• For every p, q, r ∈ Q, A_{p,q} → A_{p,r} A_{r,q}
• For every p, q, r, s ∈ Q, A ∈ Γ, and a, b ∈ Σ ∪ {ε}: if (r, A) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, A), then A_{p,q} → a A_{r,s} b
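The rule generation above is mechanical enough to sketch in code. The following is an illustrative sketch, not part of the notes: a normalized PDA is encoded as a dict `delta` mapping (state, input, popped) to a set of (state, pushed) pairs, with `''` standing for ε; the function name `pda_to_cfg` and the variable naming `A[p,q]` are my own.

```python
from itertools import product

def pda_to_cfg(states, delta, q0, qa):
    """Emit the three rule families A[p,q] -> ... for a normalized PDA.

    delta maps (state, input_symbol, popped) -> set of (state, pushed),
    where '' stands for epsilon. Rules are returned as (head, body)
    pairs, with body a tuple of symbols.
    """
    A = lambda p, q: f"A[{p},{q}]"
    rules = set()
    for p in states:                                # A[p,p] -> epsilon
        rules.add((A(p, p), ()))
    for p, q, r in product(states, repeat=3):       # A[p,q] -> A[p,r] A[r,q]
        rules.add((A(p, q), (A(p, r), A(r, q))))
    # A[p,q] -> a A[r,s] b when (p, a, eps) pushes X going to r
    # and (s, b, X) pops X going to q
    for (p, a, pop1), outs1 in delta.items():
        if pop1 != '':
            continue
        for (r, pushed) in outs1:
            if pushed == '':
                continue
            for (s, b, pop2), outs2 in delta.items():
                if pop2 != pushed:
                    continue
                for (q, push2) in outs2:
                    if push2 == '':
                        rules.add((A(p, q), (a, A(r, s), b)))
    return A(q0, qa), rules
```

For instance, for a normalized PDA for {0ⁿ1ⁿ | n ≥ 1} (push X on each 0 in state p, pop X on each 1 moving to and staying in state q), the output contains, among others, the rule A[p,q] → 0 A[p,q] 1.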
12.3 Proof of Correctness
Correctness of Construction
Proposition 12.4. Let P be a normalized PDA and let G_P be the corresponding CFG. Then A_{p,q} ⇒* w iff ⟨p, ε⟩ ⊢^w_P ⟨q, ε⟩.

Proof. The two directions are proved as follows:

• (⇒) By induction on the number of steps in the derivation.
• (⇐) By induction on the number of steps in the computation.

Proof details in textbook.

Tying Up Loose Ends

Proposition 12.5. Let P be a PDA. Then L(P) is context-free.

Proof. A normalized PDA P_N can be constructed such that L(P) = L(P_N). A grammar G_P can be constructed such that L(G_P) = L(P_N) = L(P). This is because S = A_{q0,qa} ⇒* w iff ⟨q0, ε⟩ ⊢^w_{P_N} ⟨qa, ε⟩ (by the previous proposition) iff w ∈ L(P_N) = L(P).
Part IV
Lectures 15 and 16
Chomsky Normal Form
13 Normal Forms for CFG
Normal Forms for Grammars
It is typically easier to work with a context free language if given a CFG in a normal form.
Normal Forms
A grammar is in a normal form if its production rules have a special structure:
• Chomsky Normal Form: Productions are of the form A → BC or A → a
• Greibach Normal Form: Productions are of the form A → aα, where a ∈ Σ and α ∈ V*

If ε is in the language, we allow the rule S → ε. We will require that S does not appear on the right-hand side of any rule.
Today: How to convert any context-free grammar to an equivalent grammar in the Chomsky
Normal Form
We will start with a series of simplifications...

14 Three Simplifications

14.1 Eliminating ε-productions

Eliminating ε-productions

• Often we would like to ensure that the intermediate strings in a derivation are no longer than the final string derived
• But a long intermediate string can lead to a short final string if there are ε-productions (rules of the form A → ε)
• Can we rewrite the grammar to have no ε-productions?

Eliminating ε-productions

Given a grammar G, produce an equivalent grammar G′ (i.e., L(G) = L(G′)) such that G′ has no rules of the form A → ε, except possibly S → ε, and S does not appear on the right-hand side of any rule.

Note: If S could appear on the RHS of a rule, say S → SS, then together with the rule S → ε we could again have long intermediate strings yielding short final strings.
Eliminating ε-productions

Definition: Nullable Variables
A variable A (of grammar G) is nullable if A ⇒* ε.

How do you determine whether a variable is nullable?

• If A → ε is a production in G then A is nullable
• If A → B1B2⋯Bk is a production and each Bi is nullable, then A is nullable

Fixed-point algorithm: Propagate the nullable label until there is no change.
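The fixed-point computation can be sketched directly. This is an illustration (not the notes' code): rules are (head, body) pairs with body a tuple of symbols, and a symbol counts as a variable iff it occurs as some head.

```python
def nullable_variables(rules):
    """Fixed-point computation of the nullable variables of a grammar.

    rules: list of (head, body) pairs, body a tuple of symbols.
    Terminals never enter the nullable set, so a body containing a
    terminal never makes its head nullable.
    """
    nullable = {head for head, body in rules if body == ()}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable
```

On Example 14.1 below (S → AB; A → AaA | ε; B → BbB | ε) this returns {S, A, B}: A and B are nullable via their ε-rules, and then S is nullable via S → AB.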
Eliminating ε-productions given nullables

Intuition: For every variable A in G, have a variable A in G′ such that A ⇒*_{G′} w iff A ⇒*_G w and w ≠ ε. For every rule B → CAD in G, where A is nullable, add two rules to G′: B → CD and B → CAD.

Algorithm

• G′ has the same variables, except for a new start symbol S′
• For each rule A → X1X2⋯Xk in G, create all rules A → α1α2⋯αk where
  – αi = Xi if Xi is a non-nullable variable or a terminal in G
  – αi = Xi or αi = ε if Xi is nullable in G
  and not all αi are ε
• Add the rule S′ → S. If S is nullable in G, add S′ → ε also.
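The expansion step can be sketched as follows (an illustrative sketch, not the notes' code; `eliminate_epsilon` and the `start + "'"` naming are my own, and the nullable set is assumed to be precomputed as above):

```python
from itertools import product

def eliminate_epsilon(rules, start, nullable):
    """Remove epsilon-productions, given the set of nullable symbols.

    For each rule, every nullable symbol in the body may either stay
    or vanish; the all-epsilon expansion is dropped. A fresh start
    symbol start+"'" is added, with an epsilon rule iff the old start
    was nullable.
    """
    new_rules = set()
    for head, body in rules:
        # each position contributes either the symbol or (if nullable) nothing
        choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for pick in product(*choices):
            expanded = tuple(sym for part in pick for sym in part)
            if expanded:                      # skip A -> epsilon
                new_rules.add((head, expanded))
    new_start = start + "'"
    new_rules.add((new_start, (start,)))
    if start in nullable:
        new_rules.add((new_start, ()))
    return new_rules, new_start
```

Running this on Example 14.1 reproduces the rules listed there, e.g. A → AaA | aA | Aa | a.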
Eliminating ε-productions

Correctness of the Algorithm

• By construction, there are no rules of the form A → ε in G′ (except possibly S′ → ε), and S′ does not appear in the RHS of any rule.
• L(G) = L(G′):
  – L(G′) ⊆ L(G): For every rule A → w in G′, we have A ⇒*_G w (by expanding zero or more nullable variables in w to ε)
  – L(G) ⊆ L(G′): If ε ∈ L(G), then ε ∈ L(G′). If A ⇒*_G w with w ∈ Σ⁺, then, by induction on the number of steps in the derivation, A ⇒*_{G′} w. Base case: if A → w with w ∈ Σ⁺, then A → w is in G′.

(Proof details skipped.)
Eliminating ε-productions

An Example

Example 14.1. Let the rules of grammar G be S → AB; A → AaA | ε; and B → BbB | ε.

• The nullables in G are A, B and S
• Rules for grammar G′:
  S → AB | A | B
  A → AaA | aA | Aa | a
  B → BbB | bB | Bb | b
  S′ → S | ε
14.2 Eliminating Unit Productions

Eliminating Unit Productions

• Often we would like to ensure that the number of steps in a derivation is not much more than the length of the string derived
• But there can be a long chain of derivation steps that makes little or no progress, if the grammar has unit productions (rules of the form A → B, where B is a nonterminal)

Note: A → a is not a unit production.

• Can we rewrite the grammar to have no unit productions?

Eliminating unit productions

Given a grammar G, produce an equivalent grammar G′ (i.e., L(G) = L(G′)) such that G′ has no rules of the form A → B where B ∈ V.
Eliminating Unit Productions

Unit Productions

Unit productions can play an important role in designing grammars:

• While eliminating ε-productions we added a rule S′ → S. This is a unit production.
• We have used unit productions in building an unambiguous grammar:
  I → a | b | Ia | Ib
  N → 0 | 1 | N0 | N1 | −N | +N
  F → I | N | (E)
  T → F | T ∗ F
  E → T | E + T

But, as we shall see now, they can be (safely) eliminated.
Eliminating Unit Productions

Basic Idea

Introduce new "look-ahead" productions to replace unit productions: look ahead to see where the unit production (or a chain of unit productions) leads, and add a rule to go there directly.

Example 14.2. E → T → F → I → a | b | Ia | Ib. So introduce the new rules E → a | b | Ia | Ib.

But what if the grammar has cycles of unit productions? For example, A → B | a, B → C | b and C → A | c. You cannot use the look-ahead approach naively, because you will get into an infinite loop.
Eliminating Unit Productions

Basic Idea: Fixed

Algorithm

1. Determine the pairs ⟨A, B⟩ such that A ⇒* B using only unit rules. Such pairs are called unit pairs.
   • It is easy to determine the unit pairs: build a directed graph with vertices = V and edges = unit productions. ⟨A, B⟩ is a unit pair iff there is a directed path from A to B in this graph.
2. If ⟨A, B⟩ is a unit pair, then add the production rules A → α1 | α2 | ⋯ | αk, where B → α1 | α2 | ⋯ | αk are all the non-unit production rules of B.
3. Remove all unit production rules.

Let G′ be the grammar obtained from G using this algorithm. Then L(G′) = L(G).
Eliminating Unit Productions

L(G) = L(G′): Proof

• L(G′) ⊆ L(G): For every rule A → w in G′, we have A ⇒*_G w (by a sequence of zero or more unit productions followed by a non-unit production of G)
• L(G) ⊆ L(G′): For w ∈ L(G), consider a leftmost derivation S ⇒*_{lm} w in G.
  – All these derivation steps are possible in G′ also, except the ones using the unit productions of G.
  – Suppose S ⇒* xAα ⇒ xBα ⇒ ⋯, where the step xAα ⇒ xBα uses a unit rule. Then (in a leftmost derivation) the next step must use a rule for B.
  – So a leftmost derivation of w in G can be broken up into big steps, each consisting of zero or more unit productions on the leftmost variable followed by a non-unit production. For each such big step there is a single production rule in G′ that yields the same result.
14.3 Eliminating Useless Symbols

Eliminating Useless Symbols

• Ideally one would like to use a compact grammar, with the fewest possible variables
• But a grammar may have useless variables which do not appear in any valid derivation
• Can we identify all the useless variables and remove them from the grammar? (Note: there may still be other redundancies in the grammar.)

Definition
A symbol X ∈ V ∪ Σ is useless in a grammar G = (V, Σ, S, P) if there is no derivation of the form S ⇒* αXβ ⇒* w where w ∈ Σ* and α, β ∈ (V ∪ Σ)*.

Removing useless symbols (and the rules involving them) from a grammar does not change the language of the grammar.

Eliminating Useless Symbols

By the definition above, X is useless iff either

• Type 1: X is not reachable from S (i.e., there are no α, β such that S ⇒* αXβ), or
• Type 2: for all α, β such that S ⇒* αXβ, the string αXβ cannot yield a string in Σ*, i.e., either
  – Type 2a: X is not generating (i.e., there is no w ∈ Σ* such that X ⇒* w), or
  – Type 2b: α or β contains a non-generating symbol.
Eliminating Useless Symbols

Algorithm

So, in order to remove useless symbols,

1. First remove all symbols that are not generating (Type 2a)
   • If X was useless but reachable and generating (i.e., Type 2b), then X becomes unreachable after this step.
     Type 2b means: for all α, β such that S ⇒* αXβ, either α or β contains a non-generating symbol. In the new grammar all such derivations disappear (because some variable in α or β is removed).
2. Next remove all unreachable symbols in the new grammar.
   • This removes the Type 1 (originally unreachable) and, now, the Type 2b useless symbols.

• Neither step removes any useful symbol (Why?)
• It only remains to show how to carry out the two steps of this algorithm.

Eliminating Useless Symbols

Generating and Reachable Symbols

The set of generating symbols:
• If A → x, where x ∈ Σ*, is a production, then A is generating
• If A → α is a production and all variables in α are generating, then A is generating

The set of reachable symbols:
• S is reachable
• If A is reachable and A → αBβ is a production, then B is reachable

Fixed-point algorithm: Propagate the label (generating or reachable) until there is no change.
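Both fixed points, and the two-step removal order, can be sketched together (illustrative code under my own encoding, not the notes': rules are (head, body) pairs, and terminals are passed in explicitly as the seed of the generating set):

```python
def remove_useless(rules, start, terminals):
    """Drop non-generating symbols first, then unreachable ones."""
    def closure(seed, grow):
        out = set(seed)
        while True:
            new = grow(out)
            if new <= out:
                return out
            out |= new

    # Step 1: keep only rules over generating symbols.
    generating = closure(
        terminals,
        lambda gen: {h for h, body in rules if all(s in gen for s in body)})
    rules = {(h, b) for h, b in rules
             if h in generating and all(s in generating for s in b)}

    # Step 2: keep only rules whose head is reachable from the start.
    reachable = closure(
        {start},
        lambda rch: {s for h, body in rules if h in rch for s in body})
    return {(h, b) for h, b in rules if h in reachable}
```

In the test below, B is non-generating (its only rule is B → B), so S → AB disappears in step 1; A then becomes unreachable and its rule disappears in step 2, matching the order argued above.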
14.4 Putting Together the Three Simplifications

The Three Simplifications, Together

Given a grammar G such that L(G) ≠ ∅, we can find a grammar G′ such that L(G′) = L(G) and G′ has no ε-productions (except possibly S → ε), no unit productions, and no useless symbols, and S does not appear in the RHS of any rule.

Proof. Apply the following 3 steps in order:

1. Eliminate ε-productions
2. Eliminate unit productions
3. Eliminate useless symbols

Note: Applying the steps in a different order may result in a grammar not having all the desired properties.
15 Chomsky Normal Form

Chomsky Normal Form

Proposition 15.1. For any non-empty context-free language L, there is a grammar G such that L(G) = L and each rule in G is of the form

1. A → a where a ∈ Σ, or
2. A → BC where neither B nor C is the start symbol, or
3. S → ε where S is the start symbol (iff ε ∈ L).

Furthermore, G has no useless symbols.

Chomsky Normal Form

Outline of Normalization

Given G = (V, Σ, S, P), convert to CNF:

• Let G′ = (V′, Σ, S, P′) be the grammar obtained after eliminating ε-productions, unit productions, and useless symbols from G.
• If A → x is a rule of G′ with |x| = 0, then A must be S (because G′ has no other ε-productions). If A → x is a rule of G′ with |x| = 1, then x ∈ Σ (because G′ has no unit productions). In either case A → x is in a valid form.
• All remaining productions are of the form A → X1X2⋯Xn where Xi ∈ V ∪ Σ and n ≥ 2 (and S does not occur in the RHS). We will put these rules in the right form by applying the following two transformations:
1. Make the RHS consist only of variables
2. Make the RHS be of length 2
Chomsky Normal Form
Make the RHS consist only of variables
Let A → X1X2⋯Xn, with each Xi being either a variable or a terminal. We want rules in which all the Xi are variables.

Example 15.2. Consider A → BbCdefG. How do you remove the terminals?

For each terminal a, b, c, … add variables Xa, Xb, Xc, … with productions Xa → a, Xb → b, …. Then replace the production A → BbCdefG by A → B Xb C Xd Xe Xf G.

For every a ∈ Σ:
1. Add a new variable Xa
2. In every (long) rule, if a occurs in the RHS, replace it by Xa
3. Add a new rule Xa → a
Chomsky Normal Form
Make the RHS be of length 2
Now all productions are of the form A → a or A → B1B2⋯Bn, where n ≥ 2 and each Bi is a variable.

How do you eliminate rules of the form A → B1B2⋯Bn where n > 2?

Replace the rule by the following set of rules:

A → B1 B(2,n)
B(2,n) → B2 B(3,n)
B(3,n) → B3 B(4,n)
⋮
B(n−1,n) → Bn−1 Bn

where the B(i,n) are new variables.
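The two transformations — lifting terminals out of long rules and then binarizing — can be sketched together (my own illustrative code, not the notes'; it assumes terminals are lowercase strings, and invents fresh variable names of the form `X_a` and `<...>`):

```python
def to_cnf_rules(rules):
    """Apply the two CNF transformations to rules whose RHS length >= 2.

    Assumes epsilon-rules, unit rules and useless symbols are already
    gone, and that a symbol is a terminal iff it is all-lowercase.
    rules: iterable of (head, body) pairs, body a tuple of symbols.
    """
    is_term = lambda s: s.islower()
    out = set()
    lifted = {}                                  # terminal -> its X_a variable
    for head, body in rules:
        if len(body) >= 2:                       # 1) lift terminals out
            body = tuple(lifted.setdefault(s, f"X_{s}") if is_term(s) else s
                         for s in body)
        while len(body) > 2:                     # 2) binarize long rules
            rest = f"<{','.join(body[1:])}>"     # fresh variable for the tail
            out.add((head, (body[0], rest)))
            head, body = rest, body[1:]
        out.add((head, body))
    for a, var in lifted.items():
        out.add((var, (a,)))
    return out
```

For example, B → bAAb becomes B → Xb⟨A,A,Xb⟩, ⟨A,A,Xb⟩ → A⟨A,Xb⟩, ⟨A,Xb⟩ → AXb, plus Xb → b, mirroring step 3 of Example 15.3 below.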
Chomsky Normal Form
An Example
Example 15.3. Convert S → aA | bB | b, A → Baa | ba, B → bAAb | ab into Chomsky Normal Form.

1. Eliminate ε-productions, unit productions, and useless symbols. This grammar is already in the right form.
2. Remove terminals from the RHS of long rules. The new grammar is: Xa → a, Xb → b, S → XaA | XbB | b, A → BXaXa | XbXa, and B → XbAAXb | XaXb.
3. Reduce the RHS of each rule to length at most two. The new grammar replaces A → BXaXa by the rules A → BXaa and Xaa → XaXa, and replaces B → XbAAXb by the rules B → XbXAAb, XAAb → AXAb, and XAb → AXb.
Part V
Lecture 17
Pumping Lemma for CFLs
16 Introduction
16.1 Non-context-free languages
Non-Context-Free Languages

Question
Are there languages that are not context-free? What about L = {aⁿbⁿcⁿ | n ≥ 0}?

Answer
L is not context-free, because

• recognizing whether w ∈ L requires remembering the number of a's seen, b's seen, and c's seen;
• we can remember one of these counts on the stack (say the a's), and compare it to another (say the b's) by popping, but not to both the b's and the c's.

The precise way to capture this intuition is the pumping lemma.
16.2 Pumping Lemma
Pumping Lemma for CFLs
Informal Statement
For all sufficiently long strings z in a context-free language L, it is possible to find two substrings, not too far apart, that can be simultaneously pumped to obtain more words in L.

Pumping Lemma for CFLs

Formal Statement
Lemma 16.1. If L is a CFL, then there exists p (the pumping length) such that for all z ∈ L with |z| ≥ p, there exist u, v, w, x, y with z = uvwxy such that
1. |vwx| ≤ p
2. |vx| > 0
3. for all i ≥ 0, u vⁱ w xⁱ y ∈ L

Two Pumping Lemmas Side by Side

Context-Free Languages
If L is a CFL, then there exists p (pumping length) such that for all z ∈ L with |z| ≥ p, there exist u, v, w, x, y with z = uvwxy such that
1. |vwx| ≤ p
2. |vx| > 0
3. for all i ≥ 0, u vⁱ w xⁱ y ∈ L

Regular Languages
If L is a regular language, then there exists p (pumping length) such that for all z ∈ L with |z| ≥ p, there exist u, v, w with z = uvw such that
1. |uv| ≤ p
2. |v| > 0
3. for all i ≥ 0, u vⁱ w ∈ L
Pumping Lemma for CFLs
Game View
A game between the Defender, who claims L satisfies the pumping condition, and the Challenger, who claims it does not.

1. Defender: picks the pumping length p
2. Challenger: picks z ∈ L with |z| ≥ p
3. Defender: divides z into u, v, w, x, y with |vwx| ≤ p and |vx| > 0
4. Challenger: picks i such that u vⁱ w xⁱ y ∉ L

• Pumping Lemma: If L is a CFL, then there is always a winning strategy for the defender (i.e., the challenger will get stuck).
• Pumping Lemma (in contrapositive): If there is a winning strategy for the challenger, then L is not a CFL.
17 Applying the Pumping Lemma
Consequences of the Pumping Lemma

• If L is context-free then L satisfies the pumping lemma.
• If L satisfies the pumping lemma, that does not mean L is context-free.
• If L does not satisfy the pumping lemma (i.e., the challenger can win the game no matter what the defender does), then L is not context-free.
17.1 Examples
Example I
Proposition 17.1. L = {aⁿbⁿcⁿ | n ≥ 0} is not a CFL.

Proof. Suppose L is context-free. Let p be the pumping length. Consider z = aᵖbᵖcᵖ ∈ L. Since |z| ≥ p, there are u, v, w, x, y such that z = uvwxy, |vwx| ≤ p, |vx| > 0, and u vⁱ w xⁱ y ∈ L for all i ≥ 0.

Since |vwx| ≤ p, vwx cannot contain all three of the symbols a, b, c (there are p b's between the a's and the c's). So vwx either has no a's, or no b's, or no c's. Suppose (wlog) vwx has no a's. Then u v⁰ w x⁰ y = uwy contains more a's than b's or more a's than c's (since |vx| > 0). Hence uwy ∉ L.
Example II
Proposition 17.2. L = {aⁱbʲcⁱdʲ | i, j ≥ 0} is not a CFL.

Proof. Suppose L is context-free. Let p be the pumping length. Consider z = aᵖbᵖcᵖdᵖ ∈ L. Since |z| ≥ p, there are u, v, w, x, y such that z = uvwxy, |vwx| ≤ p, |vx| > 0, and u vⁱ w xⁱ y ∈ L for all i ≥ 0.

Since |vwx| ≤ p, the string vx cannot contain both a's and c's, nor can it contain both b's and d's. Further, |vx| > 0. Now u v⁰ w x⁰ y = uwy ∉ L, because it either contains fewer a's than c's, or fewer c's than a's, or fewer b's than d's, or fewer d's than b's.
Example III
A Wrong Proof

Proposition 17.3. E = {ww | w ∈ {0,1}*} is not a CFL.

"Proof". Suppose E is context-free. Let p be the pumping length. Consider z = 0ᵖ10ᵖ1 ∈ E.

But z can be pumped if we make the following division:

u = 0ᵖ⁻¹, v = 0, w = 1, x = 0, y = 0ᵖ⁻¹1

Then u vⁱ w xⁱ y = 0ᵖ⁻¹⁺ⁱ 1 0ᵖ⁻¹⁺ⁱ 1 ∈ E for every i ≥ 0.

So is E a CFL? No! Does E satisfy the pumping lemma? No! But the string z = 0ᵖ10ᵖ1 does not witness this, since it has a division that pumps.
Example III
Corrected Proof
Proposition 17.4. E = {ww | w ∈ {0,1}*} is not a CFL.

Proof. Suppose E is context-free. Let p be the pumping length. Consider z = 0ᵖ1ᵖ0ᵖ1ᵖ ∈ E. Since |z| ≥ p, there are u, v, w, x, y such that z = uvwxy, |vwx| ≤ p, |vx| > 0, and u vⁱ w xⁱ y ∈ E for all i ≥ 0.

We consider where vwx can lie relative to the midpoint of z.

• Suppose vwx lies entirely in the first half. Then in u v² w x² y the second half starts with a 1. Thus it is not of the form ww.
• Suppose vwx lies entirely in the second half. Then in u v² w x² y the first half ends in a 0. Thus it is not of the form ww.
• Suppose vwx straddles the middle. Then u v⁰ w x⁰ y must be of the form 0ᵖ 1ⁱ 0ʲ 1ᵖ, where either i or j is not p. Thus u v⁰ w x⁰ y ∉ E.
18 Proof of the Pumping Lemma
18.1 Informal Idea
Proof of Pumping Lemma

Recall …
Lemma 18.1. If L is a CFL, then there exists p (the pumping length) such that for all z ∈ L with |z| ≥ p, there exist u, v, w, x, y with z = uvwxy such that
1. |vwx| ≤ p
2. |vx| > 0
3. for all i ≥ 0, u vⁱ w xⁱ y ∈ L

Proof Idea
Let G be a CFG in Chomsky Normal Form such that L(G) = L. Let z be a very long string in L ("very long" is made precise later).

[Figure 10: A parse tree for z with root S, in which a variable A repeats along a path; the two occurrences of A split z into u, v, w, x, y.]

• Since z ∈ L, there is a parse tree for z
• Since z is very long, the parse tree (which is a binary tree) must be very tall
• By the pigeonhole principle, the longest path in the tree must have some variable, say A, repeat. Let u, v, w, x, y be as shown.

Pumping down and up

[Figure 11: Pumping zero times — replacing the subtree under the upper A by the subtree under the lower A yields a parse tree for uwy.]

[Figure 12: Pumping two times — replacing the subtree under the lower A by a copy of the subtree under the upper A yields a parse tree for u v² w x² y.]

Thus u vⁱ w xⁱ y has a parse tree, for any i.
18.2 Formal Proof
Proof of Pumping Lemma

Existence of tall parse trees

Proof. Let G be a grammar in Chomsky Normal Form with k variables such that L(G) = L. Take p = 2ᵏ. Consider z ∈ L such that |z| ≥ p = 2ᵏ.

• Consider a parse tree for z, of height h. Then h ≥ k + 1:
  – Parse trees of G are binary trees
  – Fact: a binary tree of height h has at most 2^(h−1) leaves
  – |z| = the number of leaves in the parse tree of z, so 2ᵏ ≤ |z| ≤ 2^(h−1). Thus h ≥ k + 1.

Repeated Variables

Proof (contd).
• A parse tree for z has a path of length at least k + 1.
• A path of length k + 1 has k + 2 vertices, of which the last is a leaf labelled by a terminal; thus there are at least k + 1 internal vertices on the path.
• Since G has only k variables, by the pigeonhole principle there must be two vertices n1 and n2 on this path such that n1 and n2 have the same label (say A) and n1 is an ancestor of n2.
• Let the yield of the subtree rooted at n2 be w, and let the yield of the subtree rooted at n1 be vwx. The yield of the root, i.e. z, is then uvwxy.

Properties of u, v, w, x, y

Proof (contd).
• The height of n1 can be assumed to be at most k + 1 (choose the repetition among the bottommost k + 1 internal vertices of the path); thus the yield of n1, namely vwx, has length at most 2ᵏ = p.
• n1 ≠ n2. Since the grammar has no ε-productions and no unit productions, vwx ≠ w, i.e., |vx| > 0.

Pumping the strings

Proof (contd). Based on the parse tree for z and the definitions of u, v, w, x, y, we have:

• There is a parse tree with yield uAy and root S, obtained by not expanding n1. Thus S ⇒* uAy.
• There is a parse tree with yield vAx and root A, obtained from n1 by not expanding n2. Thus A ⇒* vAx.
• There is a parse tree with yield w and root A; this is the tree rooted at n2. Thus A ⇒* w.

Putting it together, we have

S ⇒* uAy ⇒* uvAxy ⇒* uvvAxxy ⇒* ⋯ ⇒* u vⁱ A xⁱ y ⇒* u vⁱ w xⁱ y
Part VI
Lecture 18
19 Closure of CFLs under Regular operations
Union of CFLs

Let L1 be the language generated by G1 = (V1, Σ1, R1, S1) and L2 the language generated by G2 = (V2, Σ2, R2, S2).

• Is L1 ∪ L2 a context-free language? Yes.
• Just add the rule S → S1 | S2
• But make sure that V1 ∩ V2 = ∅ (by renaming some variables)

Closure of CFLs under Union

G = (V, Σ, R, S) such that L(G) = L(G1) ∪ L(G2):

• V = V1 ∪ V2 ∪ {S} (the three sets being disjoint)
• Σ = Σ1 ∪ Σ2
• R = R1 ∪ R2 ∪ {S → S1 | S2}
Concatenation and Kleene Closure

Proposition 19.1. CFLs are closed under concatenation and Kleene closure.

Proof. Let L1 be the language generated by G1 = (V1, Σ1, R1, S1) and L2 the language generated by G2 = (V2, Σ2, R2, S2).

• Concatenation: L1L2 is generated by a grammar with the additional rule S → S1S2
• Kleene Closure: L1* is generated by a grammar with the additional rule S → S1S | ε

As before, ensure that V1 ∩ V2 = ∅. Here S is a new start symbol.

(Exercise: Complete the proof!)
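The three constructions are small enough to sketch directly (illustrative code, not the notes'; a grammar is encoded as a (rules, start) pair, and the tag-prefixing in `rename` stands in for "renaming some variables" to keep variable sets disjoint):

```python
def rename(rules, start, tag):
    """Prefix every variable with tag, so two grammars become disjoint."""
    variables = {h for h, _ in rules}
    fix = lambda s: tag + s if s in variables else s
    return ({(fix(h), tuple(fix(s) for s in b)) for h, b in rules},
            tag + start)

def union(g1, g2):
    (r1, s1), (r2, s2) = rename(*g1, '1.'), rename(*g2, '2.')
    return r1 | r2 | {('S', (s1,)), ('S', (s2,))}, 'S'    # S -> S1 | S2

def concat(g1, g2):
    (r1, s1), (r2, s2) = rename(*g1, '1.'), rename(*g2, '2.')
    return r1 | r2 | {('S', (s1, s2))}, 'S'               # S -> S1 S2

def star(g):
    r1, s1 = rename(*g, '1.')
    return r1 | {('S', (s1, 'S')), ('S', ())}, 'S'        # S -> S1 S | eps
```

For example, applying `union` to two copies of the grammar S → aS | ε produces a fresh start symbol with rules S → 1.S and S → 2.S over the two renamed copies.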
Intersection
Let L1 and L2 be context-free languages. L1 ∩ L2 is not necessarily context-free!

Proposition 19.2. CFLs are not closed under intersection.

Proof. L1 = {aⁱbⁱcʲ | i, j ≥ 0} is a CFL:
• generated by a grammar with rules S → XY; X → aXb | ε; Y → cY | ε.

L2 = {aⁱbʲcʲ | i, j ≥ 0} is a CFL:
• generated by a grammar with rules S → XY; X → aX | ε; Y → bYc | ε.

But L1 ∩ L2 = {aⁿbⁿcⁿ | n ≥ 0} is not a CFL.
Intersection with Regular Languages
Proposition 19.3. If L is a CFL and R is a regular language then L ∩ R is a CFL.

Proof. Let P be a PDA that accepts L, and let M be a DFA that accepts R. A new PDA P′ will simulate P and M simultaneously on the same input and accept if both accept. Then P′ accepts L ∩ R.

• The stack of P′ is the stack of P
• The state of P′ at any time is the pair (state of P, state of M)
• These determine the transition function of P′
• The final states of P′ are those in which both the state of P and the state of M are accepting

More formally, let M = (Q1, Σ, δ1, q1, F1) be a DFA such that L(M) = R, and let P = (Q2, Σ, Γ, δ2, q2, F2) be a PDA such that L(P) = L. Then consider P′ = (Q, Σ, Γ, δ, q0, F) such that

• Q = Q1 × Q2
• q0 = (q1, q2)
• F = F1 × F2
• δ((p, q), x, a) = {((p′, q′), b) | p′ = δ1(p, x) and (q′, b) ∈ δ2(q, x, a)} (with p′ = p when x = ε, since an ε-move of P does not advance M).

One can show, by induction on the number of computation steps, that for any w ∈ Σ* and γ ∈ Γ*,

⟨q0, ε⟩ ⊢^w_{P′} ⟨(p, q), γ⟩ iff q1 →^w_M p and ⟨q2, ε⟩ ⊢^w_P ⟨q, γ⟩

The proof of this statement is left as an exercise. As a consequence, w ∈ L(P′) iff ⟨q0, ε⟩ ⊢^w_{P′} ⟨(p, q), γ⟩ for some (p, q) ∈ F and some γ (by definition of PDA acceptance) iff ⟨q0, ε⟩ ⊢^w_{P′} ⟨(p, q), γ⟩ with p ∈ F1 and q ∈ F2 (by definition of F) iff q1 →^w_M p with p ∈ F1, and ⟨q2, ε⟩ ⊢^w_P ⟨q, γ⟩ with q ∈ F2 (by the statement above) iff w ∈ L(M) and w ∈ L(P) (by definition of DFA acceptance and PDA acceptance).

Why does this construction not work for the intersection of two CFLs?
Complementation
Let L be a context-free language. Is the complement L̄ context-free? Not necessarily!

Proof 1. Suppose CFLs were closed under complementation. Then for any two CFLs L1, L2, we have:

• L̄1 and L̄2 are CFLs. Then, since CFLs are closed under union, L̄1 ∪ L̄2 is a CFL. Then, again by hypothesis, the complement of L̄1 ∪ L̄2 is a CFL.
• That complement is L1 ∩ L2, so L1 ∩ L2 is a CFL,
• i.e., CFLs are closed under intersection. Contradiction!

Proof 2. L = {x ∈ {a,b}* | x is not of the form ww} is a CFL:

• L is generated by a grammar with the rules X → a | b, A → a | XAX, B → b | XBX, S → A | B | AB | BA.

But L̄ = {ww | w ∈ {a,b}*} is not a CFL! (Why?)
Set Difference

Proposition 19.4. If L1 is a CFL and L2 is a CFL then L1 ∖ L2 is not necessarily a CFL.

Proof. Because CFLs are not closed under complementation, and complementation is a special case of set difference. (How?)

Proposition 19.5. If L is a CFL and R is a regular language then L ∖ R is a CFL.

Proof. L ∖ R = L ∩ R̄.
20 Closure of CFLs under Homomorphism and Inverse Homomorphism

Homomorphism

Proposition 20.1. Context-free languages are closed under homomorphisms.

Proof. Let G = (V, Σ, R, S) be the grammar generating L, and let h : Σ → Δ* be a homomorphism. A grammar G′ = (V′, Δ, R′, S′) generating h(L):

• Include all variables from G (i.e., V ⊆ V′), and let S′ = S
• Treat the terminals of G as variables, i.e., for every a ∈ Σ:
  – Add a new variable Xa to V′
  – In each rule of G, if a appears in the RHS, replace it by Xa
  – For each Xa, add the rule Xa → h(a)

G′ generates h(L). (Exercise!)

Homomorphism

Example 20.2. Let G have the rules S → 0S0 | 1S1 | ε. Consider the homomorphism h : {0,1}* → {a,b}* given by h(0) = aba and h(1) = bb. The rules of G′ such that L(G′) = h(L(G)):

S → X0SX0 | X1SX1 | ε
X0 → aba
X1 → bb
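The transformation in the proof can be sketched directly (my own encoding, not the notes': rules are (head, body) pairs, h is a dict from terminals to strings, and any body symbol not listed in `variables` is treated as a terminal):

```python
def apply_homomorphism(rules, h, variables):
    """Replace each terminal a by a fresh variable X_a, add X_a -> h(a)."""
    terminals = {s for _, body in rules for s in body if s not in variables}
    new_rules = {(head, tuple(f"X_{s}" if s in terminals else s
                              for s in body))
                 for head, body in rules}
    for a in terminals:
        new_rules.add((f"X_{a}", tuple(h[a])))
    return new_rules
```

Running it on Example 20.2 reproduces the listed grammar: S → X_0 S X_0 | X_1 S X_1 | ε, X_0 → aba, X_1 → bb.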
Inverse Homomorphisms
Recall: for a homomorphism h, h⁻¹(L) = {w | h(w) ∈ L}.

Proposition 20.3. If L is a CFL then h⁻¹(L) is a CFL.

Proof Idea
For a regular language L: the DFA for h⁻¹(L), on reading a symbol a, simulated the DFA for L on h(a). Can we do the same with PDAs?

• Key idea: store h(a) in a buffer and process the symbols of h(a) one at a time (according to the transition function of the original PDA); the next input symbol is read only after the buffer has been emptied.
• Where do we store this buffer? In the state of the new PDA!

Proof. Let P = (Q, Σ, Γ, δ, q0, F) be a PDA such that L(P) = L. Let h : Δ → Σ* be a homomorphism and let n = max_{a∈Δ} |h(a)|, i.e., every symbol of Δ is mapped under h to a string of length at most n. Consider the PDA P′ = (Q′, Δ, Γ, δ′, q′0, F′) where

• Q′ = Q × Σ^{≤n}, where Σ^{≤n} is the collection of all strings of length at most n over Σ
• q′0 = (q0, ε)
• F′ = F × {ε}
• δ′ is given by
  – δ′((q, v), x, a) = {((q, h(x)), ε)} if v = ε, a = ε and x ∈ Δ (load the buffer)
  – δ′((q, v), x, a) = {((p, u), b) | (p, b) ∈ δ(q, y, a)} if v = yu with y ∈ Σ ∪ {ε}, and x = ε (consume from the buffer, or simulate an ε-move of P)
  and δ′ is ∅ in all other cases.

We can show by induction that for every w ∈ Δ* and γ ∈ Γ*,

⟨q′0, ε⟩ ⊢^w_{P′} ⟨(q, v), γ⟩ iff ⟨q0, ε⟩ ⊢^{w′}_P ⟨q, γ⟩, where h(w) = w′v.

Again this induction proof is left as an exercise. Now, w ∈ L(P′) iff ⟨q′0, ε⟩ ⊢^w_{P′} ⟨(q, ε), γ⟩ for some q ∈ F and some γ (by definition of PDA acceptance and F′) iff ⟨q0, ε⟩ ⊢^{h(w)}_P ⟨q, γ⟩ with q ∈ F (by the exercise) iff h(w) ∈ L(P) (by definition of PDA acceptance). Thus L(P′) = h⁻¹(L(P)) = h⁻¹(L).
Part VII
Lecture 19
Decision Problems for CFLs and the Chomsky
Hierarchy
21 Decision Problems for CFLs
21.1 Emptiness of CFLs
Emptiness Problem
Given a CFG G with start symbol S, is L(G) empty?
Solution: Check if the start symbol S is generating. How long does that take?
Determining generating symbols

Algorithm

Gen = ∅
for every rule A → x where x ∈ Σ*:
    Gen = Gen ∪ {A}
repeat
    for every rule A → α:
        if all variables in α are in Gen then
            Gen = Gen ∪ {A}
until Gen does not change

• Each of the for-loops takes O(n) time, where n = |G|.
• Each iteration of the repeat-until loop discovers at least one new variable, so the number of iterations is O(n). The total running time is O(n²).
21.2 Membership Problem
Membership Problem

Given a CFG G = (V, Σ, R, S) in Chomsky Normal Form, and a string w ∈ Σ*, is w ∈ L(G)?

• A central question in parsing.

21.2.1 Simple Solution

Simple Solution

• Let |w| = n. Since G is in Chomsky Normal Form, w ∈ L(G) iff w has a parse tree of size 2n − 1
• Construct all possible (binary) parse trees and check whether any of them is a valid parse tree for w
• The number of candidate parse trees of size 2n − 1 is about k^(2n−1), where k is the number of variables in G. So this algorithm is exponential in n!
• We will see an algorithm that runs in O(n³) time (the constant depends on k).
21.2.2 CYK Algorithm
First Ideas

Notation
Suppose w = w1w2⋯wn, where wi ∈ Σ. Let w(i,j) denote the substring of w starting at position i and of length j. Thus w(i,j) = wi wi+1 ⋯ wi+j−1.

Main Idea
For every A ∈ V, every i ≤ n and every j ≤ n + 1 − i, we will determine whether A ⇒* w(i,j). Now w ∈ L(G) iff S ⇒* w(1,n) = w; thus we will have solved the membership problem.

How do we determine whether A ⇒* w(i,j), for every A, i, j?

Base Case

Substrings of length 1

Observation
For any A and i, A ⇒* w(i,1) iff A → w(i,1) is a rule.

• Since G is in Chomsky Normal Form, G has no ε-rules and no unit rules.
• Thus, for each A and i, one can determine whether A ⇒* w(i,1).

Inductive Step

Longer substrings

[Figure: a parse using a rule A → BC, where B derives w(i,k) and C derives w(i+k, j−k).]

• Suppose for every variable X and every w(i,ℓ) with ℓ < j we have determined whether X ⇒* w(i,ℓ)
• A ⇒* w(i,j) iff there are variables B and C and some k < j such that A → BC is a rule, B ⇒* w(i,k) and C ⇒* w(i+k, j−k)
• Since k and j − k are both less than j, we can inductively determine whether A ⇒* w(i,j).
Cocke-Younger-Kasami (CYK) Algorithm

The algorithm maintains X(i,j) = {A | A ⇒* w(i,j)}.

Initialize: X(i,1) = {A | A → w(i,1)}
for j = 2 to n do
    for i = 1 to n − j + 1 do
        X(i,j) = ∅
        for k = 1 to j − 1 do
            X(i,j) = X(i,j) ∪ {A | A → BC, B ∈ X(i,k), C ∈ X(i+k, j−k)}

Correctness: after each iteration of the outermost loop, X(i,j) contains exactly the set of variables A that can derive w(i,j), for each i. Time = O(n³).
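The table-filling loop above translates directly into code. Here is a runnable sketch (my own, not the notes'): rules of a CNF grammar are (head, body) pairs, with body either a 1-tuple containing a terminal or a 2-tuple of variables.

```python
def cyk(rules, start, w):
    """CYK membership test for a grammar in Chomsky Normal Form.

    rules: set of (head, body) pairs with body = (a,) or (B, C).
    Runs in O(n^3) table updates for a fixed grammar.
    """
    n = len(w)
    if n == 0:                                   # only S -> epsilon can apply
        return (start, ()) in rules
    # X[i][j] = variables deriving the length-j substring starting at i
    X = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):
        X[i][1] = {h for h, body in rules if body == (w[i],)}
    for j in range(2, n + 1):                    # substring length
        for i in range(n - j + 1):               # start position (0-based)
            for k in range(1, j):                # split point
                X[i][j] |= {h for h, body in rules
                            if len(body) == 2
                            and body[0] in X[i][k]
                            and body[1] in X[i + k][j - k]}
    return start in X[0][n]
```

On the grammar of Example 21.1 below, `cyk(rules, 'S', 'baaba')` returns True, matching the filled table.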
Example
Example 21.1. Consider the grammar S → AB | BC, A → BA | a, B → CC | b, C → AB | a. Let w = baaba. The sets X(i,j) = {A | A ⇒* w(i,j)}:

j\i    1        2        3        4      5
5    {S,A,C}
4    ∅        {S,A,C}
3    ∅        {B}      {B}
2    {S,A}    {B}      {S,C}    {S,A}
1    {B}      {A,C}    {A,C}    {B}    {A,C}
      b        a        a        b      a
21.3 Other Decision Problems
More Decision Problems
Given CFGs G1 and G2:

• Is L(G1) = Σ*?
• Is L(G1) ∩ L(G2) = ∅?
• Is L(G1) = L(G2)?
• Is G1 ambiguous?
• Is L(G1) inherently ambiguous?

There are no algorithms to solve any of these problems. We will see some of these proofs in the next few weeks.
22 Chomsky Hierarchy
Grammars for each task

[Figure 13: Noam Chomsky]

• Different types of rules allow one to describe different aspects of natural language
• These grammars form a hierarchy

Grammars in General

All grammars we consider will be of the form G = (V, Σ, R, S), where

• V is a finite set of variables
• Σ is a finite set of terminals
• R is a finite set of rules
• S is the start symbol

The different grammar classes are determined by the form of the rules in R.

22.1 Regular Languages

Type 3 Grammars

The rules in a type 3 grammar are of the form

A → aB or A → a

where A, B ∈ V and a ∈ Σ.

We say αAβ ⇒_G αγβ iff A → γ ∈ R. L(G) = {w ∈ Σ* | S ⇒*_G w}.
22.1.1 Type 3 Grammars and Regularity
Type 3 Grammars and Regularity
Proposition 22.1. If G is a Type 3 grammar then L(G) is regular. Conversely, if L is regular then there is a Type 3 grammar G such that L = L(G).

Proof. Let G = (V, Σ, R, S) be a type 3 grammar. Consider the NFA M = (Q, Σ, δ, q0, F) where

• Q = V ∪ {qF}, where qF ∉ V
• q0 = S
• F = {qF}
• δ(A, a) = {B | A → aB ∈ R} ∪ {qF | A → a ∈ R} for A ∈ V, and δ(qF, a) = ∅ for all a.

Then L(M) = L(G), since for all A ∈ V and w ∈ Σ*, A ⇒*_G w iff M can move from state A to qF reading w.

NFA to Grammars

Proof (contd). Let M = (Q, Σ, δ, q0, F) be an NFA recognizing L. Consider G = (V, Σ, R, S) where

• V = Q
• S = q0
• q1 → a q2 ∈ R iff q2 ∈ δ(q1, a), and q → ε ∈ R iff q ∈ F.

We can show, for any q, q′ ∈ Q and w ∈ Σ*, that M can move from q to q′ reading w iff q ⇒*_G w q′. Thus L(M) = L(G).
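The grammar-to-NFA direction can be illustrated by simulating the resulting NFA on the fly. This is a sketch under my own encoding (not the notes'): rules of a right-linear grammar are A → (a, B) or A → (a,), and `'qF'` plays the role of the fresh final state.

```python
def grammar_to_nfa(rules, start):
    """Build and run the NFA of a Type 3 grammar.

    rules: set of (head, body) pairs, body = (a, B) or (a,).
    States are the grammar's variables plus the fresh state 'qF';
    a rule A -> (a,) becomes the transition A --a--> qF.
    """
    def step(states, a):
        out = set()
        for A in states:
            for head, body in rules:
                if head == A and body[0] == a:
                    out.add(body[1] if len(body) == 2 else 'qF')
        return out

    def accepts(w):
        states = {start}
        for a in w:
            states = step(states, a)
        return 'qF' in states
    return accepts
```

For the grammar S → 0S | 1S | 1 (strings over {0,1} ending in 1), the resulting acceptor behaves like the corresponding two-state NFA.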
22.2 Context-free Languages
Type 2 Grammars

The rules in a type 2 grammar are of the form

A → α

where A ∈ V and α ∈ (Σ ∪ V)*.

We say βAγ ⇒_G βαγ iff A → α ∈ R. L(G) = {w ∈ Σ* | S ⇒*_G w}.

By definition, Type 2 grammars describe exactly the class of context-free languages.
22.3 Beyond Context-Free Languages
22.3.1 Type 0 Grammars
Type 0 Grammars
The rules in a type 0 grammar are of the form

α → β

where α, β ∈ (Σ ∪ V)*.

We say γ1 α γ2 ⇒_G γ1 β γ2 iff α → β ∈ R. L(G) = {w ∈ Σ* | S ⇒*_G w}.
Example of Type 0 Grammar
Example 22.2. Consider the grammar G with Σ = {a} and the rules

S → $Ca# | a      Ca → aaC      $D → $C
C# → D# | E       aD → Da       aE → Ea
$E → ε

The following are derivations in this grammar:

S ⇒ $Ca# ⇒ $aaC# ⇒ $aaE ⇒ $aEa ⇒ $Eaa ⇒ aa

S ⇒ $Ca# ⇒ $aaC# ⇒ $aaD# ⇒ $aDa# ⇒ $Daa# ⇒ $Caa#
  ⇒ $aaCa# ⇒ $aaaaC# ⇒ $aaaaE ⇒ $aaaEa ⇒ $aaEaa
  ⇒ $aEaaa ⇒ $Eaaaa ⇒ aaaa

L(G) = {aⁱ | i is a power of 2}
Expressive Power of Type 0 Grammars
Recall that any decision problem can be thought of as a formal language L, where x ∈ L iff the answer on input x is "yes".

Proposition 22.3. A decision problem L can be solved on computers iff L can be described by a Type 0 grammar.

Proof. This needs some theory that we will develop over the next few weeks.
22.3.2 Type 1 Grammars
Type 1 Grammars
The rules in a type 1 grammar are of the form

α → β

where α, β ∈ (Σ ∪ V)* and |α| ≤ |β|.

We say γ1 α γ2 ⇒_G γ1 β γ2 iff α → β ∈ R. L(G) = {w ∈ Σ* | S ⇒*_G w}.

Normal Form for Type 1 Grammars

We can define a normal form for Type 1 grammars in which all rules are of the form

γ1 A γ2 → γ1 β γ2

• Thus the rules of a Type 1 grammar can be seen as rules of a CFG in which a variable A is replaced by a string β in one step, with the only difference being that the rule can be applied only in the context γ1 _ γ2.
• For this reason, the languages described by Type 1 grammars are called context-sensitive languages.
22.3.3 Hierarchy
Chomsky Hierarchy
Theorem 22.4. Type 0, Type 1, Type 2, and Type 3 grammars define a strict hierarchy of formal languages.

Proof. Clearly a Type 3 grammar is a special Type 2 grammar, a Type 2 grammar is (essentially) a special Type 1 grammar, and a Type 1 grammar is a special Type 0 grammar.

Moreover, there is a language that has a Type 2 grammar but no Type 3 grammar (L = {0ⁿ1ⁿ | n ≥ 0}), a language that has a Type 1 grammar but no Type 2 grammar (L = {aⁿbⁿcⁿ | n ≥ 0}), and a language that has a Type 0 grammar but no Type 1 grammar.