Handout 6: The Context-Free Languages. © T.A. Henzinger, G. Théoduloz

Informatique Théorique Spring 2011

Handout 6: The Context-free Languages

6.1 Push-down automata. We have seen that the language {0^n 1^n | n ≥ 0} is not regular, because finite
automata do not have the ability to count. We now equip finite automata with an unbounded data structure,
a stack, which will prove useful for counting. A push-down automaton M = (Q, Σ, Γ, δ, q_0, F)
consists of
Q . . . a finite set of states,
Σ . . . a finite set of input symbols,
Γ . . . a finite set of stack symbols,
δ: Q × Σ_ε × Γ_ε → P(Q × Γ_ε) . . . a transition relation, where Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε},
q_0 ∈ Q . . . an initial state,
F ⊆ Q . . . a set of final states.
A run of M is a sequence
(p_0, s_0) --a_0--> (p_1, s_1) --a_1--> ··· --a_{n−1}--> (p_n, s_n)
with p_0, . . . , p_n ∈ Q and s_0, . . . , s_n ∈ Γ∗ and a_0, . . . , a_{n−1} ∈ Σ_ε such that
(1) p_0 = q_0 and s_0 = ε (initially the stack is empty), and
(2) for all i ∈ [0..n − 1], we have (p_{i+1}, γ_{i+1}) ∈ δ(p_i, a_i, γ_i) and s_i = γ_i s′ and s_{i+1} = γ_{i+1} s′ for some
γ_i, γ_{i+1} ∈ Γ_ε and s′ ∈ Γ∗ (the letter γ, if different from ε, represents the top stack symbol, and s′ represents the rest
of the stack).
The run accepts the input word a_0 . . . a_{n−1} if
(3) p_n ∈ F.
The language of M is
L(M ) = {w ∈ Σ∗ | w is accepted by some run of M }.
Note that, according to our definition, PDAs are nondeterministic and may have ε-transitions.
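
The definition of a run can be made concrete with a small simulator. The following is a minimal sketch, not part of the original handout: it explores configurations (state, input position, stack) breadth-first. The dictionary encoding of δ, the function name accepts, the state names q0 to q3, and the max_stack cutoff are all assumptions made for illustration. The empty string stands for ε, and the first character of the stack string is the top symbol.

```python
from collections import deque

def accepts(delta, start, finals, word, max_stack=None):
    """Return True if some run of the PDA accepts `word`.

    delta maps (state, a, pop) to a set of (state, push) pairs, where a, pop
    and push are single symbols or "" (standing for epsilon).
    """
    if max_stack is None:
        # Assumption: a pragmatic cutoff that guarantees termination; it is
        # large enough for the example below but not for every PDA.
        max_stack = len(word) + 2
    seen = set()
    queue = deque([(start, 0, "")])        # condition (1): empty stack
    while queue:
        state, i, stack = queue.popleft()
        if (state, i, stack) in seen:
            continue
        seen.add((state, i, stack))
        if i == len(word) and state in finals:   # condition (3)
            return True
        for (p, a, pop), targets in delta.items():
            if p != state:
                continue
            if a and (i >= len(word) or word[i] != a):
                continue
            if pop and not stack.startswith(pop):
                continue
            rest = stack[len(pop):]              # s' in condition (2)
            for q, push in targets:
                new_stack = push + rest
                if len(new_stack) <= max_stack:
                    queue.append((q, i + len(a), new_stack))
    return False

# A PDA for {0^n 1^n | n >= 0}; the state names are hypothetical.
L1_DELTA = {
    ("q0", "", ""): {("q1", "$")},
    ("q1", "0", ""): {("q1", "#")},
    ("q1", "", ""): {("q2", "")},
    ("q2", "1", "#"): {("q2", "")},
    ("q2", "", "$"): {("q3", "")},
}
```

For instance, accepts(L1_DELTA, "q0", {"q3"}, "0011") explores the run that pushes $, pushes # twice, pops # twice, and finally pops $.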

6.2 Examples. The following four languages are not regular but are accepted by PDAs. An edge label a, b → c means that the automaton reads a, pops b, and pushes c:

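L1 = {0^n 1^n | n ≥ 0}
[State diagram: four states in a chain; the first is initial, the last is final.]
state 1 → state 2: ε,ε→$
state 2 → state 2: 0,ε→#
state 2 → state 3: ε,ε→ε
state 3 → state 3: 1,#→ε
state 3 → state 4: ε,$→ε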

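L2 = {w w^R | w ∈ {0, 1}∗ }
[State diagram: four states in a chain; the first is initial, the last is final.]
state 1 → state 2: ε,ε→$
state 2 → state 2: 0,ε→0
state 2 → state 2: 1,ε→1
state 2 → state 3: ε,ε→ε
state 3 → state 3: 0,0→ε
state 3 → state 3: 1,1→ε
state 3 → state 4: ε,$→ε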

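L3 = {w ∈ { ‘(’ , ‘)’ }∗ | w is a balanced string of parentheses}
[State diagram: three states in a chain; the first is initial, the last is final.]
state 1 → state 2: ε,ε→$
state 2 → state 2: (,ε→#
state 2 → state 2: ),#→ε
state 2 → state 3: ε,$→ε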

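L4 = {w ∈ {0, 1}∗ | #(w, 0) = #(w, 1)}
(where #(w, σ) denotes the number of occurrences of the letter σ in the word w)
[State diagram: three states in a chain; the first is initial, the last is final.]
state 1 → state 2: ε,ε→$
state 2 → state 2: 0,ε→0
state 2 → state 2: 1,ε→1
state 2 → state 2: 0,1→ε
state 2 → state 2: 1,0→ε
state 2 → state 3: ε,$→ε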

6.3 Context-free grammars. A context-free grammar G = (V, Σ, R, S) consists of


V . . . a finite set of nonterminals (variables),
Σ . . . a finite set of terminals (constants),
R . . . a finite set of rules (productions) of the form A → w, where A ∈ V and w ∈ (V ∪ Σ)∗ ,
S ∈ V . . . a start symbol.
If A → w is a rule in R, then xAy ⇒ xwy for all x, y ∈ (V ∪ Σ)∗ . A derivation of G is a sequence S = w_0 ⇒ w_1 ⇒ · · · ⇒ w_n
with w_0 , . . . , w_n ∈ (V ∪ Σ)∗ . The derivation generates the word w_n . The language of G is
L(G) = {w ∈ Σ∗ | w is generated by some derivation of G}.
We shall see that the language of every PDA can be defined by a CFG, and that the language of every
CFG can be defined by a PDA. The languages of PDAs and CFGs are called context-free.

6.4 Examples. Recall the languages from Example 6.2:


L1 = {0^n 1^n | n ≥ 0}
    S → ε | 0S1
L2 = {w w^R | w ∈ {0, 1}∗ }
    S → ε | 0S0 | 1S1
L3 = {w ∈ { ‘(’ , ‘)’ }∗ | w is a balanced string of parentheses}
    S → ε | ( S ) | SS
L4 = {w ∈ {0, 1}∗ | #(w, 0) = #(w, 1)}
    S → ε | 0A | 1B
    A → 1S | 0AA
    B → 0S | 1BB
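
To see the derivation relation ⇒ at work, the grammars above can be explored mechanically. The sketch below is an illustration under stated assumptions, not part of the handout: every grammar symbol is a single character, the nonterminals are exactly the keys of the rules dictionary, and the names step, words, L1_RULES and L4_RULES are hypothetical. The function words collects the terminal words generated by derivations of at most max_steps steps.

```python
def step(rules, form):
    """All sentential forms reachable from `form` by one step x A y => x w y."""
    result = set()
    for i, symbol in enumerate(form):
        for rhs in rules.get(symbol, ()):      # terminals have no rules
            result.add(form[:i] + rhs + form[i + 1:])
    return result

def words(rules, start, max_steps):
    """Terminal words generated by derivations of at most max_steps steps."""
    forms, found = {start}, set()
    for _ in range(max_steps):
        forms = {g for f in forms for g in step(rules, f)}
        # A form is a generated word once it contains no nonterminal.
        found |= {f for f in forms if not any(c in rules for c in f)}
    return found

# Grammars for L1 and L4 from this section; "" stands for epsilon.
L1_RULES = {"S": ["", "0S1"]}
L4_RULES = {"S": ["", "0A", "1B"], "A": ["1S", "0AA"], "B": ["0S", "1BB"]}
```

In the grammar for L4, every rule other than S → ε produces exactly one terminal, and a derivation can only finish with S → ε, so a word of length 6 needs at least 7 steps. Calling words(L4_RULES, "S", 6) therefore yields only words of length at most 4.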

6.5 Top-down parsing. Let G = (V, Σ, R, S) be a context-free grammar. Let T_G be the PDA with the
following state-transition diagram:

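[State diagram of T_G: three states q_start (initial), q_loop, and q_end (final).]
q_start → q_loop: ε,ε→S$
q_loop → q_loop: ε,A→w for each rule A → w in R
q_loop → q_loop: σ,σ→ε for each terminal σ ∈ Σ
q_loop → q_end: ε,$→ε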

If w contains more than one letter, then an edge labeled with ε, A → w is really shorthand for several
consecutive edges, each pushing one letter of w, starting with the last letter so that the first letter of w
ends up on top of the stack. Similarly, the edge labeled with ε, ε → S$ is shorthand
for two consecutive edges, one labeled with ε, ε → $ and the second one with ε, ε → S. It can be seen
that each accepting run of T_G corresponds to a derivation of G, and vice versa; that is, L(T_G) = L(G).
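
The behaviour of the top-down parser in q_loop can be sketched directly as a search over pairs (input position, stack). This is an illustrative sketch rather than the handout's construction verbatim: the function name top_down_accepts and the dictionary encoding of R are hypothetical, symbols are single characters, "$" is assumed not to be a grammar symbol, and the search is only guaranteed to terminate if the grammar has no left recursion (so it would loop on the rule S → SS of the grammar for L3).

```python
def top_down_accepts(rules, start, word):
    """Search the runs of the top-down parser on `word`.

    rules maps each nonterminal to a list of right-hand sides ("" is epsilon).
    The stack is a string whose first character is the top symbol.
    """
    seen = set()
    todo = [(0, start + "$")]                  # after the edge eps,eps -> S$
    while todo:
        i, stack = todo.pop()
        if (i, stack) in seen:
            continue
        seen.add((i, stack))
        top, rest = stack[0], stack[1:]
        if top == "$":
            if i == len(word):                 # edge eps,$ -> eps into q_end
                return True
        elif top in rules:                     # edge eps,A -> w
            for rhs in rules[top]:
                todo.append((i, rhs + rest))
        elif i < len(word) and word[i] == top: # edge sigma,sigma -> eps
            todo.append((i + 1, rest))
    return False

# Grammar for L2 = {w w^R} from Section 6.4.
L2_RULES = {"S": ["", "0S0", "1S1"]}
```

On the input 0110, one successful branch expands S to 0S0, matches 0, expands S to 1S1, matches 1, replaces S by ε, and then matches the remaining 1 and 0 before popping $.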

6.6 Bottom-up parsing. Let B_G be the PDA with the following state-transition diagram:

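[State diagram of B_G: three states q_start (initial), q_loop, and q_end (final).]
q_start → q_loop: ε,ε→$
q_loop → q_loop: ε,w^R→A for each rule A → w in R
q_loop → q_loop: σ,ε→σ for each terminal σ ∈ Σ
q_loop → q_end: ε,S$→ε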

Again, each accepting run of B_G corresponds to a derivation of G, and vice versa; that is, L(B_G) = L(G).
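
The bottom-up parser can be sketched in the same style: it either shifts the next input letter onto the stack or reduces a reversed right-hand side w^R on top of the stack to its nonterminal A. The sketch below is an illustration under assumptions, not the handout's own code: the name bottom_up_accepts is hypothetical, symbols are single characters, and ε-rules are rejected because reducing ε could grow the stack forever in this naive search.

```python
def bottom_up_accepts(rules, start, word):
    """Search the runs of the bottom-up (shift-reduce) parser on `word`.

    rules maps each nonterminal to a list of nonempty right-hand sides.
    The stack is a string whose first character is the top symbol.
    """
    if any(rhs == "" for rhss in rules.values() for rhs in rhss):
        raise ValueError("this sketch assumes a grammar without epsilon-rules")
    seen = set()
    todo = [(0, "$")]                          # after the edge eps,eps -> $
    while todo:
        i, stack = todo.pop()
        if (i, stack) in seen:
            continue
        seen.add((i, stack))
        if i == len(word) and stack == start + "$":   # edge eps,S$ -> eps
            return True
        for lhs, rhss in rules.items():        # reduce: edge eps,w^R -> A
            for rhs in rhss:
                handle = rhs[::-1]
                if stack.startswith(handle):
                    todo.append((i, lhs + stack[len(handle):]))
        if i < len(word):                      # shift: edge sigma,eps -> sigma
            todo.append((i + 1, word[i] + stack))
    return False

SUM_RULES = {"S": ["S+S", "1"]}                # the grammar S -> S+S | 1
PAIR_RULES = {"S": ["01", "0S1"]}              # {0^n 1^n | n >= 1}, no eps-rule
```

With PAIR_RULES and the input 0011, the parser shifts 0, 0, 1, reduces the stack top 10 (the reverse of 01) to S, shifts the last 1, and reduces 1S0 (the reverse of 0S1) to S.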

6.7 Ambiguity. On the same input word, the runs of the top-down and bottom-up parsers may
correspond to different derivations. The top-down parser produces a left-most derivation, because at
each intermediate stage during the derivation, always the left-most nonterminal is replaced by the right-
hand side of a production. By contrast, the bottom-up parser produces a right-most derivation. An input
word may even have several different left-most (or right-most) derivations; in this case, the grammar is
called ambiguous. A simple example of an ambiguous grammar is S → S + S | 1: the two left-most
derivations
S ⇒ S+S ⇒ 1+S ⇒ 1+S+S ⇒ 1+1+S ⇒ 1+1+1
S ⇒ S+S ⇒ S+S+S ⇒ 1+S+S ⇒ 1+1+S ⇒ 1+1+1
derive the same word. The difference between the two derivations can also be seen in the corresponding
parse trees:
First derivation:            Second derivation:

      S                            S
    / | \                        / | \
   S  +  S                      S  +  S
   |   / | \                  / | \   |
   1  S  +  S                S  +  S  1
      |     |                |     |
      1     1                1     1

Some context-free languages are inherently ambiguous, that is, they cannot be generated by unambiguous
CFGs. An example of such a language is {0^i 1^j 2^k | i = j or j = k}.
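
The ambiguity of S → S + S | 1 can be quantified by counting parse trees, which correspond one-to-one to left-most derivations. The following sketch is specific to this one grammar and is only an illustration; the function name count_parse_trees is hypothetical. A word is derived either directly as 1, or by splitting it at some + into two parts that are each derived from S.

```python
from functools import lru_cache

def count_parse_trees(word):
    """Number of parse trees of `word` in the grammar S -> S+S | 1."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Rule S -> 1 applies only if word[i:j] is exactly "1".
        total = 1 if word[i:j] == "1" else 0
        # Rule S -> S+S: try every "+" with a nonempty part on each side.
        for k in range(i + 1, j - 1):
            if word[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(word))
```

For the word 1+1+1 the count is 2, matching the two parse trees of this section; a count above 1 for any word shows that the grammar is ambiguous.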

6.8 From push-down automata to context-free grammars. We have seen how given a CFG, we
can construct an equivalent PDA. The converse is also true: given a PDA M = (Q, Σ, Γ, δ, q_0, F), we can
construct a CFG G = (V, Σ, R, S) such that L(G) = L(M). We first modify M so that it satisfies the
following three conditions:
– M has a single accept state; i.e., F = {q_A}.
– M empties its stack before accepting.
– Each transition either pops or pushes a stack symbol, but does not do both.
Let V = {A_{p,q} | p, q ∈ Q} and S = A_{q_0,q_A}. The intention is to have each nonterminal A_{p,q} derive a string
w of terminals iff M, when started in p with the empty stack, on reading input w can end up in q, again
with the stack empty.

This is achieved by adding the following rules to R:
– For each p ∈ Q, A_{p,p} → ε.
– For each p, q, r ∈ Q, A_{p,r} → A_{p,q} A_{q,r}.
– For each p, p′, q′, q ∈ Q and a, b ∈ Σ_ε and γ ∈ Γ, if (p′, γ) ∈ δ(p, a, ε) and (q, ε) ∈ δ(q′, b, γ),
then A_{p,q} → a A_{p′,q′} b.
The construction is proved correct in Claims 2.30 and 2.31 (pages 123–124) of Sipser. Note that the size
of G is O(|M|^3).
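
The three kinds of rules can be generated mechanically. The sketch below is an illustration under stated assumptions and not the handout's own code: transitions are encoded as tuples (p, a, pop, q, push) with "" for ε, a nonterminal A_{p,q} is represented by the pair (p, q), and the names pda_to_cfg, short_words, STATES and DELTA are hypothetical. The example PDA is a made-up automaton for {0^n 1^n | n ≥ 1} that already satisfies the three conditions, with accept state q3. The helper short_words computes, by fixed-point iteration, the words of length at most n derivable from a nonterminal.

```python
from itertools import product

def pda_to_cfg(states, delta):
    """Rules of G as (lhs, rhs) pairs; rhs is a tuple of symbols."""
    rules = [((p, p), ()) for p in states]             # A_pp -> eps
    for p, q, r in product(states, repeat=3):          # A_pr -> A_pq A_qr
        rules.append(((p, r), ((p, q), (q, r))))
    pushes = [t for t in delta if t[4]]
    pops = [t for t in delta if t[2]]
    for p, a, _, p2, gamma in pushes:                  # A_pq -> a A_p'q' b
        for q2, b, popped, q, _ in pops:
            if popped == gamma:
                rules.append(((p, q), (a, (p2, q2), b)))
    return rules

def short_words(rules, start, n):
    """Words of length <= n derivable from `start` (least fixed point)."""
    lang = {lhs: set() for lhs, _ in rules}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            partial = {""}
            for sym in rhs:
                options = lang[sym] if isinstance(sym, tuple) else {sym}
                partial = {u + v for u in partial for v in options
                           if len(u + v) <= n}
            if not partial <= lang[lhs]:
                lang[lhs] |= partial
                changed = True
    return lang[start]

# Hypothetical normalized PDA for {0^n 1^n | n >= 1}.
STATES = ["q0", "q1", "q2", "q3"]
DELTA = {
    ("q0", "", "", "q1", "$"),
    ("q1", "0", "", "q1", "#"),
    ("q1", "1", "#", "q2", ""),
    ("q2", "1", "#", "q2", ""),
    ("q2", "", "$", "q3", ""),
}
```

For this four-state example the construction yields 4 + 64 + 3 = 71 rules; the 64 rules of the second kind illustrate the cubic growth in the number of states.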
