3. Context_Free_Language


Context-Free Languages

Text Book: Peter Linz


Prepared by : Dr. Ravi B

November 17, 2023

1 Introduction
For every regular language there exists a dfa that accepts it. However, a dfa cannot be used to describe
every language. For example, the non-regular language L = {a^n b^n : n ≥ 0} cannot be
accepted by any dfa. This is the limitation of dfa's. Many such non-regular languages, including this
one, can easily be described using Context-Free Grammars (CFG).

2 Context-Free Grammars

2.1 Formal definition

A grammar G = (V, T, S, P ) is said to be context-free if all productions in P have the form


A→x

where A ∈ V and x ∈ (V ∪ T)*.
A language L is said to be context-free if and only if there is a context-free grammar G such that
L = L(G).
Note the following:

1. Every regular language is context-free, and some context-free languages, such as
{a^n b^n : n ≥ 0}, are not regular. So the family of regular languages is a proper subset of
the family of context-free languages.

2. The production rules specify how the grammar transforms one string into another, and through
this they define a language associated with the grammar. A production can be used whenever
it is applicable, and it can be applied as often as desired.

3. If G = (V, T, S, P) is a CFG and w ∈ L(G), then


S ⇒ w1 ⇒ w2 ⇒ ... ⇒ wn ⇒ w
is a derivation of w, where S is the start symbol, w1, w2, ..., wn are called sentential forms,
and w is the string (called a sentence) derived from G.

4. Context-free grammars derive their name from the fact that the substitution of the variable on
the left of a production can be made any time such a variable appears in a sentential form. It
does not depend on the symbols in the rest of the sentential form (the context). This feature
is the consequence of allowing only a single variable on the left side of the production.

3 Examples of Context-Free Languages


Example-1
We learnt that L = {a^n b^n : n ≥ 0} is a context-free language and the following CFG describes the
language
S → aSb|λ
Example-2
L = {ww^R : w ∈ {a, b}*} is the language of even-length palindromes consisting of a's and b's. Clearly it
is a context-free language and the following CFG describes the language
S → aSa
S → bSb
S → λ
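
The two example grammars can be checked mechanically for short strings. The following is a minimal Python sketch and not part of the original notes: the function name `generate`, the dictionary encoding of productions, and the convention that uppercase letters are variables are assumptions made for illustration. It repeatedly replaces the leftmost variable and keeps every terminal string of length at most `max_len`.

```python
from collections import deque
from itertools import product


def generate(productions, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    Assumed encoding: uppercase letters are variables, every other character
    is a terminal, and the empty string "" stands for λ.
    The search terminates for grammars such as the two examples, where every
    sentential form contains at most one variable.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None for a terminal string.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if not ch.isupper())
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


g1 = {"S": ["aSb", ""]}          # Example-1: a^n b^n
g2 = {"S": ["aSa", "bSb", ""]}   # Example-2: w w^R

print(sorted(generate(g1, "S", 6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']

# Compare Example-2 with the set definition {w w^R} for |w| <= 2.
expected = {"".join(w) + "".join(w)[::-1]
            for n in range(3) for w in product("ab", repeat=n)}
print(generate(g2, "S", 4) == expected)  # True
```

Agreement on short strings does not prove that a grammar is correct, but a mismatch quickly exposes a wrong production.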

4 Problems
1. Give a CFG for each of the following

(a) L = {a^n b^m : n ≠ m}
S → aSb|A|B
A → aA|a
B → bB|b

(b) L = {a^n b^(2n) : n ≥ 0}


S → aSbb|λ

(c) L = {wcw^R : w ∈ {a, b}*}

(d) L = {ww^R : w ∈ {a, b}*}


S → aSa|bSb|λ
Note: S → aSa|bSb|c|λ generates both the odd-length strings wcw^R of part (c) and the even-length
strings ww^R

(e) L = {(ab)^n : n ≥ 0} (Exercise)


S → abS|λ

(f) Let L1 = {a^n b^n : n ≥ 0} and L2 = {a^n b^(2n) : n ≥ 0}. Find grammars for L1L2, L1 ∪ L2
and L1*
Grammar for L1L2
L1L2 = {a^n b^n a^m b^(2m) : n ≥ 0, m ≥ 0}
S → AB
A → aAb|λ
B → aBbb|λ
Grammar for L1 ∪ L2
L1 ∪ L2 = {a^n b^n : n ≥ 0} ∪ {a^m b^(2m) : m ≥ 0}
S → A|B
A → aAb|λ
B → aBbb|λ
Grammar for L1*
L1* = {a^n b^n : n ≥ 0}*
S → AS|λ
A → aAb|λ

(g) L = {a^n b^m : n ≠ m − 1}
Consider the following series
..., a^(n−2) b^n, a^(n−1) b^n, a^n b^n, a^(n+1) b^n, a^(n+2) b^n, ...
The only case in which the number of a's equals the number of b's after we subtract one b is
a^(n−1) b^n. Therefore, strings of the form a^(n−1) b^n are not in the language. All other strings
of the form a^n b^m are in the language. Thus, L is the union of the languages
L1 = {a^n b^m : n ≥ m} and
L2 = {a^n b^m : m ≥ n + 2}
We write the grammar for L1 ∪ L2 as shown below
S → A|B
A → aAb|aA|λ
B → aBb|Bb|bb

(h) L = {a^n b^(n−3) : n ≥ 3}


The number of a's is 3 more than the number of b's
S → aaaA
A → aAb|λ

(i) L = {a^n b^m : n ≤ m + 3} (Exercise)

(j) L = {a^n b^m : 2n ≤ m ≤ 3n}


When n = 0, m = 0
When n = 1, m = 2, 3
When n = 2, m = 4, 5, 6
When n = 3, m = 6, 7, 8, 9 and so on
Each a is matched with either two or three b's. Therefore the grammar is
S → aSbb|aSbbb|λ
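
The claim that this grammar generates exactly the strings with 2n ≤ m ≤ 3n can be checked by counting. A derivation that uses S → aSbb p times and S → aSbbb q times yields a^(p+q) b^(2p+3q). The sketch below is an illustration with hypothetical function names, not part of the original notes. It compares the reachable pairs (n, m) with the set definition for small n.

```python
def generated_pairs(max_n):
    """Pairs (n, m) produced by p uses of S → aSbb and q uses of S → aSbbb."""
    return {(p + q, 2 * p + 3 * q)
            for p in range(max_n + 1)
            for q in range(max_n + 1 - p)}   # keeps n = p + q <= max_n


def language_pairs(max_n):
    """Pairs (n, m) allowed by the definition 2n <= m <= 3n."""
    return {(n, m)
            for n in range(max_n + 1)
            for m in range(2 * n, 3 * n + 1)}


print(generated_pairs(8) == language_pairs(8))  # True
```

The two sets agree because m = 2p + 3q = 2n + q, and q ranges over 0, 1, ..., n.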

(k) L = {a^n b^m : 0 ≤ n ≤ m ≤ 3n} (Exercise)

(l) L = {a^n b^m c^k : n = m or m ≤ k} Assume n ≥ 0, m ≥ 0, k ≥ 0


The number of a's is equal to the number of b's, or the number of b's is ≤ the number of c's.
Therefore, the grammar is
S → XC|AY
X → aXb|λ
C → cC|λ
A → aA|λ
Y → bYc|Yc|λ
(In Y the extra c's are added on the right, so that no c can appear before a b.)

(m) L = {a^n b^m c^k : n = m or m ≠ k} Assume n ≥ 0, m ≥ 0, k ≥ 0


The grammar is
S → XC|AY|AZ
X → aXb|λ
C → cC|λ
A → aA|λ
Y → bYc|Yc|c
Z → bZc|bZ|b
Here Y generates b^m c^k with m < k and Z generates b^m c^k with m > k.

(n) L = {a^n b^m c^k : k = n + 2m} Assume n ≥ 0, m ≥ 0 (Exercise)


The grammar is
S → aSc|A|λ
A → bAcc|λ

(o) L = {a^n ww^R b^n : n ≥ 1} (Exercise)

(p) Give a grammar for mathematical expressions


The grammar is
E → E + E|E − E|E ∗ E|E/E|(E)|id

(q) Give a grammar that describes a regular expression (Exercise)

2. Show that the following languages are Context-Free

(a) L = {uvwv^R : u, v, w ∈ {a, b}^+, |u| = |w| = 2} (Exercise)

(b) L = {w : n_a(w) mod 3 = 0}


S → bS|aA|λ
A → bA|aB
B → bB|aS

(c) L = {a^n b^m c^k : k = |n − m|} (Exercise)

(d) L = {a^n b^m c^k : k ≠ n + m} (Exercise)


k ≠ n + m ⇒ k < n + m or k > n + m
For k > n + m, generate a's, b's and c's such that k = n + m and then add more c's.
For k < n + m, there are extra a's or extra b's. Extra a's can be added in front of a string with
k = n + m. Extra b's must be added in front of a string of the form b^j c^j, so that no b appears
before an a.
Therefore we need to handle four cases:
Add both a's and b's in front of b^j c^j (k < m, n ≥ 1)
Add only a's when k = n + m (m ≤ k < n + m)
Add only b's in front of b^j c^j (k < m, n = 0)
Add c's when k = n + m (k > n + m)
The grammar is
S → ABD|AS1|BD|S1C
S1 → aS1c|D
D → bDc|λ
A → aA|a
B → bB|b
C → cC|c

(e) L = {a^n b^m : 0 ≤ n ≤ m ≤ 2n} (Exercise)

(f) If L = {a^n b^n : n ≥ 0} then show that L^2 is also context-free.


(Exercise) L^2 = {a^n b^n a^m b^m : m, n ≥ 0}
S → AA
A → aAb|λ

(g) L = {w ∈ {a}∗ : |w| mod 3 > 0}


S → a|aa|aaaS

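(h) L = {w ∈ {a}* : |w| mod 3 ≠ |w| mod 2}


|w| mod 3 = 0, 1, 2
|w| mod 2 = 0, 1
So we need the following combinations of (|w| mod 3, |w| mod 2)
(0, 1), (1, 0), (2, 0), (2, 1)
These correspond to |w| mod 6 = 3, 4, 2, 5 respectively, so the pattern repeats every six a's.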
S → aa|aaa|aaaa|aaaaa|aaaaaaS

(i) L = {w ∈ {a, b}* : n_a(w) = n_b(w)}


Note that λ can be generated with S → λ
Note that we have the following cases
case 1:
Strings of the form aw1b where w1 ∈ L. These can be generated using S → aSb
case 2:
Strings of the form bw1a where w1 ∈ L. These can be generated using S → bSa
case 3:
Strings that begin and end with the same symbol
Note that any string that has an equal number of a's and b's and that begins and ends
with the same symbol can be considered as a concatenation of two strings w1 and w2, where
w1, w2 ∈ L, such that each of them begins and ends with different symbols.
Example :
w = aabbba is the concatenation of w1 = aabb and w2 = ba
w = abbaba is the concatenation of w1 = ab and w2 = baba
w = ababba is the concatenation of w1 = abab and w2 = ba
w = babaab is the concatenation of w1 = baba and w2 = ab
Therefore, using case 1 and case 2, each of these strings (w = w1w2) can be generated using
S → SS
Therefore, the required grammar to handle all the cases is
S → aSb|bSa|SS|λ
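
The case analysis above translates directly into a recursive membership test. The following Python sketch is an illustration only (the name `derives` is hypothetical). It decides whether S ⇒* w for the grammar S → aSb|bSa|SS|λ and compares the answer with the condition n_a(w) = n_b(w) for every string of length at most 8.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w):
    """Return True if S ⇒* w for the grammar S → aSb | bSa | SS | λ."""
    if w == "":
        return True                       # S → λ
    # S → aSb or S → bSa: the first and last symbols differ.
    if len(w) >= 2 and {w[0], w[-1]} == {"a", "b"} and derives(w[1:-1]):
        return True
    # S → SS: split w into two non-empty parts that are both derivable.
    return any(derives(w[:i]) and derives(w[i:]) for i in range(1, len(w)))


ok = all(derives("".join(w)) == (w.count("a") == w.count("b"))
         for n in range(9)
         for w in product("ab", repeat=n))
print(ok)  # True
```

Only non-empty splits are tried for S → SS, because a split with an empty part gives back the same string and adds no information.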

5 Leftmost and Rightmost Derivations


In a grammar that is not linear, a derivation may involve sentential forms with more than one
variable. In such cases, we have a choice in the order in which variables are replaced.

5.1 Definition

A derivation is said to be leftmost if in each step the leftmost variable in the sentential form is
replaced. If in each step the rightmost variable is replaced, we call the derivation rightmost.
Example - 1
Consider the grammar with productions
S → aAB
A → bBb
B → A|λ
Consider the string w = abbbb. The leftmost derivation for w is
S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
and the rightmost derivation for w is
S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb
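
The two derivations can be replayed step by step. The sketch below is an illustration with assumed names (`PRODUCTIONS`, `derive`): given the right-hand sides to apply in order, it always rewrites the leftmost or the rightmost variable and records each sentential form.

```python
# Productions of Example - 1; the empty string "" stands for λ.
PRODUCTIONS = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}


def derive(start, choices, leftmost=True):
    """Apply the right-hand sides in `choices`, one per step.

    Each step rewrites the leftmost (or rightmost) variable of the current
    sentential form and checks that the chosen production exists.
    """
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        positions = [i for i, ch in enumerate(form) if ch in PRODUCTIONS]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} → {rhs or 'λ'} is not a production")
        forms.append(form[:i] + rhs + form[i + 1:])
    return forms


left = derive("S", ["aAB", "bBb", "A", "bBb", "", ""], leftmost=True)
right = derive("S", ["aAB", "", "bBb", "A", "bBb", ""], leftmost=False)
print(" ⇒ ".join(left))
print(" ⇒ ".join(right))
```

Both calls end in abbbb and use the same productions; only the order in which the variables are replaced differs.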

6 Derivation Trees
A second way of showing derivations, independent of the order in which productions are used, is by
a derivation or parse tree. A derivation tree is an ordered tree in which nodes are labeled with the
left sides of productions and in which the children of a node represent its corresponding right sides.
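For a grammar G = (V, T, S, P), a derivation tree has the following properties

1. The root is labeled S.

2. Every leaf has a label from T ∪ {λ}.

3. Every interior node has a label from V.

4. If a node has label A ∈ V and its children are labeled (from left to right) a1, a2, ..., an, then
P must contain a production of the form A → a1 a2 ... an.

5. A leaf labeled λ has no siblings.
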
Example-1:
Consider the following grammar with productions
S → aAB
A → bBb
B → A|λ
The tree in Figure 1 is a derivation tree for w = abbbb

Figure 1: Derivation tree for w = abbbb

Example-2:
Consider the following grammar with productions
E → E + E|E ∗ E|(E)|a|b|c
Show leftmost and rightmost derivations for w = a + b ∗ c. Show the derivation tree.

leftmost derivation

E ⇒ E + E
  ⇒ a + E        (E → a)
  ⇒ a + E ∗ E    (E → E ∗ E)
  ⇒ a + b ∗ E    (E → b)
  ⇒ a + b ∗ c    (E → c)

rightmost derivation

E ⇒ E + E
  ⇒ E + E ∗ E    (E → E ∗ E)
  ⇒ E + E ∗ c    (E → c)
  ⇒ E + b ∗ c    (E → b)
  ⇒ a + b ∗ c    (E → a)

derivation tree

Figure 2: Derivation tree for w = a + b ∗ c

Problems

1. Consider the following grammar.


S → AB|λ
A → aB
B → Sb
Give leftmost and rightmost derivation for w = aabbbb. Show the derivation tree. (Exercise)

2. Consider the following grammar.


S → abB
A → aaBb|λ
B → bbAa
Show leftmost and rightmost derivation for w = abbbaabbaba. Show the derivation tree.
(Exercise)

7 Ambiguous Grammars
A context-free grammar G is said to be ambiguous if there exists some w ∈ L(G) that has at least
two distinct derivation trees. Alternatively, ambiguity implies the existence of two or more leftmost
or rightmost derivations.
Example - 1
The grammar with productions,
S → aSb|SS|λ
is ambiguous because the sentence w = aabb has two derivation trees, corresponding to two leftmost
derivations, as shown in the following figure.

Figure 3: Derivation trees for w = aabb

7.1 Removing Ambiguity

Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a
variety of ways. In programming languages, where there should be only one interpretation of each
statement, ambiguity must be removed when possible. Often we can achieve this by rewriting the
grammar in an equivalent, unambiguous form.
As an example, consider the grammar that generates arithmetic expressions,
E → E + E|E − E|E ∗ E|E/E|(E)|a|b|c
This grammar is ambiguous because the string w = a + b ∗ c has two leftmost (or rightmost)
derivations as shown below
First leftmost derivation

E ⇒ E + E
  ⇒ a + E        (E → a)
  ⇒ a + E ∗ E    (E → E ∗ E)
  ⇒ a + b ∗ E    (E → b)
  ⇒ a + b ∗ c    (E → c)

Second leftmost derivation

E ⇒ E ∗ E
  ⇒ E + E ∗ E    (E → E + E)
  ⇒ a + E ∗ E    (E → a)
  ⇒ a + b ∗ E    (E → b)
  ⇒ a + b ∗ c    (E → c)
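
The number of derivation trees of a string under this grammar can be counted directly, which gives a quick test for ambiguity on a particular string. The sketch below is an illustration, not part of the original notes. The function name `count_trees` is hypothetical, and the ASCII characters + - * / stand for the operators of the grammar.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(w):
    """Count derivation trees of w for E → E+E | E-E | E*E | E/E | (E) | a | b | c."""
    total = 1 if w in ("a", "b", "c") else 0          # E → a | b | c
    if len(w) >= 3 and w[0] == "(" and w[-1] == ")":
        total += count_trees(w[1:-1])                 # E → (E)
    for i, ch in enumerate(w):
        if ch in "+-*/":                              # E → E op E, split at op
            total += count_trees(w[:i]) * count_trees(w[i + 1:])
    return total


print(count_trees("a+b*c"))    # 2
print(count_trees("a-b+c"))    # 2
print(count_trees("(a+b)*c"))  # 1
```

A count greater than 1 for any single string is enough to show that the grammar is ambiguous.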

To remove ambiguity from the grammar we need to understand problems associated with the
grammar. The grammar is ambiguous because it has two problems associated with it.

1. It does not specify operator precedence

2. It does not specify operator associativity.

To understand the first problem let us consider the two derivation trees for w = a + b ∗ c

Figure 4: Derivation trees for w = a + b ∗ c

The first tree evaluates a + b first and then multiplies the result by c, whereas the second tree
evaluates b ∗ c first and adds the result to a. In other words, + has higher precedence than ∗ in the
first tree while ∗ has higher precedence than + in the second tree. Thus, there is an ambiguity in
assigning precedence to the operators.

Associativity matters when an expression involves a series of operators with the same precedence. In
such cases, operators can be left associative or right associative. When operators are left associative,
the evaluation order is from left to right, and when they are right associative the evaluation order
is from right to left. For example, 7 − 4 + 2 = 5 (when + and − are left associative) and 7 − 4 + 2 = 1
(when + and − are right associative). To understand this problem with our grammar, consider the
two derivation trees for the expression w = a − b + c.

Figure 5: Derivation trees for w = a − b + c

The first tree evaluates a − b first and then adds c, whereas the second tree evaluates b + c first and
then subtracts the result from a. In other words, − and + are left associative in the first tree while
they are right associative in the second tree. Thus, there is an ambiguity in operator associativity.

One way to resolve the ambiguity in operator precedence is to associate precedence rules with
the operators + and ∗. Since ∗ normally has higher precedence than +, we want b ∗ c to be a
subexpression that is evaluated before the addition is performed. We achieve this by introducing
new variables and rewriting the grammar so that only one leftmost derivation is possible. To resolve
the ambiguity in operator associativity, we make the productions for the operators left recursive so
that the operators become left associative.
(A production of the form A → Aα is known as a left recursive production and a production of the
form A → αA is known as a right recursive production.)
The rewritten grammar is shown below
E → E + T |E − T |T
T → T ∗ F |T /F |F
F → (E)|a|b|c
This grammar is equivalent to the above grammar. Note that T and F are new variables and the
productions are left recursive. Also note that the + and − operators are at the first level and the ∗ and
/ operators are at the second level, which makes the precedence of ∗ and / higher than that of + and −.
Since the productions are left recursive, all operators (with the same precedence) are left associative.
Thus, exactly one leftmost derivation (and exactly one rightmost derivation) is possible for
w = a + b ∗ c, as we can see below


E ⇒ E + T
  ⇒ T + T        (E → T)
  ⇒ F + T        (T → F)
  ⇒ a + T        (F → a)
  ⇒ a + T ∗ F    (T → T ∗ F)
  ⇒ a + F ∗ F    (T → F)
  ⇒ a + b ∗ F    (F → b)
  ⇒ a + b ∗ c    (F → c)

The above derivation is shown in the following derivation tree. As we can see from the tree, ∗ has
higher precedence than +.

Figure 6: Derivation tree for w = a + b ∗ c

Using the unambiguous grammar, the derivation tree for the expression w = a − b + c is shown
below. The tree specifies that + and − are left associative.

Figure 7: Derivation tree for w = a − b + c
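
The tree shapes imposed by the unambiguous grammar can be reproduced with a small parser. The sketch below is an illustration under stated assumptions, not the method of the notes. A direct recursive translation of a left recursive production such as E → E + T would never terminate, so the usual loop-based equivalent is swapped in. The names `parse`, `expr`, `term` and `factor` are hypothetical, and ASCII + - * / stand for the operators.

```python
def parse(text):
    """Parse with E → E+T | E-T | T,  T → T*F | T/F | F,  F → (E) | a | b | c.

    Returns a nested tuple (operator, left, right); leaves are 'a', 'b', 'c'.
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = term()
        while peek() in ("+", "-"):      # loop replaces the left recursion
            op = take()
            node = (op, node, term())    # earlier operators end up deeper
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("a", "b", "c"):
            return tok
        raise SyntaxError(f"unexpected symbol {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
print(parse("a - b + c"))  # ('+', ('-', 'a', 'b'), 'c')
```

In the first result b ∗ c is a subtree of the addition, which reflects the higher precedence of ∗. In the second result a − b is grouped first, which reflects left associativity.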

The following grammar specifies that + and − have higher precedence than ∗ and /, and that all
four operators are right associative.
E → T ∗ E|T /E|T
T → F + T |F − T |F
F → (E)|a|b|c
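
Because these productions are right recursive, each one maps directly onto a recursive function. The sketch below is an illustration with hypothetical names (`parse_right` and its helpers), and ASCII + - * / stand for the operators. It shows the grouping that this grammar imposes.

```python
def parse_right(text):
    """Parse with E → T*E | T/E | T,  T → F+T | F-T | F,  F → (E) | a | b | c."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():                          # E → T*E | T/E | T
        left = term()
        if peek() in ("*", "/"):
            op = take()
            return (op, left, expr())    # right recursion: groups to the right
        return left

    def term():                          # T → F+T | F-T | F
        left = factor()
        if peek() in ("+", "-"):
            op = take()
            return (op, left, term())
        return left

    def factor():                        # F → (E) | a | b | c
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("a", "b", "c"):
            return tok
        raise SyntaxError(f"unexpected symbol {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse_right("a + b * c"))  # ('*', ('+', 'a', 'b'), 'c')
print(parse_right("a - b + c"))  # ('-', 'a', ('+', 'b', 'c'))
```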

Problems

1. Show that the following grammar is ambiguous. Remove the ambiguity (Exercise)
S → AB|aaB
A → a|Aa
B→b

Let w = aab. Clearly there are two leftmost derivations for w as shown below
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
S ⇒ aaB ⇒ aab
In the given grammar, the production S → aaB is not necessary. Removing the production
makes the grammar unambiguous as shown below
S → AB
A → a|Aa
B→b
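
The effect of dropping S → aaB can be confirmed by counting derivation trees for w = aab before and after the change. The sketch below is an illustration, not part of the original notes. The name `count_parse_trees` and the dictionary encoding are assumptions, and the counter only works for grammars without λ-productions and without cycles of unit productions, which holds for both grammars here.

```python
from functools import lru_cache


def count_parse_trees(productions, start, word):
    """Count derivation trees of `word` from `start`.

    Assumes no λ-productions and no cycles of unit productions, so every
    symbol derives a non-empty string and the recursion terminates.
    Symbols that are not keys of `productions` are treated as terminals.
    """
    @lru_cache(maxsize=None)
    def trees(symbol, w):
        if symbol not in productions:            # terminal symbol
            return 1 if w == symbol else 0
        return sum(seq(rhs, w) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, w):
        """Ways in which the symbols of rhs, in order, derive w."""
        if len(rhs) == 1:
            return trees(rhs, w)
        total = 0
        # The first symbol takes w[:i]; the rest needs >= len(rhs) - 1 symbols.
        for i in range(1, len(w) - len(rhs) + 2):
            first = trees(rhs[0], w[:i])
            if first:
                total += first * seq(rhs[1:], w[i:])
        return total

    return trees(start, word)


ambiguous = {"S": ("AB", "aaB"), "A": ("a", "Aa"), "B": ("b",)}
fixed = {"S": ("AB",), "A": ("a", "Aa"), "B": ("b",)}

print(count_parse_trees(ambiguous, "S", "aab"))  # 2
print(count_parse_trees(fixed, "S", "aab"))      # 1
```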

2. Show that the following grammar is ambiguous (Exercise)


S → aSbS|bSaS|λ
