Chapter Three


5. Normal Forms
a) Chomsky Normal Form (CNF)

A CFG is in Chomsky Normal Form if every production has one of the following forms: A → BC | a, where A, B, and C are non-terminals and a is a terminal.

Example: The grammar S → AS | a, A → AS | b is in CNF. The grammar S → AS | AAS, A → SA | aa is not in CNF.

Which production violates the CNF rules?

A context-free grammar (CFG) is in Chomsky Normal Form (CNF) if all production rules satisfy one of the following conditions:

I. A non-terminal generating a terminal (e.g. X → x)

II. A non-terminal generating two non-terminals (e.g. X → YZ)

III. Start symbol generating ε (e.g. S → ε)

Example: Consider the following grammars:

G1 = (V, T, S, P), with P: S → a, S → AZ, A → a, Z → z
G2 = (V, T, S, P), with P: S → a, S → aZ, Z → a

The grammar G1 is in CNF, as its production rules satisfy the rules specified for CNF. However, the grammar G2 is not in CNF, as the production rule S → aZ contains a terminal followed by a non-terminal, which does not satisfy the rules specified for CNF.
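The three conditions can be checked mechanically. The following is a minimal sketch, assuming a grammar is stored as a Python dict from each non-terminal to a list of right-hand sides (each a list of symbols, with [] standing for ε); the function name is_cnf and this representation are illustrative choices, not part of the course material.

```python
def is_cnf(productions, start):
    """Return True if every production fits one of the three CNF shapes.

    Assumption: the keys of `productions` are exactly the non-terminals,
    and every other symbol is a terminal.
    """
    nonterminals = set(productions)
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1 and body[0] not in nonterminals:
                continue  # rule I: X → x
            if len(body) == 2 and all(s in nonterminals for s in body):
                continue  # rule II: X → YZ
            if len(body) == 0 and head == start:
                continue  # rule III: S → ε
            return False
    return True


# G1 and G2 from the example above.
g1 = {"S": [["a"], ["A", "Z"]], "A": [["a"]], "Z": [["z"]]}
g2 = {"S": [["a"], ["a", "Z"]], "Z": [["a"]]}
```

With these definitions, is_cnf(g1, "S") holds, while is_cnf(g2, "S") fails because of S → aZ.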
Example: Convert the grammar with productions
– S → ABa,
– A → aab,
– B → Ac
to Chomsky normal form.
Step 1: introduce a new variable for each terminal (B_a, B_b, B_c) and use it in place of that terminal.
– S → A B B_a,
– A → B_a B_a B_b,
– B → A B_c,
– B_a → a,
– B_b → b,
– B_c → c.
Step 2: introduce additional variables D1 and D2 to get the first two productions into normal form.
– S → A D1,
– D1 → B B_a,
– A → B_a D2,
– D2 → B_a B_b,
– B → A B_c,
– B_a → a,
– B_b → b,
– B_c → c.
Algorithm to Convert into Chomsky Normal Form:
Step 1: If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’ → S.
Step 2: Remove null productions (using the null production removal algorithm discussed earlier).
Step 3: Remove unit productions (using the unit production removal algorithm discussed earlier).
Step 4: Replace each production A → B1…Bn where n > 2 with A → B1C, where C → B2…Bn. Repeat this step for all productions having more than two symbols on the right side.
Step 5: If the right side of any production is of the form A → aB, where a is a terminal and A, B are non-terminals, then the production is replaced by A → XB and X → a. Repeat this step for every production of the form A → aB.
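Steps 4 and 5 can be sketched in code. The version below is a hedged illustration, not the course's reference implementation: it assumes null and unit productions have already been removed (steps 1 to 3), applies the terminal replacement before the splitting, as the worked example S → ABa does, and invents the helper names T_a and D1, D2, … (assumed not to clash with existing symbols).

```python
def replace_terminals(productions):
    """Step 5 (generalised): in bodies of length >= 2, replace each
    terminal a by a helper variable T_a with the production T_a → a."""
    nonterminals = set(productions)
    result = {head: [] for head in productions}
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) >= 2:
                new_body = []
                for symbol in body:
                    if symbol in nonterminals:
                        new_body.append(symbol)
                    else:
                        helper = "T_" + symbol  # hypothetical naming scheme
                        result.setdefault(helper, [[symbol]])
                        new_body.append(helper)
                body = new_body
            result[head].append(list(body))
    return result


def binarize(productions):
    """Step 4: A → B1 B2 ... Bn (n > 2) becomes A → B1 C, C → B2 ... Bn."""
    result = {}
    counter = 0
    for head, bodies in productions.items():
        result.setdefault(head, [])
        for body in bodies:
            current, body = head, list(body)
            while len(body) > 2:
                counter += 1
                fresh = "D" + str(counter)  # hypothetical fresh variable
                result.setdefault(current, []).append([body[0], fresh])
                current, body = fresh, body[1:]
            result.setdefault(current, []).append(body)
    return result


# The grammar S → ABa, A → aab, B → Ac from the example above.
grammar = {"S": [["A", "B", "a"]], "A": [["a", "a", "b"]], "B": [["A", "c"]]}
cnf = binarize(replace_terminals(grammar))
```

On this grammar the result has the same shape as the hand conversion: S → A D1, D1 → B T_a, A → T_a D2, D2 → T_a T_b, B → A T_c, with T_a, T_b, T_c playing the role of B_a, B_b, B_c.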
Note:

For a given grammar, there can be more than one CNF.

The CNF grammar generates the same language as the original CFG.

CNF is used as a preprocessing step for many algorithms on CFGs, such as CYK and bottom-up parsers.
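To illustrate why CNF is convenient as a preprocessing step, here is a minimal sketch of the CYK membership test. It relies on every right-hand side being either a single terminal or exactly two non-terminals. The function name cyk and the dict-of-lists grammar representation are assumptions made for this illustration.

```python
def cyk(productions, start, word):
    """Return True if `word` (a list of terminals) is derivable from `start`.

    Assumes `productions` is in CNF: every body is [terminal] or [X, Y],
    apart from an optional start → ε written as [].
    """
    n = len(word)
    if n == 0:
        return [] in productions.get(start, [])
    # table[i][l] holds the non-terminals that derive word[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        for head, bodies in productions.items():
            if [symbol] in bodies:
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in productions.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length].add(head)
    return start in table[0][n]


# CNF grammar G1: S → a | AZ, A → a, Z → z (it generates exactly "a" and "az").
g1 = {"S": [["a"], ["A", "Z"]], "A": [["a"]], "Z": [["z"]]}
```

For example, cyk(g1, "S", ["a", "z"]) succeeds, while cyk(g1, "S", ["z"]) does not.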
Example – Let us take an example to convert a CFG to CNF.

Consider the given grammar G1: S → ASB, A → aAS | a | ε, B → SbS | A | bb

Step 1. As the start symbol S appears on the RHS, we create a new production rule S0 → S. The grammar becomes:

S0 → S, S → ASB, A → aAS | a | ε, B → SbS | A | bb

Step 2. As the grammar contains the null production A → ε, its removal from the grammar yields:

S0 → S, S → ASB | SB, A → aAS | aS | a, B → SbS | A | ε | bb

This creates the null production B → ε; its removal from the grammar yields:

S0 → S, S → AS | ASB | SB | S, A → aAS | aS | a, B → SbS | A | bb

This leaves the unit production B → A; its removal from the grammar yields:

S0 → S, S → AS | ASB | SB | S, A → aAS | aS | a, B → SbS | bb | aAS | aS | a

Removal of the unit production S0 → S yields:

S0 → AS | ASB | SB | S, S → AS | ASB | SB | S, A → aAS | aS | a, B → SbS | bb | aAS | aS | a

Finally, removal of the unit productions S → S and S0 → S yields:

S0 → AS | ASB | SB, S → AS | ASB | SB, A → aAS | aS | a, B → SbS | bb | aAS | aS | a

Step 3. In the production rules A → aAS | aS and B → SbS | aAS | aS, the terminals a and b appear on the RHS together with non-terminals. Replacing them with new non-terminals X and Y yields:

S0 → AS | ASB | SB, S → AS | ASB | SB, A → XAS | XS | a, B → SYS | bb | XAS | XS | a, X → a, Y → b

Also, B → bb is not a valid CNF production; replacing its terminals with a new non-terminal V yields:

S0 → AS | ASB | SB, S → AS | ASB | SB, A → XAS | XS | a, B → SYS | VV | XAS | XS | a, X → a, Y → b, V → b
Step 4: In the production rule S0 → ASB, the RHS has more than two symbols. Replacing AS with a new non-terminal P yields:

S0 → AS | PB | SB, S → AS | ASB | SB, A → XAS | XS | a, B → SYS | VV | XAS | XS | a, X → a, Y → b, V → b, P → AS

Similarly, S → ASB has more than two symbols. Replacing AS with a new non-terminal Q yields:

S0 → AS | PB | SB, S → AS | QB | SB, A → XAS | XS | a, B → SYS | VV | XAS | XS | a, X → a, Y → b, V → b, P → AS, Q → AS

Similarly, A → XAS has more than two symbols. Replacing XA with a new non-terminal R yields:

S0 → AS | PB | SB, S → AS | QB | SB, A → RS | XS | a, B → SYS | VV | XAS | XS | a, X → a, Y → b, V → b, P → AS, Q → AS, R → XA

Similarly, B → SYS has more than two symbols. Replacing SY with a new non-terminal T yields:

S0 → AS | PB | SB, S → AS | QB | SB, A → RS | XS | a, B → TS | VV | XAS | XS | a, X → a, Y → b, V → b, P → AS, Q → AS, R → XA, T → SY

Similarly, B → XAS has more than two symbols. Replacing XA with a new non-terminal U yields:

S0 → AS | PB | SB, S → AS | QB | SB, A → RS | XS | a, B → TS | VV | US | XS | a, X → a, Y → b, V → b, P → AS, Q → AS, R → XA, T → SY, U → XA

So this is the required CNF for the given grammar.
Problem: Convert the following CFG into CNF:

S → ASA | aB, A → B | S | ε, B → b | ε
Solution:
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the production S0 → S, and the production set becomes:
S0 → S, S → ASA | aB, A → B | S | ε, B → b | ε
(2) Now we remove the null productions B → ε and A → ε.

After removing B → ε, the production set becomes:
S0 → S, S → ASA | aB | a, A → B | S | ε, B → b

After removing A → ε, the production set becomes:
S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b
(3) Now we remove the unit productions. After removing S → S, the production set becomes:
S0 → S, S → ASA | aB | a | AS | SA, A → B | S, B → b

After removing S0 → S, the production set becomes:
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → B | S, B → b

After removing A → B, the production set becomes:
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → S | b, B → b

After removing A → S, the production set becomes:
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA, A → b | ASA | aB | a | AS | SA, B → b

(4) Now we look for right-hand sides with more than two symbols. Here, S0 → ASA, S → ASA, and A → ASA have three non-terminals on the R.H.S. Applying step 4 with a new non-terminal X → SA gives the following production set:
S0 → AX | aB | a | AS | SA
S → AX | aB | a | AS | SA
A → b | AX | aB | a | AS | SA
B → b
X → SA
(5) Finally, we have to change the productions S0 → aB, S → aB, and A → aB, which we do by introducing Y → a.

The final production set, which is in CNF, becomes:

S0 → AX | YB | a | AS | SA

S → AX | YB | a | AS | SA

A → b | AX | YB | a | AS | SA

B → b

X → SA

Y → a

b) Greibach Normal Form

A context-free grammar (CFG) is in Greibach Normal Form (GNF) if all production rules satisfy one of the following conditions:

I. A non-terminal generating a terminal (e.g. X → x)

II. A non-terminal generating a terminal followed by any number of non-terminals (e.g. X → aX1X2…Xn)

III. Start symbol generating ε (e.g. S → ε)

In general, every production has the form A → aX with a ∈ T and X ∈ V*, which is the same production shape as in an s-grammar.

Example: Consider the following grammars:

G1 = {S → aA | bB, B → bB | b, A → aA | a}

G2 = {S → aA | bB, B → bB | ε, A → aA | ε}

The grammar G1 is in GNF, as its production rules satisfy the rules specified for GNF. However, the grammar G2 is not in GNF, as the production rules B → ε and A → ε do not satisfy the rules specified for GNF (only the start symbol can generate ε).

Note: For a given grammar, there can be more than one GNF. The GNF grammar generates the same language as the original CFG.
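The GNF conditions can be checked in the same mechanical way as the CNF ones. The sketch below assumes the grammar is a dict from non-terminals to lists of right-hand sides ([] stands for ε); the name is_gnf and the representation are illustrative assumptions.

```python
def is_gnf(productions, start):
    """Return True if every body is a terminal followed by zero or more
    non-terminals, with ε allowed only for the start symbol.

    Assumption: the keys of `productions` are exactly the non-terminals.
    """
    nonterminals = set(productions)
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 0:
                if head == start:
                    continue  # rule III: S → ε
                return False
            first, rest = body[0], body[1:]
            if first in nonterminals:
                return False  # a body must start with a terminal
            if any(symbol not in nonterminals for symbol in rest):
                return False  # only non-terminals may follow
    return True


# G1 and G2 from the example above.
g1 = {"S": [["a", "A"], ["b", "B"]], "B": [["b", "B"], ["b"]], "A": [["a", "A"], ["a"]]}
g2 = {"S": [["a", "A"], ["b", "B"]], "B": [["b", "B"], []], "A": [["a", "A"], []]}
```

Here is_gnf(g1, "S") holds and is_gnf(g2, "S") fails because of B → ε and A → ε.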
Algorithm to Convert a CFG into Greibach Normal Form

Step 1: If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’ → S.
Step 2: Remove null productions (using the null production removal algorithm discussed earlier).
Step 3: Remove unit productions (using the unit production removal algorithm discussed earlier).
Step 4: Remove all direct and indirect left recursion.
Step 5: Do proper substitutions of productions to convert the grammar into the proper form of GNF.

When the grammar is first brought into CNF, the same procedure can be summarised in three steps:

Step 1. Convert the grammar into CNF. If the given grammar is not in CNF, convert it to CNF.

Step 2. Eliminate left recursion from the grammar if it exists.

Step 3. Convert the production rules into GNF form.


Example:
S → XB | AA

A → a | SA

B→b

X→a

Solution: As the given grammar G is already in CNF and there is no direct left recursion, we can skip step 1 and step 2 and go directly to step 3.
• The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A → SA:
1. S → XB | AA
2. A → a | XBA | AAA
3. B → b
4. X → a
• The production rules S → XB and A → XBA are not in GNF, so we substitute X → a in both:
1. S → aB | AA
2. A → a | aBA | AAA
3. B → b
4. X → a
• Now we remove the left recursion (A → AAA) by introducing a new non-terminal C, and we get:
1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B → b
5. X → a
Now we remove the null production C → ε, and we get:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B → b
X → a
The production rules S → AA and C → AA are not in GNF, so we substitute A → aC | aBAC | a | aBA for the leading A in both:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA for the leading A in C → AAC:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
Hence, this is the GNF form for the grammar G.
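The substitution used repeatedly in this example (replace a leading non-terminal by each of its right-hand sides) can be sketched as a small helper. The name substitute_leading and the dict-of-lists grammar representation are assumptions made for this illustration.

```python
def substitute_leading(productions, head, target):
    """In every body of `head` that starts with `target`, replace that
    leading occurrence by each body of `target`. Other rules are unchanged."""
    new_bodies = []
    for body in productions[head]:
        if body and body[0] == target:
            for replacement in productions[target]:
                new_bodies.append(list(replacement) + list(body[1:]))
        else:
            new_bodies.append(list(body))
    result = dict(productions)
    result[head] = new_bodies
    return result


# S → XB | AA, A → a | SA, B → b, X → a
g = {"S": [["X", "B"], ["A", "A"]], "A": [["a"], ["S", "A"]], "B": [["b"]], "X": [["a"]]}
step1 = substitute_leading(g, "A", "S")      # A → a | XBA | AAA
step2 = substitute_leading(step1, "A", "X")  # A → a | aBA | AAA
step3 = substitute_leading(step2, "S", "X")  # S → aB | AA
```

The three calls reproduce the first two substitution rounds of the worked example; the left-recursive rule A → AAA still has to be handled separately.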
Removing Left Recursion
A grammar is left recursive if it has a non-terminal (variable) S with productions of the form S → Sα | β, where α ∈ (V+T)* and β ∈ (V+T)* is a sequence of terminals and non-terminals that does not start with S. In the presence of left recursion some top-down parsers enter an infinite loop, so we have to eliminate left recursion.
Let the productions be of the form A → Aα1 | Aα2 | Aα3 | … | Aαm | β1 | β2 | β3 | … | βn, where no βi begins with A. Then we replace the A-productions by:
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | α3A’ | … | αmA’ | ε
The non-terminal A generates the same strings as before but is no longer left recursive.
Let us look at some examples to understand this better.
[We will see more details in the Compiler Design course.]
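The replacement rule for immediate left recursion translates directly into code. This is a minimal sketch, assuming the productions of a single non-terminal are given as a list of symbol lists ([] stands for ε) and that there is no production A → A; the function name and the choice of A' as the new symbol are illustrative.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A → Aα1 | ... | Aαm | β1 | ... | βn as
    A → β1A' | ... | βnA' and A' → α1A' | ... | αmA' | ε."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: [list(body) for body in bodies]}  # nothing to do
    new_head = head + "'"  # hypothetical name for the new non-terminal
    return {
        head: [list(beta) + [new_head] for beta in betas],
        new_head: [list(alpha) + [new_head] for alpha in alphas] + [[]],
    }


# A → a | aBA | AAA, the left-recursive rule from the GNF example.
result = remove_immediate_left_recursion(
    "A", [["a"], ["a", "B", "A"], ["A", "A", "A"]]
)
```

For this input the result is A → aA' | aBAA' and A' → AAA' | ε, which matches the hand derivation A → aC | aBAC, C → AAC | ε with C written as A'.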
Example:

The grammar

S → AB,

A → aA | bB | b,

B → b

is not in GNF. However, using the proper substitution, we obtain an equivalent grammar in GNF:

S → aAB | bBB | bB,

A → aA | bB | b,

B → b
Example 2: S → XA | BB, B → b | SB, X → b, A → a

As G1 is already in CNF and there is no direct left recursion, we can skip steps 1 and 2 and move directly to step 3.

The production rule B → SB is not in GNF; therefore, we substitute S → XA | BB in B → SB:

S → XA | BB, B → b | XAB | BBB, X → b, A → a

The production rules S → XA and B → XAB are not in GNF; therefore, we substitute X → b in both:

S → bA | BB, B → b | bAB | BBB, X → b, A → a

Removing the left recursion (B → BBB), we get:

S → bA | BB, B → bC | bABC, C → BBC | ε, X → b, A → a

Removing the null production (C → ε), we get:

S → bA | BB, B → bC | bABC | b | bAB, C → BBC | BB, X → b, A → a

The production rule S → BB is not in GNF; therefore, we substitute B → bC | bABC | b | bAB for the leading B in S → BB:

S → bA | bCB | bABCB | bB | bABB, B → bC | bABC | b | bAB, C → BBC | BB, X → b, A → a
The production rule C → BB is not in GNF; therefore, we substitute B → bC | bABC | b | bAB for the leading B in C → BB:

S → bA | bCB | bABCB | bB | bABB, B → bC | bABC | b | bAB, C → BBC, C → bCB | bABCB | bB | bABB, X → b, A → a

The production rule C → BBC is not in GNF; therefore, we substitute B → bC | bABC | b | bAB for the leading B in C → BBC:

S → bA | bCB | bABCB | bB | bABB, B → bC | bABC | b | bAB, C → bCBC | bABCBC | bBC | bABBC, C → bCB | bABCB | bB | bABB, X → b, A → a

This is the GNF form for the grammar G1.
Problem 2: Convert the following CFG into GNF:

S → XY | Xo | p

X → mX | m

Y → Xn | o

Solution: Here, S does not appear on the right side of any production and there are no unit or null productions in the production rule set. So, we can skip Step 1 to Step 3.

Step 4: After replacing the leading X in S → XY | Xo | p with mX | m, we obtain S → mXY | mY | mXo | mo | p.

After replacing the leading X in Y → Xn | o with the right side of X → mX | m, we obtain Y → mXn | mn | o.

The terminals o and n still appear after the first symbol of some right sides, so we introduce C → o and D → n and arrive at the final GNF:

S → mXY | mY | mXC | mC | p

X → mX | m

Y → mXD | mD | o

C → o

D → n
Pumping Lemma
The pumping lemma for CFLs is used to prove that a language is NOT context free.

If A is a context-free language, then A has a pumping length P such that any string S in A with |S| ≥ P can be divided into five parts, S = uvxyz, such that the following conditions hold:

i. uvⁱxyⁱz is in A for every i ≥ 0.

ii. |vy| > 0.

iii. |vxy| ≤ P
There are two pumping lemmas, which are defined for:

I. Regular languages, and

II. Context-free languages

Pumping Lemma for Regular Languages: For any regular language L, there exists an integer n such that for every S ∈ L with |S| ≥ n there exist x, y, z ∈ ∑* such that S = xyz and

i. |xy| ≤ n

ii. |y| ≥ 1

iii. For all i ≥ 0, xyⁱz ∈ L.


Pumping Lemma for Context-Free Languages: If L is a context-free language, there is a pumping length p such that any string w ∈ L of length ≥ p can be written as w = uvxyz, where vy ≠ ε, |vxy| ≤ p, and for all i ≥ 0, uvⁱxyⁱz ∈ L.

In other words, every sufficiently long string of a context-free language L contains two substrings that can be pumped any number of times while the string stays in L: we break the string into five parts and pump the second and fourth. As in the regular case, the lemma is used as a tool to prove that a language is not a CFL.
Problem: Find out whether the language L = {x^n y^n z^n | n ≥ 1} is context free or not.

Solution: Assume L is context free. Then L must satisfy the pumping lemma.

First, let n be the pumping length given by the lemma. Then take S = 0^n 1^n 2^n (writing 0, 1, 2 for the symbols x, y, z so that they do not clash with the names of the substrings). Break S into uvwxy, where |vwx| ≤ n and vx ≠ ε; here v and x are the pumped parts. Then vwx cannot involve both 0’s and 2’s, since the last 0 and the first 2 are at least (n+1) positions apart.
There are two cases now:

Case 1 − vwx has no 2’s. Then vx has only 0’s and 1’s. Then uwy, which would have to be in L, has n 2’s but fewer than n 0’s or fewer than n 1’s.

Case 2 − vwx has no 0’s. Then vx has only 1’s and 2’s. Then uwy, which would have to be in L, has n 0’s but fewer than n 1’s or fewer than n 2’s.

In both cases a contradiction occurs. Hence, L is not a context-free language.
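The case analysis can be sanity-checked by brute force for one fixed value of the pumping length. The sketch below is only an illustration under stated assumptions: it fixes p = 4, tries every split of 0^4 1^4 2^4 into u v w x y with |vwx| ≤ p and vx ≠ ε, and tests only i = 0 and i = 2. The function names are hypothetical, and a finite check of this kind supports the proof but does not replace it, since the lemma quantifies over every possible pumping length.

```python
def in_012(s):
    """Membership test for {0^n 1^n 2^n | n >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n


def in_01(s):
    """Membership test for {0^n 1^n | n >= 1}, used as a contrast."""
    n = len(s) // 2
    return n >= 1 and s == "0" * n + "1" * n


def survives_pumping(s, p, member):
    """Return True if some split s = u v w x y with |vwx| <= p and vx != ε
    keeps u v^i w x^i y in the language for i = 0 and i = 2."""
    n = len(s)
    for a in range(n + 1):                             # u = s[:a]
        for b in range(a, n + 1):                      # v = s[a:b]
            for c in range(b, n + 1):                  # w = s[b:c]
                for d in range(c, min(a + p, n) + 1):  # x = s[c:d]
                    if (b - a) + (d - c) == 0:
                        continue                       # vx must not be empty
                    u, v, w, x, y = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                    if all(member(u + v * i + w + x * i + y) for i in (0, 2)):
                        return True
    return False


no_split_works = not survives_pumping("0" * 4 + "1" * 4 + "2" * 4, 4, in_012)
```

For the string 000011112222 no split survives, in line with the two cases above. By contrast, the context-free language {0^n 1^n} does admit a surviving split of 000111 (pump the last 0 together with the first 1).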
Check if the language is Context Free or Not
We are often asked to identify which of several given languages are context free.

For regular languages this is comparatively easy to answer, but for context-free languages it is sometimes tricky.

The pumping lemma only provides a negative test: if a language does not satisfy the pumping lemma, then we can definitely say that it is not context free, but if it does satisfy the lemma, the language may or may not be context free.

A pumping-lemma argument is a mathematical proof, so applying it to context-free languages takes time, and finding a counterexample for a complex language expression is tedious.

In many cases the question can be answered quickly from the following common observations:

1. Every regular language is context free.

Example – {a^m b^l c^k d^n | m, l, k, n ≥ 1} is context free, as it is regular too.
2. If it is possible to find a center or mid-point in the strings of the expression, we can compare the left and right sub-parts using a stack, and the language is context free.

Example 1 – L = {a^n b^n | n ≥ 1} is context free, as we can push the a’s and then pop one a for each occurrence of b.

Example 2 – L = {a^m b^n c^(m+n)} is context free. We can rewrite it as {a^m b^n c^n c^m}.

Example 3 – L = {a^n b^(2n)} is context free, as we can push two symbols for each a and pop one symbol for each occurrence of b. Hence, we get a mid-point here as well.
Example 4 – L = {a^n b^n c^n} is not context free.

3. If the given expression is a combination of multiple expressions with mid-points in them, such that each sub-expression is independent of the other sub-expressions, then it is context free.

Example 1 – L = {a^m b^m c^n d^n} is context free. It contains multiple expressions with a mid-point in each of them.

Example 2 – L = {a^m b^n c^m d^n} is not context free.

4. If the given expression consists of an operation where a mid-point can be found, with some independent regular expressions in between, the result is a context-free language.

Example – L = {a^m b^i c^m d^k} is a context-free language. Here, we have b^i and d^k as independent regular expressions in between, which do not affect the stack.

5. An expression that does not form a pattern on which a linear comparison can be carried out using a stack is not a context-free language.

Example 1 – L = {a^m b^(n^2)} is not context free.

Example 2 – L = {a^n b^(2^n)} is not context free.

Example 3 – L = {a^(n^2)} is not context free.

Example 4 – L = {a^m | m is prime} is not context free.
6. An expression that involves counting and comparing three or more variables independently is not a context-free language, as a stack allows comparison of only two variables at a time.
Example 1 – L = {a^n b^n c^n} is not context free.

Example 2 – L = {w | n_a(w) = n_b(w) = n_c(w)} is not context free.

Example 3 – L = {a^i b^j c^k | i > j > k} is not context free.

7. A point to remember is that in a pushdown automaton, counting and comparison can only be done with the top of the stack and not with the bottom of the stack. Hence a language whose characteristic involves comparison with the bottom of the stack is not a context-free language.
Example 1 – L = {a^m b^n c^m d^n} is not context free.
We push the a’s first and then the b’s. Now we are not able to compare the c’s with the a’s, as the top of the stack holds b’s.
Example 2 – L = {WW | W belongs to {a, b}*} is not context free.
One might think of drawing a non-deterministic pushdown automaton, but it will not help: the first symbol will be at the bottom of the stack, and when the second W starts, we will not be able to compare it with the bottom of the stack.
8. If we can find a mid-point in the expression, even in a non-deterministic way, then it is a context-free language.

Example 1 – L = {WW^R | W belongs to {a, b}*} is a context-free language.

Example 2 – L = {a^i b^j c^k | i = k or j = i} is a context-free language.


Worksheet
1. Convert the following grammars into CNF.
a) S → ABa, A → aab, B → Ac
b) S → aSb | ab
c) S → aSaA | A, A → abA | b
d) S → abAB, A → bAB | λ, B → Baa | A | λ
e) S → AB | aB, A → aab | λ, B → bbA
2. Convert the following grammars into GNF.
a) S → aSb | bSa | a | b
b) S → aSb | ab
c) S → ab | aS | aaS
d) S → aBb | a, A → aaA | B, B → bAb
Question 1d & 2d are Individual Assignment 5%.
