Professional Documents
Culture Documents
Unit-4 Context Free Grammar
Unit-4 Context Free Grammar
4.3. Parse tree and its construction, Ambiguous grammar, Use of parse tree
to show ambiguity in grammar, Inherent Ambiguity
Rules:
Sà Є
Sà 0S1
Here, the two rules define the production P,
Є, 0, 1 are the terminals defining T,
S is a variable symbol defining V
And S is start symbol from where production starts.
CFG Vs RE
• The CFG are more powerful than the regular expressions as
they have more expressive power than the regular
expression.
• Generally regular expressions are useful for describing the
structure of lexical constructs as identical keywords,
constants etc. But they do not have the capability to specify
the recursive structure of the programming constructs.
• However, the CFG are capable to define any of the
recursive structure also. Thus, CFG can define the
languages that are regular as well as those languages that
are not regular.
Compact Notation for the Productions
Sà Є | a | b
The detail of this production look like as
Sà Є
Sàa
Sàb
Meaning of context free:
Consider an example:
SàaMb
Mà A | B
Aà Є | aA
Bà Є | bB
How?
Consider a String aaAb, which is an intermediate stage in the
generating of aaab. It is natural to call the strings “aa” and “b”
that surround the symbol A, the “context” of A in this particular
string. Now, the rule Aà aA says that we can replace A by the
string aA no matter what the surrounding strings are; in other
words, independently of the context of A.
context-sensitive grammar (CSG)
A context-sensitive grammar (CSG) is a formal grammar type
that is more expressive than context-free grammars (CFGs).
In a context-sensitive grammar, the production rules are
applied based not only on the current non-terminal symbol
being replaced but also on the surrounding context or
neighboring symbols. This allows for a more sophisticated
and flexible representation of languages compared to
context-free grammars.
Example:
Consider a simple context-sensitive grammar that generates
strings of the form a^n b^n c^n, where the number of 'a's,
'b's, and 'c's are the same.
Example of context-sensitive grammar (CSG):
Consider a simple context-sensitive grammar that generates
strings of the form a^n b^n c^n, where the number of 'a's, 'b's,
and 'c's are the same.
Solution:
Production rules:
1.XaY -> aXY (replace 'a' in the middle with 'a' on the left)
2.YbZ -> YZb (replace 'b' in the middle with 'b' on the right)
3.ZcW -> ZWc (replace 'c' in the middle with 'c' on the right)
4.aX -> Xa (remove 'a' from the left end)
5.bY -> Yb (remove 'b' from the left end)
6.cZ -> Zc (remove 'c' from the left end)
7.X -> ε (remove 'X', where ε is the empty string)
8.Y -> ε (remove 'Y')
9.Z -> ε (remove 'Z')
10.W -> ε (remove 'W')
Starting symbol: XYZ
There are two possible approaches of derivation:
• Body to head(Bottom Up) approach.
• Head to body(Top Down) approach(Leftmost/ Rightmost derivation).
Body to head
Here, we take strings known to be in the language of each of
the variables of the body, concatenate them, in the proper
order, with any terminals appearing in the body, and infer that
the resulting string is the language of the variables in the head.
Head to Body
Here, we use production from head to body. We expand the
start symbol using a production, whose head is the start
symbol. Here we expand the resulting string until all strings of
terminal are obtained. Here we have two approaches:
Body to head
Consider grammar,
SàS + S
Sà S/S
Sà (S)
SàS-S
SàS*S
Sàa
Sà S + S rule à S + S
Sàa + S rule Sà a [Note here we have
replaced left symbol of the body]
Sàa + S – S rule SàS - S
Sàa + S / S – S rule Sà S / S
Sàa + (S) /S – S rule Sà(S)
Sà(S*S) / S – S; rule SàS*S
Given Grammer
Sà a + (a*S) / S – S rule Sàa SàS + S
Sà a + (a*a) / S – S “ “ Sà S/S
Sà a + (a*a) / a – S “ “ Sà (S)
Sàa + (a*a) / a – a “ “ SàS-S
SàS*S
Sàa
Rightmost derivation for the given string : a+ (a*a)/a-a
Sà S – S; rule SàS - S
Sà S – a; rule Sà a
SàS + S – a; rule Sà S + S
SàS + S / S – a; rule SàS / S
SàS + S / a –a; rule Sà a
SàS + (S) / a – a; rule Sà (S)
SàS + (S*S) / a – a; rule Sà S*S
SàS + (S*a) / a – a; rule Sà a Given Grammer
SàS + S
SàS + (a*a) / a – a;
Sà S/S
Sà a + (a*a) / a – a; Sà (S)
SàS-S
SàS*S
Sàa
Direct Derivation:
α1à α2 : If α2 can be derived directly from α1, then it is direct
derivation.
α1à* α2 : If α2 can be derived from α1 with zero or more
steps of the derivation, then it is just derivation.
Example
SàaSa | ab| a| b| Є
Direct derivation:
S à ab
Sà aSa
Sà aaSaa
Sàaaabaa
Example:
Sà AB |aaB
Aà a | Aa
Bàb
For any string aab;
We have two leftmost derivations as;
SàAB the parse tree for this derivation is:
àAaB
àaaB
à aab
Also,
Sà aaB the parse tree for this derivation is:
àaab
Example: Check whether the grammar G with production rules −
X → X+X | X*X |X| a is ambiguous or not.
Solution
Let’s find out the derivation tree for the string "a+a*a". It has two leftmost
derivations.
Derivation 1 − Derivation 2 −
X → X+X → a +X X → X*X → X+X*X → a+ X*X
→ a+ X*X → a+a*X → a+a*X → a+a*a
→ a+a*a Parse tree 2 −
Parse tree 1 −
Since there are two parse trees for a single string "a+a*a", the
grammar G is ambiguous.
CFG Simplification
In a CFG, it may happen that all the production rules
and symbols are not needed for the derivation of
strings. Besides, there may be some null productions
and unit productions. Elimination of these productions
and symbols is called simplification of CFGs.
Simplification essentially comprises of the following
steps −
• A. Reduction of CFG /(Eliminate useless symbol):
• B. Removal of Unit Productions
• C. Removal of Null Productions
Goal of CSG Simplification::
The goal of CFG Simplification is to show that every context free
language (without Є) is generated by a CFG in which all production are
of the form
Aàa
A à BC
S→ε
where A,B and C are variables(V), and a is terminal(T).
This form is called Chomsky Normal Form.
Phase 2 −
Y1 = { S }
Y2 = { S, A, C } from rule S → AC
Y3 = { S, A, C, a, c } from rules A → a and C → c
Y4 = { S, A, C, a, c }
Since Y3 = Y4, we can derive G” as −
G” = { { A, C, S }, { a, c }, P, {S}}
where P: S → AC, A → a, C → c
Simplification essentially comprises of the following steps ::
A.Reduction of CFG
(4) Now we will find out more than two variables in the R.H.S
(4) Now we will find out more than two variables in the R.H.S
Here, S0→ ASA, S → ASA, A→ ASA violates two Non-terminals in
R.H.S.
Hence we will apply step 4 and step 5 to get the following final
production set which is in CNF −
S0→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X → SA
2) Sà AACD
Aà aAb | Є
Cà aC | a
DàaDa | bDa | Є
3)
Sà aaaaS | aaaa
Bakus-Naur Form
This is another notation used to specify the CFG. It is named
so after John Bakus, who invented it, and Peter Naur, who
refined it. The Bakus-Naur form is used to specify the
syntactic rule of many computer languages, including Java.
Here, concept is similar to CFG, only the difference is instead
of using symbol “à’’ in production, we use symbol ::=. We
enclose all non-terminals in angle brackets, < >.
For example:
Then
we can remove left recursion as:
Aà β1A’ | β2A’| …. | βnA’
A’à α1A’ | α2A’ | ... | αmA’ | Є
Concatenation
If L1 and L2 are context free languages, then L1L2 is also context free.
Example
Union of the languages L1 and L2, L = L1L2 = { anbncmdm }
The corresponding grammar G will have the additional production S →
S1 S2
Kleene Star
If L is a context free language, then L* is also context free.
Example
Let L = { anbn , n ≥ 0}. Corresponding grammar G will have P: S → aAb|
ε
Kleene Star L1 = { anbn }*
The corresponding grammar G1 will have additional productions S1 →
SS1 | ε
Context-free languages are not closed under −
Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is
not necessarily context free.
Intersection with Regular Language − If L1 is a regular language and
L2 is a context free language, then L1 ∩ L2 is a context free language.
Complement − If L1 is a context free language, then L1’ may not be
context free.
Regular Grammar:
A regular grammar represents a language that is accepted by some
finite automata called regular language. A regular grammar is a CFG
which may be either left or right linear. A grammar in which all of the
productions are of the form Aà wB (wB is of the form α) or Aà w for
A, B ε V and w ε T* is called left linear.
Equivalence of Regular Grammar and Finite Automata
1. Let G = ( V, T, P and S) be a right linear grammar of a language
L(G). we can construct a finite automata M accepting L(G) as;
M = ( Q, T, δ, [S]. {[Є]})
Where,
Q – consists of symbols [α] such that α is either S or a suffix from right
hand side of a production in P.
T - is the set of input symbols which are terminal symbols of G.
[S] – is a start symbol of G which is start state of finite automata.
{[Є]} – is the set of final states in finite automata.
And δ is defined as:
If A is a variable then, δ([A], Є) = [α] such that Aà α is a production.
If ‘a’ is a terminal and α is in T* U T*V then δ([aα], a) = [α].
For example;
Sà 00B | 11C | Є
Bà 11C | 11
Cà00B | 00
Now the finite automata can be configured as;
Exercise
Write the equivalent finite automata
The “pumping lemma for context free languages” says that in any
sufficiently long string in a CFL, it is possible to find at most two short,
nearby substrings that we can “pump” in tandem (one behind another).
i.e. we may repeat both of the strings I times, for any integer I, and the
resulting string will still be in the language. We can use this lemma as a
tool for showing that certain languages are not context free.
Pumping lemma for context free Language:
The “pumping lemma for context free languages” says that
in any sufficiently long string in a CFL, it is possible to find
at most two short, nearby substrings that we can “pump” in
tandem (one behind another). i.e. we may repeat both of
the strings I times, for any integer I, and the resulting string
will still be in the language. We can use this lemma as a
tool for showing that certain languages are not context free.
Basis Step: Let n = 1, then the tree consists of only the root and a leaf labeled with a terminal.
So, string w is a simple terminal.
Thus |w| = 1 =21-1 = 20 which is true.
Inductive Step: Suppose n is the length of longest path and n>1. Then the root of the tree uses a production
of the form; AàBC, since n>1
No path is the sub-tree rooted at B and C can have length greater than n-1 since B&C are child of A. Thus,
by inductive hypothesis, the yield of these sub-trees are of length almost 2n-2.
The yield of the entire tree is the concatenation of these two yields and therefore has length at most,
2n-2 + 2n-2 = 2n-1
|w|<=2n-1. Hence proved.
Pumping Lemma
It states that every CFL has a special value called pumping length such that all
longer strings in the language can be pumped. This time the meaning of pumped
is a bit more complex. It means the string can be divided into five parts so that
the second and the fourth parts may be repeated together any number of times
and the resulting string still remains in the language.
Theorem: Let L be a CFL. Then there exists a constant n such that if z is any
string in L such that |z| is at least n then we can write z=uvwxy, satisfying
following conditions.
I) | vx| >0
II) |vwx| ≤ n
III) For all i ≥ 0, uviwxiy ε L. i.e. the two strings v & x can be “pumped” nay
number of times, including ‘0’ and the resulting string will be still a member of
L.
Example Show that L = {anbncn:n≥0} is not a CFL
To prove L is not CFL, we can make use of pumping lemma for CFL. Let L be
CFL. Let any string in the language is ambmcm
z = uvwxy, |vwx| ≤ m, |vx| > 0
Case I:
vwx is in am
q x, y → z q
1 2
L(G) = {0n 1n : n ≥ 0}
Example: Context-free grammar G, Σ = {0,1}
S →A
S→B
A → 0 A1
A→ε
B→1B0
B→ε
L(G) = L(A) U L(B)
= {0n 1n : n ≥ 0} U {1n 0n : n ≥ 0}
S → S+S | S x S | ( S ) | 0 | 1
Next: Ambiguity
● Ambiguity: Some string may have multiple
derivations in a CFG
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S→?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S → SxS → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S → SxS → Sx0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S → SxS → Sx0 → S+Sx0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S → SxS → Sx0 → S+Sx0 → S+0x0 → ?
Example: The string 1+0x0 has two derivations in
S → S+S | S x S | ( S ) | 0 | 1
One derivation:
S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0
Another derivation:
S → SxS → Sx0 → S+Sx0 → S+0x0 → 1+0x0
We now want to define CFG with no ambiguity