
Chapter 7:

Context-Free Languages
& Context-Free Grammars

Context-Free Languages (CFL)

• The language L = {a^n b^n : n ≥ 0} describes a simple kind of nested
structure found in programming languages, but it is not a regular language.
L is a context-free language.

• To cover this and other more complicated features, we must enlarge the
family of languages from the regular languages to the context-free languages.
A context-free language is one generated by a context-free grammar.

• Context-free grammars are more powerful than finite automata or regular
expressions, but they still cannot define all possible languages (the Turing
machine is a more powerful model).

• Context-free grammars are useful for describing nested structures, e.g.,
parentheses in programming languages.

• Context-free grammars are important in the design of efficient compilers
for programming languages (C, C++, Java, C#, …).

Context-Free Grammars (CFG)
The productions in a regular grammar are restricted in two ways: the
left side must be a single variable, while the right side has a special
form.
To create grammars that are more powerful, we must relax some of
these restrictions. By retaining the restriction on the left side but
permitting anything on the right side, we get context-free grammars.

Definition:
A grammar G = (V, T, S, P) is said to be context-free
if all productions in P have the form
A → x, where A ∈ V and x ∈ (V ∪ T)*.

Context-Free Grammars (CFG)
• A language L is said to be context-free if and only if there is
a context-free grammar G such that L = L(G).

• Every regular grammar is context-free, so every regular
language is also a context-free language.

• Context-free grammars derive their name from the fact
that the substitution of the variable on the left of a
production can be made any time such a variable
appears in a sentential form. It does not depend on the
symbols in the rest of the sentential form (the context).

Linear context-free Grammars
The grammar G=({S}, {a,b}, S, P) with productions:
S → aSa
S → bSb
S → ε

is context-free. A typical derivation in this grammar is:

S => aSa => aaSaa => aabSbaa => aabbaa
Then L(G) = {w w^R : w ∈ {a,b}*}.
The language is context-free, but it is not regular.

This grammar is linear (at most one variable appears on the right side of
each production) as well as context-free. A context-free grammar is
not necessarily linear.
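
As a rough illustration of the derivation above, the following Python sketch replays the productions S → aSa and S → bSb once per symbol of w and then applies S → ε. The function name derive_ww_r is an assumption made for this example and is not part of the slides.

```python
def derive_ww_r(w: str) -> list[str]:
    """Return the sentential forms of a derivation of w followed by its reverse.

    Uses S -> aSa or S -> bSb once per symbol of w, then S -> ε.
    """
    if set(w) - {"a", "b"}:
        raise ValueError("w must be a string over {a, b}")
    forms = ["S"]
    left = ""
    for symbol in w:
        left += symbol
        # The sentential form is always (prefix of w) S (reverse of that prefix).
        forms.append(left + "S" + left[::-1])
    forms.append(left + left[::-1])  # apply S -> ε
    return forms


print(" => ".join(derive_ww_r("aab")))
# S => aSa => aaSaa => aabSbaa => aabbaa
```

Each sentential form contains exactly one variable, which is what makes the grammar linear.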

Nonlinear context-free Grammars
The grammar G=({S}, {a,b}, S, P) with productions:
S → aSb | SS | ε

This is a nonlinear context-free grammar: the production S → SS has two
variables on its right side.

Some strings in L(G) are abaabb, aababb, and ababab.

L(G) = {w ∈ {a,b}* : n_a(w) = n_b(w) and n_a(v) ≥ n_b(v), where v is any prefix of w}

If we replace a and b with left and right parentheses, respectively, the
language L includes strings such as (()) and ()()() and is in fact the set of
all properly nested parenthesis structures found in programming languages.

There are many equivalent grammars, but it is not easy to see whether any
linear grammar is equivalent to the grammar above.
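
The set description of L(G) above can be checked directly with a running counter. This is a minimal sketch, assuming strings over {a, b}; the name in_language is chosen for this example only.

```python
def in_language(w: str) -> bool:
    """Check n_a(w) == n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    balance = 0  # n_a(v) - n_b(v) for the prefix v read so far
    for symbol in w:
        if symbol == "a":
            balance += 1
        elif symbol == "b":
            balance -= 1
        else:
            return False  # not a string over {a, b}
        if balance < 0:  # some prefix has more b's than a's
            return False
    return balance == 0


for w in ["abaabb", "aababb", "ababab", "ba", "aab"]:
    print(w, in_language(w))
```

The first three strings are the examples from the slide and are accepted. The string ba fails the prefix condition, and aab fails the equal-count condition.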

Equivalent context-free Grammars
The language L = {a^n b^m : n ≠ m} is context-free.

G = ({S, S1, A, B}, {a,b}, S, P) with productions:

S → AS1 | S1B
S1 → aS1b | ε
A → aA | a
B → bB | b

Let G’ be another grammar, G’ = ({S, A, B}, {a,b}, S, P’) with productions:

S → aSb | A | B
A → aA | a
B → bB | b

L(G) = L(G’) = L, so G and G’ are two equivalent grammars. G is not
linear, whereas G’ is linear.
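
Equivalence of two grammars cannot be proved by testing, but a bounded comparison is a useful sanity check. The sketch below enumerates every string of length at most max_len that each grammar derives and compares the two sets with the set description of L. The function name language, the dictionary encoding of the productions, and the pruning bound are assumptions made for this example. The bound len(form) <= max_len + 1 is safe for these two grammars because S1 is the only variable that can derive ε and at most one S1 occurs in any sentential form; it is not a general-purpose bound.

```python
from collections import deque


def language(productions: dict[str, list[list[str]]], start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len, found via leftmost derivations."""
    found: set[str] = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable in the sentential form, if any.
        index = next((i for i, s in enumerate(form) if s in productions), None)
        if index is None:
            if len(form) <= max_len:
                found.add("".join(form))
            continue
        for rhs in productions[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            # Pruning bound: an assumption that holds for the grammars below.
            if len(new_form) <= max_len + 1 and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


G = {
    "S": [["A", "S1"], ["S1", "B"]],
    "S1": [["a", "S1", "b"], []],  # [] encodes the right side ε
    "A": [["a", "A"], ["a"]],
    "B": [["b", "B"], ["b"]],
}
G_PRIME = {
    "S": [["a", "S", "b"], ["A"], ["B"]],
    "A": [["a", "A"], ["a"]],
    "B": [["b", "B"], ["b"]],
}

MAX_LEN = 6
expected = {
    "a" * n + "b" * m
    for n in range(MAX_LEN + 1)
    for m in range(MAX_LEN + 1)
    if n != m and n + m <= MAX_LEN
}
print(language(G, "S", MAX_LEN) == language(G_PRIME, "S", MAX_LEN) == expected)
```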
Leftmost and Rightmost derivation
In context-free grammars that are not linear, a derivation may involve
sentential forms with more than one non-terminal variable. In such cases,
we have a choice in the order in which variables are replaced.
The grammar G=({S, A, B}, {a,b}, S, P) with productions:
1: S → AB
2: A → aaA
3: A → ε
4: B → Bb
5: B → ε
This grammar generates the language L(G) = {a^(2n) b^m : n ≥ 0, m ≥ 0}.
Consider now the two derivations (the number after each => is the production applied):
S =>1 AB =>2 aaAB =>3 aaB =>4 aaBb =>5 aab
S =>1 AB =>4 ABb =>2 aaABb =>5 aaAb =>3 aab

From this we see that the two derivations yield the same sentence and use
the same productions. The difference is only in the order in which the
productions are applied. To remove such irrelevant factors, we often
require that the variables be replaced in a specific order.
Leftmost and Rightmost derivation
• A derivation is said to be leftmost if in each step the leftmost variable in
the sentential form is replaced.

• A derivation is said to be rightmost if in each step the rightmost variable
in the sentential form is replaced.

Example: consider a grammar with productions

S → aAB
A → bBb
B → A | ε

Then S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb
is a leftmost derivation of the string abbbb.

A rightmost derivation of the same string is:

S => aAB => aA => abBb => abAb => abbBbb => abbbb
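
The two derivations above can be replayed mechanically. In this sketch the function derive and the encoding of the productions as a dictionary of right-side strings (with "" standing for ε) are assumptions made for illustration. The caller supplies the right side to use at each step, and the function picks the leftmost or rightmost variable.

```python
PRODUCTIONS = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}  # "" encodes ε


def derive(right_sides: list[str], leftmost: bool = True) -> list[str]:
    """Replace the leftmost (or rightmost) variable with each right side in turn."""
    form = "S"
    forms = [form]
    for rhs in right_sides:
        positions = [i for i, symbol in enumerate(form) if symbol in PRODUCTIONS]
        if not positions:
            raise ValueError("no variable left to replace")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


print(" => ".join(derive(["aAB", "bBb", "A", "bBb", "", ""], leftmost=True)))
# S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb
print(" => ".join(derive(["aAB", "", "bBb", "A", "bBb", ""], leftmost=False)))
# S => aAB => aA => abBb => abAb => abbBbb => abbbb
```

Both calls use the same productions the same number of times; only the order of application differs.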

Parsing and Ambiguity
• We have concentrated on the generative aspects of grammars:
given a grammar G, we studied the set of strings that can be
derived using G.

• We are also concerned with the analytical side of the grammar:
given a string w of terminals, we want to know whether or not w is in
L(G).

• If w ∈ L(G), we may want to find a derivation of w.

• The term parsing describes finding a sequence of productions by
which w ∈ L(G) is derived.

Parse Tree
A second way of showing derivations, independent of the order in which
productions are used, is by using a parse tree. A parse tree is an
ordered tree in which nodes are labeled with the left sides of productions
and in which the children of a node represent its corresponding right
sides. It begins with the root, labeled with the start symbol, and ends in
leaves that are terminals. In a partial parse tree, every leaf has a label
from (V ∪ T).

Example: S → SS | aSb | ab

          S
        /   \
       S     S
     / | \   / \
    a  S  b a   b
      / \
     a   b

then S =>* aabbab


Theorem
Let G = (V, T, S, P) be a context-free grammar. Then for every w ∈ L(G),
there exists a parse tree of G whose yield is w. Conversely, the yield of
any parse tree is a string in L(G).

Proof:
First we show, by induction on the number of steps in the derivation, that for
every sentential form of G there is a corresponding partial parse tree.
Basis: S => u implies that there is a production S → u. Assume that for every
sentential form derivable in n steps, there is a corresponding partial parse tree.
Now any w derivable in n+1 steps must be such that:

S =>* xAy, with x, y ∈ (V ∪ T)* and A ∈ V, derivable in n steps, and

xAy => x a1 a2 … am y = w, with a1, a2, …, am ∈ (V ∪ T).

By the inductive assumption there is a partial parse tree with yield xAy.
Attaching a1, a2, …, am as children of the leaf labeled A gives a partial
parse tree with yield w.
Similarly, we can show that every partial parse tree represents a sentential form.

Note: parse trees show which productions are used in obtaining a sentence, but
do not give the order of their application.
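
A parse tree and its yield are easy to model with nested tuples. The sketch below is one possible encoding, assumed for this example: an internal node is a tuple (label, child, child, ...) and a leaf is a one-character string. The tree shown is a parse tree for aabbab in the grammar S → SS | aSb | ab.

```python
def tree_yield(node) -> str:
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):  # a leaf labeled with a terminal
        return node
    _label, *children = node
    return "".join(tree_yield(child) for child in children)


def productions_used(node) -> list[tuple[str, str]]:
    """List (left side, right side) for every internal node, in preorder."""
    if isinstance(node, str):
        return []
    label, *children = node
    right_side = "".join(c if isinstance(c, str) else c[0] for c in children)
    used = [(label, right_side)]
    for child in children:
        used.extend(productions_used(child))
    return used


tree = ("S", ("S", "a", ("S", "a", "b"), "b"), ("S", "a", "b"))
print(tree_yield(tree))        # aabbab
print(productions_used(tree))  # [('S', 'SS'), ('S', 'aSb'), ('S', 'ab'), ('S', 'ab')]
```

The tree records which productions are used, but any traversal order (preorder here) is a choice made by the reader of the tree, not something the tree itself fixes.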

Ambiguous Grammars
• A CFG is ambiguous if there is a string in
the language that is the yield of two or
more parse trees.

• Example: S -> SS | aSb | ab
(This grammar is for the balanced-parentheses
language, with a and b playing the roles of "(" and ")".)
• Two parse trees for ababab are shown on the next slide.
Example – Continued

First parse tree for ababab:

            S
          /   \
         S     S
        / \   / \
       S   S a   b
      / \ / \
     a  b a  b

Second parse tree for ababab:

        S
      /   \
     S     S
    / \   / \
   a   b S   S
        / \ / \
       a  b a  b

Ambiguity, Left- and
Rightmost Derivations
• If there are two different parse trees,
they must produce two different leftmost
derivations by the construction given in
the proof.
• Conversely, two different leftmost
derivations produce different parse
trees by the other part of the proof.
• Likewise for rightmost derivations.
Ambiguity – Continued
• Thus, equivalent definitions of
“ambiguous grammar” are:
1. There is a string in the language that has
two different leftmost derivations.
2. There is a string in the language that has
two different rightmost derivations.
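
For the example grammar S -> SS | aSb | ab, ambiguity can be observed by enumerating every parse tree of a given string. This is a small sketch written for this one grammar only; the name parse_trees and the nested-tuple tree encoding are assumptions for the example. It terminates because the grammar has no ε-productions, so every recursive call is on a strictly shorter string.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_trees(w: str) -> tuple:
    """All parse trees with root S and yield w, as nested tuples."""
    trees = []
    if w == "ab":  # S -> ab
        trees.append(("S", "a", "b"))
    if len(w) > 2 and w[0] == "a" and w[-1] == "b":  # S -> aSb
        for inner in parse_trees(w[1:-1]):
            trees.append(("S", "a", inner, "b"))
    for i in range(1, len(w)):  # S -> SS, trying every split point
        for left in parse_trees(w[:i]):
            for right in parse_trees(w[i:]):
                trees.append(("S", left, right))
    return tuple(trees)


for tree in parse_trees("ababab"):
    print(tree)
print(len(parse_trees("ababab")))  # 2
```

Since each parse tree corresponds to exactly one leftmost derivation, the count printed is also the number of leftmost derivations of the string.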

Ambiguity is a Property of
Grammars, not Languages

• For the balanced-parentheses language,
here is another CFG, which is
unambiguous.
B -> (RB | ε
R -> ) | (RR
• B, the start symbol, derives balanced strings.
• R generates strings that have one more right
parenthesis than left.
Example: Unambiguous Grammar
B -> (RB | ε
R -> ) | (RR
• Construct a unique leftmost derivation for
a given balanced string of parentheses by
scanning the string from left to right.
– If we need to expand B, then use B -> (RB if
the next symbol is “(” and ε if at the end.
– If we need to expand R, use R -> ) if the next
symbol is “)” and (RR if it is “(”.
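
The scanning rule above can be sketched as a small predictive parser. The function name leftmost_derivation and the use of an explicit stack are choices made for this example, not something prescribed by the slides. At each step the parser looks only at the next input symbol to pick the production.

```python
def leftmost_derivation(text: str) -> list[str]:
    """Parse text with B -> (RB | ε and R -> ) | (RR using one symbol of lookahead.

    Returns the sentential forms of the leftmost derivation, or raises
    ValueError if text is not a balanced string of parentheses.
    """
    matched = ""   # terminals already matched against the input
    stack = ["B"]  # unprocessed part of the sentential form, leftmost symbol last
    forms = ["B"]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = text[pos] if pos < len(text) else None
        if top in "()":  # terminal: must match the next input symbol
            if top != lookahead:
                raise ValueError(f"unexpected input at position {pos}")
            matched += top
            pos += 1
            continue
        if top == "B" and lookahead == "(":
            rhs = "(RB"
        elif top == "B" and lookahead is None:
            rhs = ""  # B -> ε at the end of the input
        elif top == "R" and lookahead == ")":
            rhs = ")"
        elif top == "R" and lookahead == "(":
            rhs = "(RR"
        else:
            raise ValueError(f"no production applies at position {pos}")
        stack.extend(reversed(rhs))
        forms.append(matched + "".join(reversed(stack)))
    return forms


for form in leftmost_derivation("(())()"):
    print(form)
```

For the input (())() this prints B, (RB, ((RRB, (()RB, (())B, (())(RB, (())()B and (())(), one per line.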
The Parsing Process
B -> (RB | ε
R -> ) | (RR

Remaining input    Steps of leftmost derivation
(())()             B
())()              (RB
))()               ((RRB
)()                (()RB
()                 (())B
)                  (())(RB
(empty)            (())()B
(empty)            (())()


LL(1) Grammars

• As an aside, a grammar such as
B -> (RB | ε
R -> ) | (RR,
where you can always figure out the production to use in a leftmost
derivation by scanning the given string left-to-right and looking only
at the next symbol, is called LL(1).
– “Leftmost derivation, left-to-right scan, one
symbol of lookahead.”
LL(1) Grammars – (2)

• Most programming languages have


LL(1) grammars.
• LL(1) grammars are never ambiguous.

Inherent Ambiguity
• It would be nice if for every ambiguous
grammar, there were some way to “fix” the
ambiguity, as we did for the balanced-
parentheses grammar.
• Unfortunately, some CFL’s are inherently
ambiguous, meaning that every grammar
for the language is ambiguous.

Example: Inherent Ambiguity
• The language {0^i 1^j 2^k : i = j or j = k} is
inherently ambiguous.
• Intuitively, at least some of the strings of
the form 0^n 1^n 2^n must be generated by two
different parse trees, one based on
checking the 0’s and 1’s, the other based
on checking the 1’s and 2’s.

One Possible Ambiguous
Grammar
S -> AB | CD
A -> 0A1 | 01     (A generates equal numbers of 0’s and 1’s)
B -> 2B | 2       (B generates one or more 2’s)
C -> 0C | 0       (C generates one or more 0’s)
D -> 1D2 | 12     (D generates equal numbers of 1’s and 2’s)

And there are two leftmost derivations of every string
with equal numbers of 0’s, 1’s, and 2’s. E.g.:
S => AB => 01B => 012
S => CD => 0D => 012
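
The claim about two derivations can be checked by counting parse trees for this grammar. The sketch below is an illustration under stated assumptions: the name count and the string encoding of right sides are chosen for the example, and the recursion relies on the grammar having no ε-productions, so every symbol derives at least one terminal.

```python
from functools import lru_cache

PRODUCTIONS = {
    "S": ("AB", "CD"),
    "A": ("0A1", "01"),
    "B": ("2B", "2"),
    "C": ("0C", "0"),
    "D": ("1D2", "12"),
}


@lru_cache(maxsize=None)
def count(symbols: str, w: str) -> int:
    """Number of ways the symbol sequence derives w, one parse tree per symbol."""
    if not symbols:
        return 1 if not w else 0
    head, rest = symbols[0], symbols[1:]
    if head not in PRODUCTIONS:  # head is a terminal
        return count(rest, w[1:]) if w[:1] == head else 0
    total = 0
    # head derives a nonempty prefix; each symbol in rest needs at least one character
    for i in range(1, len(w) - len(rest) + 1):
        ways_head = sum(count(rhs, w[:i]) for rhs in PRODUCTIONS[head])
        if ways_head:
            total += ways_head * count(rest, w[i:])
    return total


for w in ["012", "001122", "0012", "0112"]:
    print(w, count("S", w))
```

Strings of the form 0^n 1^n 2^n get a count of 2, strings satisfying only one of the two conditions get 1, and strings outside the language get 0. This shows that this particular grammar is ambiguous; it does not by itself show that the language is inherently ambiguous.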
Removing Ambiguity
