H5 Context Free Grammar

CMSC 114 – Context-free Grammar and Languages Basics

Many languages are not REGULAR. Thus we need to consider a larger class of languages called "Context-Free Languages (CFL's)".

CONTEXT-FREE GRAMMAR (CFG) – another notation for describing languages; a formal notation for expressing recursive definitions.

The natural, recursive notation of CFGs has:
• played a central role in compiler technology since the 1960's
• enhanced the implementation of parsers (functions that discover the structure of a program)
• been used to describe document formats, especially for information exchange on the WWW

Note: grammar – defines a language; follows certain rules

INFORMAL EXAMPLE – language of palindromes (strings that read the same forward and backward)
Example – OTTO, MADAMIMADAM, 0110, 11011 and ε
A palindrome with two or more symbols:
• begins and ends with the same symbol
• when the first and last symbols are removed, the resulting string is also a palindrome

BASIS: ε, 0 and 1 are palindromes.
INDUCTION: If w is a palindrome, so are 0w0 and 1w1.

Written as productions with the single variable P:
1. P -> ε
2. P -> 0
3. P -> 1
4. P -> 0P0
5. P -> 1P1

The first three rules (1-3) form the BASIS. They tell us that the class of palindromes includes the strings ε, 0 and 1. None of the right sides of these rules (the portions following the arrows) contains a variable, which is why they form a basis for the definition.

The last two rules (4-5) form the INDUCTIVE part of the definition. For instance, rule 4 says that if we take any string w from the class P, then 0w0 is also in class P. Rule 5 likewise tells us that 1w1 is also in P.
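
The basis and induction steps translate almost directly into a recursive test. The following is a minimal sketch, not part of the original handout; the function name `is_palindrome` is hypothetical, and the basis is widened from 0 and 1 to any single symbol so that examples such as OTTO also work.

```python
def is_palindrome(w: str) -> bool:
    """Recursive test that mirrors the basis/induction definition."""
    # BASIS: the empty string and any single symbol are palindromes
    # (assumption: any symbol is allowed, not only 0 and 1).
    if len(w) <= 1:
        return True
    # INDUCTION: x u x is a palindrome exactly when u is a palindrome.
    return w[0] == w[-1] and is_palindrome(w[1:-1])


for s in ["", "0", "0110", "11011", "OTTO", "MADAMIMADAM", "0111"]:
    print(repr(s), is_palindrome(s))
```

Each recursive call strips the first and last symbol, which is the inductive rule read backward.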
FORMAL DEFINITION OF CFG:
A context-free grammar is a quadruple

G = (V, T, P, S)

where
V is a finite set of variables.
T is a finite set of terminals.
P is a finite set of productions of the form A -> α, where A is a variable and α ∈ (V ∪ T)*.
S is a designated variable called the start symbol.

In the palindrome grammar (where P names the variable, not the set of productions):
• 0 and 1 are TERMINALS
• P is a VARIABLE (or nonterminal)
• P is also the START SYMBOL
• 1-5 are PRODUCTIONS (or rules)
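
The quadruple can be written down as a small data structure. The sketch below is illustrative only: the names `CFG` and `is_well_formed` are hypothetical, the palindrome productions are taken from the basis and induction steps given earlier, and the check that V and T are disjoint is a standard assumption that the definition above does not state.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # P: (head, body) pairs, body is a tuple of symbols
    start: str             # S


def is_well_formed(g: CFG) -> bool:
    """Check that every production has the form A -> alpha, alpha in (V u T)*."""
    symbols = g.variables | g.terminals
    return (
        g.variables.isdisjoint(g.terminals)   # assumption: V and T share no symbol
        and g.start in g.variables
        and all(
            head in g.variables and all(s in symbols for s in body)
            for head, body in g.productions
        )
    )


palindromes = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(
        ("P", ()),               # P -> epsilon (empty body)
        ("P", ("0",)),           # P -> 0
        ("P", ("1",)),           # P -> 1
        ("P", ("0", "P", "0")),  # P -> 0P0
        ("P", ("1", "P", "1")),  # P -> 1P1
    ),
    start="P",
)
print(is_well_formed(palindromes))
```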
COLEGIO DE LOS BANOS 5

CFG THAT REPRESENTS EXPRESSIONS:
Limitations:
• Operators + and * (representing addition and multiplication)
• Arguments (values passed) as Identifiers
• Two variables to use:
  • I represents identifiers
  • E represents expressions (combination of values and operators that will be evaluated)
• Every identifier must begin with a or b, which may be followed by any string in {a, b, 0, 1}*
• Language of identifiers:
  (a + b)(a + b + 0 + 1)*
• Formal definition: G = ({E, I}, {a, b, 0, 1, +, *, (, )}, A, E)
  where A: (production rules)
  1. E -> E + E   (E can be 2 expressions connected by a + sign)
  2. E -> E * E   (E can be 2 expressions connected by a * sign)
  3. E -> (E)     (parenthesized expression)
  4. E -> I       (basic rule for expressions – E can be a single identifier)
  5. I -> a       (a is an identifier)
  6. I -> b       (b is an identifier)
  7. I -> Ia      (Identifier followed by a is another identifier)
  8. I -> Ib      (Identifier followed by b is another identifier)
  9. I -> I0      (Identifier followed by 0 is another identifier)
  10. I -> I1     (Identifier followed by 1 is another identifier)

PARSE TREE:
• Alternative representation to derivations that tells about the syntactic structure of a string
• Root – labeled by the start symbol
• There can be several parse trees for the same string (called ambiguity)
• Yield – concatenation of the leaves from left to right of the parse tree (terminal string)

[Figure: parse tree for a palindrome (Yield – 0110)]
[Figure: parse tree for an expression (Yield – a*(a + b00))]

NOTE:
Why are such grammars called 'context free'?
Because every rule has only one symbol on the left hand side, and wherever we see that symbol while doing a derivation, we are free to replace it with the right hand side. That is, the 'context' in which the left-hand-side symbol occurs is unimportant; we can always use the rule to make the rewrite while doing a derivation.
The language generated by a context free grammar is the set of terminal strings that can be derived starting from the start symbol.
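
The yield of a parse tree can be computed by reading its leaves from left to right. The following is a minimal sketch under assumptions: a tree node is represented as a hypothetical `(label, children)` pair, a leaf has an empty child list, and ε is written as the empty string. The example tree is the palindrome tree with yield 0110.

```python
def tree_yield(node):
    """Concatenate the leaf labels of a parse tree from left to right."""
    label, children = node
    if not children:          # a leaf: a terminal, or "" standing for epsilon
        return label
    return "".join(tree_yield(child) for child in children)


# Parse tree for 0110 using P -> 0P0, then P -> 1P1, then P -> epsilon.
palindrome_tree = (
    "P", [
        ("0", []),
        ("P", [
            ("1", []),
            ("P", [("", [])]),   # P -> epsilon
            ("1", []),
        ]),
        ("0", []),
    ],
)
print(tree_yield(palindrome_tree))
```

The root label is the start symbol P, and only leaf labels contribute to the yield.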

DERIVATIONS USING A GRAMMAR:
Productions of a CFG are used to infer that certain strings are in the language of a certain variable.
Two approaches:
1. Recursive inference – use the production rules from body to head
2. Derivation – use the production rules from head to body
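
Recursive inference can be read as a bottom-up closure: start with no known strings, then repeatedly apply each production from body to head until nothing new appears. The sketch below is an illustration under assumptions, not the handout's procedure: the function name `infer` and the length bound `max_len` (needed so the loop terminates) are hypothetical, and the palindrome productions are used as the sample grammar.

```python
from itertools import product


def infer(productions, variables, max_len):
    """Collect, per variable, every derivable string of length <= max_len."""
    lang = {v: set() for v in variables}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # For each body symbol: known strings if it is a variable,
            # the symbol itself if it is a terminal.
            choices = [sorted(lang[s]) if s in variables else [s] for s in body]
            for combo in product(*choices):
                w = "".join(combo)
                if len(w) <= max_len and w not in lang[head]:
                    lang[head].add(w)      # body to head: w is now known to be in head
                    changed = True
    return lang


palindrome_rules = [
    ("P", ()),                # P -> epsilon
    ("P", ("0",)),
    ("P", ("1",)),
    ("P", ("0", "P", "0")),
    ("P", ("1", "P", "1")),
]
result = infer(palindrome_rules, {"P"}, max_len=3)
print(sorted(result["P"], key=lambda w: (len(w), w)))
```

A derivation runs the same productions in the opposite direction, from the start symbol down to a terminal string.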
EXAMPLE OF RECURSIVE INFERENCE:
a * (a + b00)

        String           Lang   Prod used   String(s) used
i       a                I      5           -
ii      b                I      6           -
iii     b0               I      9           ii
iv      b00              I      9           iii
v       a                E      4           i
vi      b00              E      4           iv
vii     a + b00          E      1           v, vi
viii    (a + b00)        E      3           vii
ix      a * (a + b00)    E      2           v, viii

CONSIDER THE GRAMMAR FOR THE LANGUAGE: (Using Approach B)
[Figure: the grammar, two parse trees for the same string (YIELD – aabbccdd), and the two corresponding leftmost derivations (DERIVATION =>)]
*For every parse tree, there is a unique leftmost and rightmost derivation.
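
A recursive-inference table can be checked mechanically: each row's string must equal the body of its production with every variable replaced by the string of an earlier row. The following sketch assumes the ten expression productions listed earlier; the names `PRODUCTIONS`, `TABLE` and `check_table` are hypothetical, and spaces inside strings are dropped.

```python
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["E", "*", "E"]),
    3: ("E", ["(", "E", ")"]),
    4: ("E", ["I"]),
    5: ("I", ["a"]),
    6: ("I", ["b"]),
    7: ("I", ["I", "a"]),
    8: ("I", ["I", "b"]),
    9: ("I", ["I", "0"]),
    10: ("I", ["I", "1"]),
}
VARIABLES = {"E", "I"}

# (string, language, production used, rows used); rows are numbered from 1.
TABLE = [
    ("a", "I", 5, []),
    ("b", "I", 6, []),
    ("b0", "I", 9, [2]),
    ("b00", "I", 9, [3]),
    ("a", "E", 4, [1]),
    ("b00", "E", 4, [4]),
    ("a+b00", "E", 1, [5, 6]),
    ("(a+b00)", "E", 3, [7]),
    ("a*(a+b00)", "E", 2, [5, 8]),
]


def check_table(table):
    """Return True if every row follows from its production and earlier rows."""
    for i, (string, lang, prod, used) in enumerate(table, start=1):
        head, body = PRODUCTIONS[prod]
        if head != lang or any(row >= i for row in used):
            return False
        earlier = iter(used)
        parts = []
        for symbol in body:
            if symbol in VARIABLES:
                row = next(earlier, None)
                if row is None or table[row - 1][1] != symbol:
                    return False
                parts.append(table[row - 1][0])   # substitute the earlier string
            else:
                parts.append(symbol)              # terminals are copied as-is
        if "".join(parts) != string:
            return False
    return True


print(check_table(TABLE))
```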
EXAMPLE A:
A language of strings of 0's and 1's in which each string starts with either 0 or 1, followed by 101.
Language: (0 + 1)101
Strings included: 0101, 1101

Formal definition: G = ({E, I}, {0, 1, +, *, (, )}, A, E)
where A: (production rules)
1. E -> E + E   (E can be 2 expressions connected by a + sign)
2. E -> E * E   (E can be 2 expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions – E can be a single identifier)
5. I -> 0       (0 is an identifier)
6. I -> 1       (1 is an identifier)
The remaining rules state that an identifier followed by 0 or 1 is another identifier (I -> I0, I -> I1).

PARSE TREE:
[Figure: parse tree with root E expanding to E * E; the left E expands to ( E ) and then E + E with leaves 0 and 1; the right E expands through I to the leaves 1, 0, 1]

TWO TYPES:

A. Recursive Inference:

        String        Lang   Prod used   String(s) used
i       0             I      5           -
ii      1             I      6           -
iii     10            I      8           ii
iv      101           I      7           iii
v       0             E      4           i
vi      1             E      4           ii
vii     101           E      4           iv
viii    0 + 1         E      1           v, vi
ix      (0 + 1)       E      3           viii
x       (0 + 1)101    E      2           ix, vii

B. Derivations:

Leftmost:                     (Rule used)
E => E * E                    (2)
  => (E) * E                  (3)
  => (E + E) * E              (1)
  => (I + E) * E              (4)
  => (0 + E) * E              (5)
  => (0 + I) * E              (4)
  => (0 + 1) * E              (6)
  => (0 + 1) * I              (4)
  => (0 + 1) * I1             (7)
  => (0 + 1) * I01            (9)
  => (0 + 1) * 101            (6)
Written without the * sign: (0 + 1)101

Rightmost:                    (Rule used)
E => E * E                    (2)
  => E * I                    (4)
  => E * I1                   (7)
  => E * I01                  (9)
  => E * 101                  (6)
  => (E) * 101                (3)
  => (E + E) * 101            (1)
  => (E + I) * 101            (4)
  => (E + 1) * 101            (6)
  => (I + 1) * 101            (4)
  => (0 + 1) * 101            (5)
Written without the * sign: (0 + 1)101
EXAMPLE B:
DETERMINE IF CERTAIN STRINGS ARE IN THE LANGUAGE OF A CERTAIN VARIABLE.

Formal definition: G = ({S, A, B}, {0, 1}, A, S)
where A: (production rules)
1. S -> A1B
2. A -> 0A | ε
3. B -> 0B | 1B | ε

Parse Tree:
[Figure: parse tree with root S and children A, 1, B; A expands twice by 0A and then to ε; B expands by 0B, then 1B, then to ε]
Yield = 00101

DERIVATIONS:
Leftmost:                     (Rule No.)
S => A1B                      (1)
  => 0A1B                     (2)
  => 00A1B                    (2)
  => 001B                     (2)
  => 0010B                    (3)
  => 00101B                   (3)
  => 00101 (yield)            (3)

Rightmost:                    (Rule No.)
S => A1B                      (1)
  => A10B                     (3)
  => A101B                    (3)
  => A101                     (3)
  => 0A101                    (2)
  => 00A101                   (2)
  => 00101 (yield)            (2)

Given a context-free language with:
Formal definition: G = ({S, A, B}, {a, b}, A, S)
where A: (production rules)
1. S -> ASB
2. S -> AB
3. A -> a
4. B -> b

Parse Tree:
[Figure: parse tree with root S and children A, S, B; the inner S has children A and B; leaves a, a, b, b]
Yield = aabb

DERIVATIONS:
Leftmost:                     (Rule No.)
S => ASB                      (1)
  => aSB                      (3)
  => aABB                     (2)
  => aaBB                     (3)
  => aabB                     (4)
  => aabb (yield)             (4)

Rightmost:                    (Rule No.)
S => ASB                      (1)
  => ASb                      (4)
  => AABb                     (2)
  => AAbb                     (4)
  => Aabb                     (3)
  => aabb (yield)             (3)
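
The leftmost and rightmost derivations of aabb can be replayed from their rule numbers. This is a minimal sketch for the grammar S -> ASB, S -> AB, A -> a, B -> b only; the names `RULES` and `derive` are hypothetical, and each symbol is assumed to be a single character.

```python
RULES = {1: ("S", "ASB"), 2: ("S", "AB"), 3: ("A", "a"), 4: ("B", "b")}
VARIABLES = set("SAB")


def derive(rule_numbers, start="S", leftmost=True):
    """Apply the rules in order to the leftmost (or rightmost) variable."""
    form = start
    steps = [form]
    for n in rule_numbers:
        head, body = RULES[n]
        positions = [i for i, ch in enumerate(form) if ch in VARIABLES]
        if not positions:
            raise ValueError("no variable left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"rule {n} does not apply to {form[i]} in {form}")
        form = form[:i] + body + form[i + 1:]   # head to body: rewrite one variable
        steps.append(form)
    return steps


print(" => ".join(derive([1, 3, 2, 3, 4, 4])))
print(" => ".join(derive([1, 4, 2, 4, 3, 3], leftmost=False)))
```

Both sequences use the same multiset of rules and end in the same yield; only the order in which variables are rewritten differs.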
EXAMPLE C:
DETERMINE IF CERTAIN STRINGS ARE IN THE LANGUAGE OF A CERTAIN VARIABLE.

Example A:
Given a context-free language with:
Language: (g + h) + (10g + h)
Strings included: g, h, 10g

Formal definition: G = ({E, I}, {g, h, 0, 1, +, *, (, )}, A, E)
where A: (production rules)
1. E -> E + E   (E can be 2 expressions connected by a + sign)
2. E -> E * E   (E can be 2 expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions – E can be a single identifier)
5. I -> g       (g is an identifier)
6. I -> h       (h is an identifier)
7. I -> 1       (1 is an identifier)
8. I -> I0      (Identifier followed by 0 is another identifier)
9. I -> Ig      (Identifier followed by g is another identifier)

C. Recursive Inference:

        String                  Lang   Prod used   String(s) used
i       g                       I      5           -
ii      h                       I      6           -
iii     1                       I      7           -
iv      10                      I      8           iii
v       10g                     I      9           iv
vi      g                       E      4           i
vii     h                       E      4           ii
viii    10g                     E      4           v
ix      g + h                   E      1           vi, vii
x       (g + h)                 E      3           ix
xi      10g + h                 E      1           viii, vii
xii     (10g + h)               E      3           xi
xiii    (g + h) + (10g + h)     E      1           x, xii

PARSE TREE:
[Figure: parse tree with root E expanding to E + E; each side is a parenthesized ( E ) that expands to E + E, with yields g + h on the left and 10g + h on the right]

LEFTMOST DERIVATION:

Example B:
Language: (g1h)(g0 + g1 + h)
Strings included: g1hg0, g1hg1, g1hh

Formal definition: G = ({E, I}, {g, h, 0, 1, +, *, (, )}, A, E)
where A: (production rules)
1. E -> E + E   (E can be 2 expressions connected by a + sign)
2. E -> E * E   (E can be 2 expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions – E can be a single identifier)
5. I -> g       (g is an identifier)
6. I -> h       (h is an identifier)
7. I -> …       (… another identifier)
8. I -> Ih      (Identifier followed by h is another identifier)
9. I -> …       (… another identifier)

D. Recursive Inference:

        String        Lang   Prod used   String(s) used
i
ii
iii
iv
v
vi
vii
viii
ix
x
xi
xii
xiii

PARSE TREE:
[Figure: parse tree with root E expanding to E * E; the left side is a parenthesized ( E ) with yield g1h; the right side is a parenthesized ( E ) expanding to E + E + E with yields g0, g1 and h]

LEFTMOST DERIVATION:

RIGHTMOST DERIVATION:
