H5 Context Free Grammar


CMSC 114– Context-free Grammar and Languages Basics

Many languages are not REGULAR. Thus we need to consider a larger class of languages called "Context-Free Languages (CFLs)".

CONTEXT-FREE GRAMMAR (CFG) – another notation for describing languages; a formal notation for expressing recursive definitions of languages.

The natural, recursive notation of context-free grammars has:

• played a central role in compiler technology since the 1960s
• enhanced the implementation of parsers (functions that discover the structure of a program)
• been used to describe document formats, especially for information exchange on the WWW

Note: a grammar defines a language; the strings of the language follow certain rules.

INFORMAL EXAMPLE – language of palindromes (strings that read the same forward and backward)

Example – OTTO, MADAMIMADAM, 0110, 11011 and ε (the empty string)

A palindrome of length two or more:

• begins and ends with the same symbol
• when the first and last symbols are removed, the resulting string is also a palindrome

BASIS: ε, 0 and 1 are palindromes.

INDUCTION: If w is a palindrome, so are 0w0 and 1w1.

The five rules that express this definition as a grammar are:

1. P -> ε
2. P -> 0
3. P -> 1
4. P -> 0P0
5. P -> 1P1

The first three rules (1-3) form the BASIS. They tell us that the class of palindromes includes the strings ε, 0 and 1. None of the right sides of these rules (the portions following the arrows) contains a variable, which is why they form a basis for the definition.

The last two rules (4-5) form the INDUCTIVE part of the definition. For instance, rule 4 says that if we take any string w from the class P, then 0w0 is also in class P. Rule 5 likewise tells us that 1w1 is also in P.

In this grammar:

• 0 and 1 are TERMINALS
• P is a VARIABLE (or nonterminal)
• P is also the START SYMBOL
• 1-5 are PRODUCTIONS (or rules)

FORMAL DEFINITION OF CFG:

A context-free grammar is a quadruple

G = (V, T, P, S)

where
V is a finite set of variables.
T is a finite set of terminals.
P is a finite set of productions of the form A -> α, where A is a variable and α ∈ (V ∪ T)*.
S is a designated variable called the start symbol.
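As an illustration of the quadruple, the palindrome grammar can be written down as plain data and used to test membership. This is only a minimal sketch: the names `VARIABLES`, `TERMINALS`, `PRODUCTIONS`, `START` and `derives` are hypothetical, and the five productions are assumed to be P -> ε, P -> 0, P -> 1, P -> 0P0 and P -> 1P1, as the description of rules 1-5 suggests.

```python
# Sketch: the palindrome grammar as a quadruple (V, T, P, S).
# All names here are illustrative assumptions, not part of the handout.
EPSILON = ""                      # the empty string

VARIABLES = {"P"}                 # V
TERMINALS = {"0", "1"}            # T
PRODUCTIONS = [                   # P, in the order of rules 1-5
    ("P", EPSILON),               # rule 1 (basis)
    ("P", "0"),                   # rule 2 (basis)
    ("P", "1"),                   # rule 3 (basis)
    ("P", "0P0"),                 # rule 4 (induction)
    ("P", "1P1"),                 # rule 5 (induction)
]
START = "P"                       # S


def derives(w: str) -> bool:
    """Return True if the start symbol P can derive the terminal string w."""
    for _head, body in PRODUCTIONS:
        if "P" not in body:
            # Basis rule: the body is a terminal string, so w must equal it.
            if w == body:
                return True
        else:
            # Inductive rule: body is left + P + right; strip both ends
            # and ask whether the shorter middle part is again in P.
            left, right = body.split("P")
            long_enough = len(w) >= len(left) + len(right)
            if long_enough and w.startswith(left) and w.endswith(right):
                if derives(w[len(left):len(w) - len(right)]):
                    return True
    return False


for candidate in ["", "0110", "11011", "10"]:
    print(repr(candidate), derives(candidate))
```

The recursion stops because each inductive step removes two symbols from the string, mirroring the informal observation that removing the first and last symbols of a palindrome leaves a palindrome.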

COLEGIO DE LOS BANOS 5



CFG THAT REPRESENTS EXPRESSIONS:

Limitations:

• Operators + and * (representing addition and multiplication)
• Arguments (values passed) are identifiers
• Two variables are used:
   - I represents identifiers
   - E represents expressions (combinations of values and operators that will be evaluated)
• Every identifier must begin with a or b, which may be followed by any string in {a, b, 0, 1}*
• Language of identifiers: (a + b)(a + b + 0 + 1)*
• Formal definition: G = ({E, I}, {+, *, (, ), a, b, 0, 1}, A, E)

where A is the set of production rules:

1. E -> E + E   (E can be two expressions connected by a + sign)
2. E -> E * E   (E can be two expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions: E can be a single identifier)
5. I -> a       (a is an identifier)
6. I -> b       (b is an identifier)
7. I -> Ia      (an identifier followed by a is another identifier)
8. I -> Ib      (an identifier followed by b is another identifier)
9. I -> I0      (an identifier followed by 0 is another identifier)
10. I -> I1     (an identifier followed by 1 is another identifier)

PARSE TREE:

• Alternative representation to derivations that shows the syntactic structure of a string
• Root – labeled by the start symbol
• Yield – concatenation of the labels of the leaves from left to right (a terminal string)
• There can be several parse trees for the same string (called ambiguity)

[Figure: parse tree for a palindrome (yield: 0110)]

[Figure: parse tree for an expression (yield: a*(a + b00))]

NOTE:

Why are such grammars called "context free"?

Because every rule has exactly one symbol (a variable) on the left-hand side, and wherever we see that symbol while doing a derivation, we are free to replace it with the right-hand side. That is, the "context" in which the left-hand-side symbol occurs is unimportant; we can always use the rule to make the rewrite while doing a derivation.

The language generated by a context-free grammar is the set of strings of terminal symbols that can be derived starting from the start symbol.
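The parse-tree vocabulary used here (root, leaves, yield) can be made concrete with a small tree type. The sketch below is an illustration only: `Node`, `n` and `tree_yield` are hypothetical names, the palindrome productions are assumed to be P -> ε, P -> 0P0 and P -> 1P1, and the two trees are reconstructed from the yields named in the figures (0110 and a*(a + b00), written without spaces).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node: a variable, a terminal, or ε at a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)


def n(label: str, *children: Node) -> Node:
    """Small helper so trees can be written compactly."""
    return Node(label, list(children))


def tree_yield(node: Node) -> str:
    """Concatenate the leaf labels from left to right (ε contributes nothing)."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


# Palindrome tree: P -> 0P0, then P -> 1P1, then P -> ε.
palindrome_tree = n("P", n("0"), n("P", n("1"), n("P", n("ε")), n("1")), n("0"))

# Expression tree for a*(a+b00) using rules 2, 4, 5, 3, 1, 9 and 6.
expr_tree = n(
    "E",
    n("E", n("I", n("a"))),
    n("*"),
    n(
        "E",
        n("("),
        n(
            "E",
            n("E", n("I", n("a"))),
            n("+"),
            n("E", n("I", n("I", n("I", n("b")), n("0")), n("0"))),
        ),
        n(")"),
    ),
)

print(tree_yield(palindrome_tree))  # 0110
print(tree_yield(expr_tree))        # a*(a+b00)
```

In both trees the root carries the start symbol (P or E), every interior node is a variable whose children spell out the body of one production, and the yield is read off the leaves alone.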


DERIVATIONS USING A GRAMMAR:

Productions of a CFG are used to infer that certain strings are in the language of a certain variable.

Two approaches:

1. Recursive inference – use the production rules from body to head
2. Derivation – use the production rules from head to body

EXAMPLE OF RECURSIVE INFERENCE:
(Using Approach B)

a*(a + b00)

Row    String inferred   Language of   Production used   String(s) used
i      a                 I             5                 -
ii     b                 I             6                 -
iii    b0                I             9                 ii
iv     b00               I             9                 iii
v      a                 E             4                 i
vi     b00               E             4                 iv
vii    a + b00           E             1                 v, vi
viii   (a + b00)         E             3                 vii
ix     a * (a + b00)     E             2                 v, viii

DERIVATION (=>)

*For every parse tree, there is a unique leftmost derivation and a unique rightmost derivation.

TWO TYPES:

• Leftmost derivation – at each step, the leftmost variable is replaced by one of its production bodies
• Rightmost derivation – at each step, the rightmost variable is replaced by one of its production bodies

CONSIDER THE GRAMMAR FOR THE LANGUAGE:

[Figure: two parse trees for the same string (yield: aabbccdd), with the two corresponding leftmost derivations]
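The recursive-inference table for a*(a + b00) can be checked mechanically: each row must follow from the production it names, with every variable in the production body replaced by a string already inferred for that variable. The sketch below is one possible way to do that check. `PRODUCTIONS`, `TABLE` and `check_row` are hypothetical names, the rule numbers are those of the expression grammar with variables E and I, and strings are written without spaces.

```python
# Productions of the expression grammar: number -> (head, body symbols).
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["E", "*", "E"]),
    3: ("E", ["(", "E", ")"]),
    4: ("E", ["I"]),
    5: ("I", ["a"]),
    6: ("I", ["b"]),
    7: ("I", ["I", "a"]),
    8: ("I", ["I", "b"]),
    9: ("I", ["I", "0"]),
    10: ("I", ["I", "1"]),
}
VARIABLES = {"E", "I"}

# Table rows: label -> (string, variable, production used, rows used).
TABLE = {
    "i": ("a", "I", 5, []),
    "ii": ("b", "I", 6, []),
    "iii": ("b0", "I", 9, ["ii"]),
    "iv": ("b00", "I", 9, ["iii"]),
    "v": ("a", "E", 4, ["i"]),
    "vi": ("b00", "E", 4, ["iv"]),
    "vii": ("a+b00", "E", 1, ["v", "vi"]),
    "viii": ("(a+b00)", "E", 3, ["vii"]),
    "ix": ("a*(a+b00)", "E", 2, ["v", "viii"]),
}


def check_row(label: str) -> bool:
    """Check one inference: body to head, using the earlier rows it cites."""
    string, variable, rule, used = TABLE[label]
    head, body = PRODUCTIONS[rule]
    if head != variable:
        return False
    cited = iter(used)
    parts = []
    for symbol in body:
        if symbol in VARIABLES:
            ref = next(cited, None)
            if ref is None:
                return False          # a variable with no supporting row
            ref_string, ref_variable, _, _ = TABLE[ref]
            if ref_variable != symbol:
                return False          # the cited row is for another variable
            parts.append(ref_string)
        else:
            parts.append(symbol)      # terminals are copied as they are
    if next(cited, None) is not None:
        return False                  # more rows cited than variables in body
    return "".join(parts) == string


for label in TABLE:
    print(label, check_row(label))
```

Each row passing this check is exactly what "using the production rules from body to head" means: the strings on the right of the table are plugged into the body, and the result is then known to belong to the language of the head.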

Example A:


A language of strings of 0's and 1's in which each string starts with either 0 or 1, followed by 101.

Language: (0 + 1)101

Strings included: 0101, 1101

Formal definition: G = ({E, I}, {+, *, (, ), 0, 1}, A, E)

where A is the set of production rules:

1. E -> E + E   (E can be two expressions connected by a + sign)
2. E -> E * E   (E can be two expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions: E can be a single identifier)
5. I -> 0       (0 is an identifier)
6. I -> 1       (1 is an identifier)
7. I -> I1      (an identifier followed by 1 is another identifier)
8. I -> I0      (an identifier followed by 0 is another identifier)
9. I -> I01     (an identifier followed by 01 is another identifier)

Rule 2 places a * between the two parts, so the string derived from E is written (0 + 1) * 101.

PARSE TREE:

(children are listed top to bottom in left-to-right order, indented under their parent)

E
  E
    (
    E
      E
        I
          0
      +
      E
        I
          1
    )
  *
  E
    I
      I
        I
          1
        0
      1

A. Recursive Inference:

Row    String inferred   Language of   Production used   String(s) used
i      0                 I             5                 -
ii     1                 I             6                 -
iii    10                I             8                 ii
iv     101               I             7                 iii
v      0                 E             4                 i
vi     1                 E             4                 ii
vii    101               E             4                 iv
viii   0 + 1             E             1                 v, vi
ix     (0 + 1)           E             3                 viii
x      (0 + 1) * 101     E             2                 ix, vii

B. Derivations:

Leftmost: (rule used)

E => E * E              (2)
  => (E) * E            (3)
  => (E + E) * E        (1)
  => (I + E) * E        (4)
  => (0 + E) * E        (5)
  => (0 + I) * E        (4)
  => (0 + 1) * E        (6)
  => (0 + 1) * I        (4)
  => (0 + 1) * I1       (7)
  => (0 + 1) * I01      (8)
  => (0 + 1) * 101      (6)

Rightmost: (rule used)

E => E * E              (2)
  => E * I              (4)
  => E * I1             (7)
  => E * I01            (8)
  => E * 101            (6)
  => (E) * 101          (3)
  => (E + E) * 101      (1)
  => (E + I) * 101      (4)
  => (E + 1) * 101      (6)
  => (I + 1) * 101      (4)
  => (0 + 1) * 101      (5)
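A derivation like the two above can be replayed step by step to confirm that every rule is applied to the leftmost (or rightmost) variable. The following is a minimal sketch under stated assumptions: `PRODUCTIONS`, `apply_rule` and `derive` are hypothetical names, sentential forms are written without spaces, and the step from I1 to I01 is taken to use rule 8 (I -> I0), since rule 9 (I -> I01) applied to I1 would give I011 instead.

```python
# Grammar of Example A: rule number -> (head, body).
PRODUCTIONS = {
    1: ("E", "E+E"),
    2: ("E", "E*E"),
    3: ("E", "(E)"),
    4: ("E", "I"),
    5: ("I", "0"),
    6: ("I", "1"),
    7: ("I", "I1"),
    8: ("I", "I0"),
    9: ("I", "I01"),
}
VARIABLES = "EI"


def apply_rule(form: str, rule: int, leftmost: bool = True) -> str:
    """Rewrite the leftmost (or rightmost) variable of form using rule."""
    head, body = PRODUCTIONS[rule]
    positions = [i for i, symbol in enumerate(form) if symbol in VARIABLES]
    if not positions:
        raise ValueError("no variable left to rewrite in " + form)
    pos = positions[0] if leftmost else positions[-1]
    if form[pos] != head:
        raise ValueError(f"rule {rule} rewrites {head}, found {form[pos]}")
    return form[:pos] + body + form[pos + 1:]


def derive(rules, leftmost: bool = True, start: str = "E"):
    """Return every sentential form produced by applying rules in order."""
    forms = [start]
    for rule in rules:
        forms.append(apply_rule(forms[-1], rule, leftmost))
    return forms


LEFTMOST_RULES = [2, 3, 1, 4, 5, 4, 6, 4, 7, 8, 6]
RIGHTMOST_RULES = [2, 4, 7, 8, 6, 3, 1, 4, 6, 4, 5]

print(" => ".join(derive(LEFTMOST_RULES, leftmost=True)))
print(" => ".join(derive(RIGHTMOST_RULES, leftmost=False)))
```

Both rule sequences use the same eleven productions in a different order and end in the same terminal string, which is what one expects when the two derivations come from the same parse tree.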

DETERMINE IF CERTAIN STRINGS ARE IN THE LANGUAGE OF A CERTAIN VARIABLE.

EXAMPLE B:
Given a context-free language with:


Formal definition: G = ({S, A, B}, {0, 1}, A, S)

where A is the set of production rules (not to be confused with the variable A):

1. S -> A1B
2. A -> 0A | ε
3. B -> 0B | 1B | ε

Parse Tree (yield = 00101):

S
  A
    0
    A
      0
      A
        ε
  1
  B
    0
    B
      1
      B
        ε

DERIVATIONS:

Leftmost: (Rule No.)

S => A1B       (1)
  => 0A1B      (2)
  => 00A1B     (2)
  => 001B      (2)
  => 0010B     (3)
  => 00101B    (3)
  => 00101     (3)   (yield)

Rightmost: (Rule No.)

S => A1B       (1)
  => A10B      (3)
  => A101B     (3)
  => A101      (3)
  => 0A101     (2)
  => 00A101    (2)
  => 00101     (2)   (yield)

EXAMPLE C:
Given a context-free language with:

Formal definition: G = ({S, A, B}, {a, b}, A, S)

where A is the set of production rules (not to be confused with the variable A):

1. S -> ASB
2. S -> AB
3. A -> a
4. B -> b

Parse Tree (yield = aabb):

S
  A
    a
  S
    A
      a
    B
      b
  B
    b

DERIVATIONS:

Leftmost: (Rule No.)

S => ASB       (1)
  => aSB       (3)
  => aABB      (2)
  => aaBB      (3)
  => aabB      (4)
  => aabb      (4)   (yield)

Rightmost: (Rule No.)

S => ASB       (1)
  => ASb       (4)
  => AABb      (2)
  => AAbb      (4)
  => Aabb      (3)
  => aabb      (3)   (yield)

Example A:
Language: (g + h) + (10g + h)


Strings included: g, h, 10g

Formal definition: G = ({E, I}, {+, *, (, ), g, h, 0, 1}, A, E)

where A is the set of production rules:

1. E -> E + E   (E can be two expressions connected by a + sign)
2. E -> E * E   (E can be two expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions: E can be a single identifier)
5. I -> g       (g is an identifier)
6. I -> h       (h is an identifier)
7. I -> 1       (1 is an identifier)
8. I -> I0      (an identifier followed by 0 is another identifier)
9. I -> Ig      (an identifier followed by g is another identifier)

PARSE TREE:

(children are listed top to bottom in left-to-right order, indented under their parent)

E
  E
    (
    E
      E
        I
          g
      +
      E
        I
          h
    )
  +
  E
    (
    E
      E
        I
          I
            I
              1
            0
          g
      +
      E
        I
          h
    )

DETERMINE IF CERTAIN STRINGS ARE IN THE LANGUAGE OF A CERTAIN VARIABLE.

C. Recursive Inference:

Row    String inferred         Language of   Production used   String(s) used
i      g                       I             5                 -
ii     h                       I             6                 -
iii    1                       I             7                 -
iv     10                      I             8                 iii
v      10g                     I             9                 iv
vi     g                       E             4                 i
vii    h                       E             4                 ii
viii   10g                     E             4                 v
ix     g + h                   E             1                 vi, vii
x      (g + h)                 E             3                 ix
xi     10g + h                 E             1                 viii, vii
xii    (10g + h)               E             3                 xi
xiii   (g + h) + (10g + h)     E             1                 x, xii

LEFTMOST DERIVATION:

Example B:
Language: (g1h)(g0 + g1 + h)


Strings included: g1hg0, g1hg1, g1hh

Formal definition: G = ({E, I}, {+, *, (, ), g, h, 0, 1}, A, E)

where A is the set of production rules:

1. E -> E + E   (E can be two expressions connected by a + sign)
2. E -> E * E   (E can be two expressions connected by a * sign)
3. E -> (E)     (parenthesized expression)
4. E -> I       (basic rule for expressions: E can be a single identifier)
5. I -> g       (g is an identifier)
6. I -> h       (h is an identifier)
7. I -> I1      (an identifier followed by 1 is another identifier)
8. I -> Ih      (an identifier followed by h is another identifier)
9. I -> I0      (an identifier followed by 0 is another identifier)

PARSE TREE:

[Figure: parse tree rooted at E with top-level production E -> E * E; the left subtree is the parenthesized identifier g1h, and the right subtree combines g0, g1 and h with +]

DETERMINE IF CERTAIN STRINGS ARE IN THE LANGUAGE OF A CERTAIN VARIABLE.

D. Recursive Inference:

Row    String inferred   Language of   Production used   String(s) used
i
ii
iii
iv
v
vi
vii
viii
ix
x
xi
xii
xiii

LEFTMOST DERIVATION:

RIGHTMOST DERIVATION:
