
Derivation Trees

43
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB

    S
   / \
  A   B

44
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB

        S
      /   \
     A     B
   / | \
  a  a  A

45
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb

        S
      /   \
     A     B
   / | \   | \
  a  a  A  B  b

46
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb

        S
      /   \
     A     B
   / | \   | \
  a  a  A  B  b
        |
        λ
47
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab

Derivation tree:

        S
      /   \
     A     B
   / | \   | \
  a  a  A  B  b
        |  |
        λ  λ
48
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab

Derivation tree:

        S
      /   \
     A     B
   / | \   | \
  a  a  A  B  b
        |  |
        λ  λ

Yield (leaves read left to right): aaλλb = aab
49
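The yield of a derivation tree can be computed by reading its leaves from left to right and dropping every λ. The sketch below illustrates this on the tree above; the nested-tuple encoding of the tree and the name tree_yield are assumptions made for this example, not part of the slides.

```python
# A node is (label, children); a leaf has an empty list of children.
LAMBDA = "λ"

def tree_yield(node):
    """Concatenate the leaf labels from left to right, skipping λ leaves."""
    label, children = node
    if not children:
        return "" if label == LAMBDA else label
    return "".join(tree_yield(child) for child in children)

# Derivation tree for S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab
tree = ("S", [
    ("A", [("a", []), ("a", []), ("A", [(LAMBDA, [])])]),
    ("B", [("B", [(LAMBDA, [])]), ("b", [])]),
])

print(tree_yield(tree))  # aab
```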
Ambiguity

50
E → E + E | E ∗ E | (E) | a
String: a + a∗a

Leftmost derivation:
E ⇒ E + E ⇒ a + E ⇒ a + E∗E
  ⇒ a + a∗E ⇒ a + a∗a

Derivation tree:

      E
    / | \
   E  +  E
   |   / | \
   a  E  ∗  E
      |     |
      a     a
51
E → E + E | E ∗ E | (E) | a
String: a + a∗a

Leftmost derivation:
E ⇒ E∗E ⇒ E + E∗E ⇒ a + E∗E
  ⇒ a + a∗E ⇒ a + a∗a

Derivation tree:

         E
       / | \
      E  ∗  E
    / | \   |
   E  +  E  a
   |     |
   a     a
52
E → E + E | E ∗ E | (E) | a
String: a + a∗a

Two derivation trees:

      E                 E
    / | \             / | \
   E  +  E           E  ∗  E
   |   / | \       / | \   |
   a  E  ∗  E     E  +  E  a
      |     |     |     |
      a     a     a     a
53
The grammar E → E + E | E ∗ E | (E) | a
is ambiguous:

string a + a ∗ a has two derivation trees

      E                 E
    / | \             / | \
   E  +  E           E  ∗  E
   |   / | \       / | \   |
   a  E  ∗  E     E  +  E  a
      |     |     |     |
      a     a     a     a
54
The grammar E → E + E | E ∗ E | (E) | a
is ambiguous:

string a + a ∗ a has two leftmost derivations

E ⇒ E + E ⇒ a + E ⇒ a + E∗E
  ⇒ a + a∗E ⇒ a + a∗a

E ⇒ E∗E ⇒ E + E∗E ⇒ a + E∗E
  ⇒ a + a∗E ⇒ a + a∗a
55
Definition:
A context-free grammar G is ambiguous
if some string w ∈ L(G) has
two or more derivation trees

56
In other words:

A context-free grammar G is ambiguous
if some string w ∈ L(G) has
two or more leftmost derivations
(or, equivalently, two or more rightmost derivations)

57
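For a concrete string, ambiguity can be checked by counting derivation trees. The following is a minimal sketch, hard-wired to the grammar E → E + E | E ∗ E | (E) | a; the function name count_trees is hypothetical, and ∗ is written as the ASCII character * in the input string.

```python
from functools import lru_cache

def count_trees(s):
    """Number of derivation trees of s for E -> E + E | E * E | (E) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees with root E whose yield is s[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and s[i] == "a":            # E -> a
            total += 1
        if s[i] == "(" and s[j - 1] == ")":       # E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):             # E -> E + E  or  E -> E * E
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_trees("a+a*a"))    # 2: the string has two derivation trees
print(count_trees("(a+a)*a"))  # 1: parentheses leave only one tree
```

A count of 2 or more for any single string is enough to show that the grammar is ambiguous.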
Why do we care about ambiguity?

a + a∗a
take a = 2

      E                 E
    / | \             / | \
   E  +  E           E  ∗  E
   |   / | \       / | \   |
   a  E  ∗  E     E  +  E  a
      |     |     |     |
      a     a     a     a
58
2 + 2∗2

      E                 E
    / | \             / | \
   E  +  E           E  ∗  E
   |   / | \       / | \   |
   2  E  ∗  E     E  +  E  2
      |     |     |     |
      2     2     2     2
59
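First tree: 2 + 2∗2 = 6 (the value of each subtree is shown in parentheses)

           E (6)
         /   |   \
    E (2)    +    E (4)
      |         /   |   \
      2    E (2)    ∗    E (2)
             |             |
             2             2

Second tree: 2 + 2∗2 = 8

                  E (8)
                /   |   \
           E (4)    ∗    E (2)
         /   |   \         |
    E (2)    +    E (2)    2
      |             |
      2             2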
60
Correct result: 2 + 2∗2 = 6

           E (6)
         /   |   \
    E (2)    +    E (4)
      |         /   |   \
      2    E (2)    ∗    E (2)
             |             |
             2             2
61
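The two values can be reproduced by evaluating each derivation tree bottom-up. This is only an illustrative sketch: the encoding of a tree as a nested (left, operator, right) tuple and the name evaluate are assumptions made for the example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree given as an int leaf or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

plus_at_root = (2, "+", (2, "*", 2))    # E ⇒ E + E first
times_at_root = ((2, "+", 2), "*", 2)   # E ⇒ E ∗ E first

print(evaluate(plus_at_root))   # 6
print(evaluate(times_at_root))  # 8
```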
• Ambiguity is bad for programming languages

• We want to remove ambiguity

62
Another Ambiguous Grammar

IF_STMT → if EXPR then STMT


| if EXPR then STMT else STMT

63
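if expr1 then if expr2 then stmt1 else stmt2

First derivation tree (the else belongs to the inner if):

IF_STMT
  if expr1 then STMT
                  |
                  if expr2 then stmt1 else stmt2

Second derivation tree (the else belongs to the outer if):

IF_STMT
  if expr1 then STMT else stmt2
                  |
                  if expr2 then stmt1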

64
Inherent Ambiguity

Some context-free languages
have only ambiguous grammars

Example: L = {a^n b^n c^m} ∪ {a^n b^m c^m}

S → S1 | S2
S1 → S1c | A
S2 → aS2 | B
A → aAb | λ
B → bBc | λ
65
The string a^n b^n c^n
has two derivation trees

     S               S
     |               |
     S1              S2
    /  \            /  \
  S1    c          a    S2

66
Simplifications
of
Context-Free Grammars

67
A Substitution Rule

Original grammar:
S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b

Equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
68
A Substitution Rule
Grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA

Equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc
69
In general:
Grammar:
A → xBz
B → y1

Substitute B → y1

Equivalent grammar:
A → xBz | xy1z
70
Nullable Variables

λ-production: A → λ

Nullable variable: A ⇒ … ⇒ λ

71
Removing Nullable Variables

Example Grammar:

S → aMb
M → aMb
M →λ

Nullable variable

72
Final Grammar

Original grammar:
S → aMb
M → aMb
M → λ

Substitute M → λ

Final grammar:
S → aMb
S → ab
M → aMb
M → ab

73
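The substitution of M → λ can be sketched in code. The version below is a minimal illustration under stated assumptions: a grammar is a dict from a variable to a list of right-hand sides, each right-hand side is a tuple of symbols, λ is the empty tuple, and the function names are hypothetical. It does not re-add λ to the language when the start variable itself is nullable.

```python
from itertools import product

def nullable_variables(grammar):
    """Variables A with A ⇒ … ⇒ λ, found by repeating until nothing changes."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """For each production, add every variant with nullable variables dropped."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; other symbols are kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                new_body = tuple(s for part in pick for s in part)
                if new_body:  # never emit a λ-production
                    new_bodies.add(new_body)
        result[var] = new_bodies
    return result

grammar = {"S": [("a", "M", "b")], "M": [("a", "M", "b"), ()]}
print(nullable_variables(grammar))         # {'M'}
print(remove_lambda_productions(grammar))  # S and M each get aMb and ab
```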
Unit-Productions

Unit Production: A→ B

(a single variable on each side)

74
Removing Unit Productions

Observation:

A production of the form A → A
can be removed immediately

75
Example Grammar:

S → aA
A→a
A→ B
B→A
B → bb

76
Grammar:
S → aA
A → a
A → B
B → A
B → bb

Substitute A → B

Result:
S → aA | aB
A → a
B → A | B
B → bb

77
Grammar:
S → aA | aB
A → a
B → A | B
B → bb

Remove B → B

Result:
S → aA | aB
A → a
B → A
B → bb

78
Grammar:
S → aA | aB
A → a
B → A
B → bb

Substitute B → A

Result:
S → aA | aB | aA
A → a
B → bb

79
Remove repeated productions

Grammar:
S → aA | aB | aA
A → a
B → bb

Final grammar:
S → aA | aB
A → a
B → bb

80
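Unit productions can also be removed mechanically. The sketch below does not follow the substitution steps of the slides; it uses the common closure-based method instead: for every variable, collect the variables reachable through unit productions only, then copy their non-unit productions. The grammar encoding (dict of tuples) and the function name are assumptions for the example.

```python
def remove_unit_productions(grammar):
    """Closure-based removal of unit productions A → B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for var in grammar:
        # Variables reachable from var using unit productions only.
        closure = {var}
        stack = [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body[0] not in closure:
                    closure.add(body[0])
                    stack.append(body[0])
        # var inherits every non-unit production of its closure.
        result[var] = {
            body for v in closure for body in grammar[v] if not is_unit(body)
        }
    return result

grammar = {
    "S": [("a", "A")],
    "A": [("a",), ("B",)],
    "B": [("A",), ("b", "b")],
}
print(remove_unit_productions(grammar))
```

On the example grammar this gives S → aA, A → a | bb, B → a | bb. The shape differs from the final grammar of the slides, but both grammars generate the same two strings, aa and abb.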
Useless Productions

S → aSb
S → λ
S → A
A → aA      Useless Production

Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …
81
Another grammar:

S→A
A → aA
A→λ
B → bA Useless Production

Not reachable from S

82
In general:

if S ⇒ … ⇒ xAy ⇒ … ⇒ w, where w ∈ L(G)
(that is, w contains only terminals),

then variable A is useful

otherwise, variable A is useless

83
A production A → x is useless
if any of its variables is useless

Production    Status
S → aSb
S → λ
S → A         useless production
A → aA        useless production (variable A is useless)
B → C         useless production (variable B is useless)
C → D         useless production (variable C is useless)


84
Removing Useless Productions

Example Grammar:

S → aS | A | C
A→a
B → aa
C → aCb

85
First: find all variables that can produce
strings with only terminals

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: { A, B }       (A → a, B → aa)
Round 2: { A, B, S }    (S → A)

86
Keep only the variables
that produce terminal strings: { A, B, S }
(the remaining variables are useless)

Before:              After removing useless productions:
S → aS | A | C       S → aS | A
A → a                A → a
B → aa               B → aa
C → aCb
87
Second: find all variables
reachable from S

Use a dependency graph:

S → aS | A
A → a
B → aa

S --> A        B (not reachable)

88
Keep only the variables
reachable from S
(the remaining variables are useless)

Before:          Final grammar (useless productions removed):
S → aS | A       S → aS | A
A → a            A → a
B → aa

89
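The two steps (keep the variables that produce terminal strings, then keep the variables reachable from S) can be sketched as follows. The grammar encoding and the function name remove_useless are assumptions for this example; every variable is assumed to appear as a key of the dict.

```python
def remove_useless(grammar, start):
    """Two-step removal of useless variables and productions."""
    variables = set(grammar)

    def ok(body, good):
        # Every symbol is a terminal or a variable already known to be good.
        return all(s not in variables or s in good for s in body)

    # Step 1: variables that can produce strings of terminals only.
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in generating and any(ok(b, generating) for b in bodies):
                generating.add(var)
                changed = True
    pruned = {
        v: [b for b in bodies if ok(b, generating)]
        for v, bodies in grammar.items() if v in generating
    }

    # Step 2: variables reachable from the start variable.
    reachable = set()
    stack = [start] if start in pruned else []
    while stack:
        var = stack.pop()
        if var in reachable:
            continue
        reachable.add(var)
        for body in pruned[var]:
            stack.extend(s for s in body if s in pruned and s not in reachable)
    return {v: bodies for v, bodies in pruned.items() if v in reachable}

grammar = {
    "S": [("a", "S"), ("A",), ("C",)],
    "A": [("a",)],
    "B": [("a", "a")],
    "C": [("a", "C", "b")],
}
print(remove_useless(grammar, "S"))  # keeps S → aS | A and A → a
```

The order of the two steps matters: removing C first is what makes it safe to test reachability on the pruned grammar.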
Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

90
Normal Forms
for
Context-free Grammars

91
Chomsky Normal Form

Each production has the form:

A → BC     (B and C are variables)
or
A → a      (a is a terminal)

92
Examples:

Chomsky Normal Form:      Not Chomsky Normal Form:
S → AS                    S → AS
S → a                     S → AAS
A → SA                    A → SA
A → b                     A → aa

93
Conversion to Chomsky Normal Form

Example (not in Chomsky Normal Form):

S → ABa
A → aab
B → Ac

94
Introduce variables for terminals: Ta, Tb, Tc

Before:         After:
S → ABa         S → ABTa
A → aab         A → TaTaTb
B → Ac          B → ATc
                Ta → a
                Tb → b
                Tc → c
95
Introduce intermediate variable: V1

Before:           After:
S → ABTa          S → AV1
                  V1 → BTa
A → TaTaTb        A → TaTaTb
B → ATc           B → ATc
Ta → a            Ta → a
Tb → b            Tb → b
Tc → c            Tc → c
96
Introduce intermediate variable: V2

Before:           After:
S → AV1           S → AV1
V1 → BTa          V1 → BTa
A → TaTaTb        A → TaV2
                  V2 → TaTb
B → ATc           B → ATc
Ta → a            Ta → a
Tb → b            Tb → b
Tc → c            Tc → c
97
Initial grammar:      Final grammar in Chomsky Normal Form:
S → ABa               S → AV1
A → aab               V1 → BTa
B → Ac                A → TaV2
                      V2 → TaTb
                      B → ATc
                      Ta → a
                      Tb → b
                      Tc → c
98
In general:

From any context-free grammar


(which doesn’t produce λ )
not in Chomsky Normal Form

we can obtain:
An equivalent grammar
in Chomsky Normal Form

99
The Procedure

First remove:

Nullable variables

Unit productions

100
Then, for every terminal symbol a:

Introduce a new variable Ta

Add production Ta → a

In every production whose right side has length at least 2: replace a with Ta
101
Replace any production A → C1C2⋯Cn (n > 2)

with A → C1V1
     V1 → C2V2
     …
     Vn−2 → Cn−1Cn

New intermediate variables: V1, V2, …, Vn−2


102
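The last two steps of the procedure can be sketched in code. This is a minimal illustration, assuming that nullable variables and unit productions have already been removed; the grammar encoding, the function name to_cnf, and the generated names Ta, V1, V2, … are assumptions for the example, and the sketch does not check for clashes with existing variable names.

```python
def to_cnf(grammar):
    """Replace terminals by Ta variables and split long right-hand sides."""
    variables = set(grammar)
    terminal_var = {}   # terminal a -> its variable name "Ta"
    result = {}
    counter = 0         # numbering of the intermediate variables V1, V2, …

    for var, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # A → a is already in Chomsky Normal Form.
                result.setdefault(var, []).append(body)
                continue
            symbols = []
            for s in body:
                if s in variables:
                    symbols.append(s)
                else:
                    symbols.append(terminal_var.setdefault(s, "T" + s))
            head = var
            while len(symbols) > 2:
                counter += 1
                new_var = f"V{counter}"
                result.setdefault(head, []).append((symbols[0], new_var))
                head, symbols = new_var, symbols[1:]
            result.setdefault(head, []).append(tuple(symbols))

    for terminal, name in terminal_var.items():
        result[name] = [(terminal,)]
    return result

grammar = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
for head, bodies in to_cnf(grammar).items():
    for body in bodies:
        print(head, "→", " ".join(body))
```

On the example grammar S → ABa, A → aab, B → Ac, the sketch produces the same eight productions as the worked conversion.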
Theorem: For any context-free grammar
(which doesn’t produce λ )
there is an equivalent grammar
in Chomsky Normal Form

103
Observations

• Chomsky normal forms are good


for parsing and proving theorems

• It is very easy to find the Chomsky normal


form for any context-free grammar

104
Greibach Normal Form

All productions have form:

A → a V1V2⋯Vk,   k ≥ 0

where a is a terminal symbol and V1, …, Vk are variables

105
Observations

• Greibach normal forms are very good
for parsing

• It is hard to find the Greibach normal
form of any context-free grammar

106
Compilers

107
Program:
v = 5;
if (v>5)
  x = 12 + v;
while (x !=3) {
  x = x - 3;
  v = 10;
}
......

--> Compiler -->

Machine Code:
Add v,v,0
cmp v,5
jmplt ELSE
THEN:
add x, 12,v
ELSE:
WHILE:
cmp x,3
...
108
Compiler

input program --> [ lexical analyzer --> parser ] --> output machine code
109
A parser knows the grammar
of the programming language

110
Parser
PROGRAM → STMT_LIST
STMT_LIST → STMT; STMT_LIST | STMT;
STMT → EXPR | IF_STMT | WHILE_STMT
| { STMT_LIST }

EXPR → EXPR + EXPR | EXPR - EXPR | ID


IF_STMT → if (EXPR) then STMT
| if (EXPR) then STMT else STMT
WHILE_STMT→ while (EXPR) do STMT

111
The parser finds the derivation
of a particular input

Grammar:
E -> E + E
   | E * E
   | INT

Input: 10 + 2 * 5

Derivation:
E => E + E
  => E + E * E
  => 10 + E*E
  => 10 + 2 * E
  => 10 + 2 * 5

112
derivation:
E => E + E
  => E + E * E
  => 10 + E*E
  => 10 + 2 * E
  => 10 + 2 * 5

derivation tree:

      E
    / | \
   E  +  E
   |   / | \
  10  E  *  E
      |     |
      2     5

113
derivation tree:

      E
    / | \
   E  +  E
   |   / | \
  10  E  *  E
      |     |
      2     5

machine code:
mult a, 2, 5
add b, 10, a

114
Parsing

115
Parser

Input: a grammar and an input string
Output: a derivation

116
Example:

Grammar:
S → SS
S → aSb
S → bSa
S → λ

Input: aabb

Parser output: derivation ?

117
Exhaustive Search

S → SS | aSb | bSa | λ

Find derivation of aabb

Phase 1:
S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ λ

All possible derivations of length 1
118
Find derivation of aabb

S ⇒ SS
S ⇒ aSb
S ⇒ bSa     (discarded: cannot lead to aabb)
S ⇒ λ       (discarded: cannot lead to aabb)

119
S → SS | aSb | bSa | λ

Find derivation of aabb

Phase 1 (remaining derivations):
S ⇒ SS
S ⇒ aSb

Phase 2:
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ ab
120
S → SS | aSb | bSa | λ

Find derivation of aabb

Phase 2 (remaining derivations):
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb

Phase 3:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
121
Final result of exhaustive search
(top-down parsing)

Grammar:
S → SS
S → aSb
S → bSa
S → λ

Input: aabb

Derivation found by the parser:
S ⇒ aSb ⇒ aaSbb ⇒ aabb


122
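The phase-by-phase search can be sketched as a breadth-first search over leftmost derivations. This is only an illustration under stated assumptions: variables are single uppercase characters, a grammar is a dict from a variable to a list of right-hand-side strings with λ written as the empty string, and the names exhaustive_parse, viable and max_phases are hypothetical. Because the grammar contains λ- and unit-like steps, the sketch bounds the number of phases instead of relying on termination.

```python
def viable(form, target, productions):
    """Discard sentential forms that can no longer lead to target."""
    terminals = [ch for ch in form if ch not in productions]
    if len(terminals) > len(target):
        return False
    first_var = next((i for i, ch in enumerate(form) if ch in productions), None)
    if first_var is None:          # all terminals, but not equal to target
        return False
    return target.startswith(form[:first_var])

def exhaustive_parse(productions, start, target, max_phases=10):
    """Return a leftmost derivation of target as a list of forms, or None."""
    frontier = [[start]]
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            pos = next(i for i, ch in enumerate(form) if ch in productions)
            for body in productions[form[pos]]:
                new_form = form[:pos] + body + form[pos + 1:]
                if new_form == target:
                    return derivation + [new_form]
                if viable(new_form, target, productions):
                    next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None

productions = {"S": ["SS", "aSb", "bSa", ""]}
print(" ⇒ ".join(exhaustive_parse(productions, "S", "aabb")))
# S ⇒ aSb ⇒ aaSbb ⇒ aabb
```

The pruning in viable plays the role of the discarded derivations in each phase; without it the number of candidate derivations grows much faster.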
Time complexity of exhaustive search

Suppose there are no productions of the form

A→λ
A→ B
Number of phases for string w : approx. |w|

123
For grammar with k rules

Time for phase 1: k

k possible derivations

124
Time for phase 2: k^2

k^2 possible derivations

125
Time for phase |w|: k^|w|

A total of k^|w| possible derivations

126
Total time needed for string w:

k + k^2 + ⋯ + k^|w|

(phase 1 + phase 2 + ⋯ + phase |w|)

Extremely bad!!!
127
For general context-free grammars:

There exists a parsing algorithm
that parses a string w
in time |w|^3

The CYK parser

128
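A minimal sketch of a CYK-style membership test is given below. It assumes the grammar is already in Chomsky Normal Form and is encoded as a dict from a variable to a list of tuples; the function name cyk and the table layout are assumptions for the example. The three nested loops over substring length, start position, and split point give the |w|^3 factor.

```python
def cyk(grammar, start, w):
    """Return True if w is generated by a grammar in Chomsky Normal Form."""
    n = len(w)
    if n == 0:
        return False  # a grammar in Chomsky Normal Form does not produce λ
    # table[i][length] = variables that derive the substring w[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        for var, bodies in grammar.items():
            if (ch,) in bodies:                       # A → a
                table[i][1].add(var)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for var, bodies in grammar.items():
                    for body in bodies:               # A → BC
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length].add(var)
    return start in table[0][n]

# The Chomsky Normal Form grammar obtained earlier from S → ABa, A → aab, B → Ac.
cnf = {
    "S": [("A", "V1")], "V1": [("B", "Ta")],
    "A": [("Ta", "V2")], "V2": [("Ta", "Tb")],
    "B": [("A", "Tc")],
    "Ta": [("a",)], "Tb": [("b",)], "Tc": [("c",)],
}
print(cyk(cnf, "S", "aabaabca"))  # True
print(cyk(cnf, "S", "aab"))       # False
```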
