Professional Documents
Culture Documents
Grammar
Grammar
Languages and
Grammars
Alphabets and Languages
• Grammar
– Rules for defining which strings over an
alphabet are in a particular language
• Automaton (plural is automata)
– A mathematical model of a computer which can
determine whether a particular string is in the
language
Definition of a Grammar
• Every language generated by a left linear grammar can be generated by a right linear
grammar, and every language generated by a right linear grammar can be generated by
a left linear grammar.
• Every language generated by a left linear or right linear grammar can be generated by a
linear grammar.
• Every language generated by a linear grammar can be generated by a context free
grammar.
• Let L be a language generated by a context free grammar. If L does not contain λ, then
L can be generated by a context sensitive grammar. If L contains λ, then L-{λ} can be
generated by a context sensitive grammar.
• Every language generated by a context sensitive grammar can be generated by a phrase
structure grammar.
Example : A Left Linear Grammar for Identifiers
• S -> Sa • S => a
• S -> Sb
• S -> S1 • S => S 1 => a 1
• S -> S2
• S -> a • S => S 2 => S b 2
• S -> b => S 1 b 2 => a 1 b 2
Example : A Right Linear Grammar for Identifiers
• S -> a S b S => a S b
• S -> a b => a a S b b
=> a a a S b b b
=> a a a a b b b b
Example : A Linear Grammar for {an bm cm dn | n > 0}
(Context Free Grammar)
• S -> a S b S => a S b
• S -> a T b => a a S b b
• T -> c T d => a a a T b b b
• T -> c d => a a a c T d b b b
=> a a a c c d d b b b
Another Example :
A Context Free Grammar for {an bn cm dm | n > 0}
• S -> R T S => R T
• R -> a R b => a R b T
• R -> a b => a a R b b T
• T -> c T d => a a a b b b T
• T -> c d => a a a b b b c T d
=> a a a b b b c c d d
A Context Free Grammar for Expressions
• S -> a S B C S => a S B C
• S -> a B C => a a B C B C
• a B -> a b => a a b C B C
• b B -> b b => a a b B C C
• C B -> B C => a a b b C C
• b C -> b c => a a b b c C
• c C -> c c => a a b b c c
The Chomsky Hierarchy and the Block Diagram of a
Compiler
Type 3 Type 2
Int.
Source tokens tree Inter- Object
code
language Scanner mediate Code language
Parser Code
Optimizer
program Generator program
Generator
Symbol Table
Automata
r1 + r2
r1 ⋅ r2
Are regular expressions
r1 *
(r1 )
Example (a + b ) ⋅ a *
2n 2m
L(r ) = {a b b : n, m ≥ 0}
Example
Regular expression r = (0 + 1) * 00 (0 + 1) *
• Definition:
r1 and r2
L(r1 ) = L(r2 ) = L
are equivalent
regular expr.
Linear Grammars
Grammars with
at most one variable at the right side
of a production
S → aSb S → Ab
Examples:
S →λ A → aAb
A→λ
A Non-Linear Grammar
Grammar G: S → SS
S →λ
S → aSb
S → bSa
L(G ) = {w : na ( w) = nb ( w)}
Number of a in string w
Another Linear Grammar
S→A
Grammar G: A → aB | λ
B → Ab
n n
L(G ) = {a b : n ≥ 0}
Right-Linear Grammars
• Example:
S → abS string of
S →a terminals
Left-Linear Grammars
Example:
S → Aab string of
A → Aab | B terminals
B→a
Regular Grammars
Examples: G1 G2
S → abS S → Aab
S →a A → Aab | B
B→a
Observation
•
Context-Free Languages
n n R
{a b } {ww }
Regular Languages
Context-Free Languages
Context-Free Pushdown
Grammars Automata
stack
automaton
Example
A context-free grammar G: S → aSb
S →λ
A derivation:
n n
L(G ) = {a b : n ≥ 0}
A derivation:
R
L(G ) = {ww : w ∈{a, b}*}
Example
A derivation:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
A context-free grammar G: S → aSb
S → SS
S →λ
A derivation:
context-free grammar G
with L = L(G )
Derivation Order
1. S → AB 2. A → aaA 4. B → Bb
3. A → λ 5. B → λ
Leftmost derivation:
1 2 3 4 5
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
1 4 5 2 3
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
S → aAB
A → bBb
B → A|λ
Leftmost derivation:
S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB
⇒ abbbbB ⇒ abbbb
Rightmost derivation:
S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb
⇒ abbBbb ⇒ abbbb
S → AB A → aaA | λ B → Bb | λ
S ⇒ AB ⇒ aaAB ⇒ aaABb
S
A B
a a A B b
S → AB A → aaA | λ B → Bb | λ
A B
a a A B b
λ
S → AB A → aaA | λ B → Bb | λ
A B
a a A B b
λ λ
S → AB A → aaA | λ B → Bb | λ
A B
yield
a a A B b aaλλb
= aab
λ λ
Sometimes, derivation order doesn’t matter
Leftmost:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
S
Same derivation tree
A B
a a A B b
λ λ
E → E + E | E ∗ E | (E) | a
a + a∗a
E E ⇒ E + E ⇒ a+ E ⇒ a+ E∗E
⇒ a + a∗ E ⇒ a + a*a
E + E
leftmost derivation
a E ∗ E
a a
E → E + E | E ∗ E | (E) | a
a + a∗a
E + E a
a a
E → E + E | E ∗ E | (E) | a
a + a∗a
Two derivation trees
E E
E + E E ∗ E
a E ∗ E E + E a
a a a a
The grammarE → E + E | E ∗ E | (E) | a
is ambiguous:
E E
E + E E ∗ E
a E ∗ E E + E a
a a a a
Definition:
A context-free grammar G is ambiguous
2 2 2 2
• Ambiguity is bad for programming languages
E a + a∗a
E + T
T T ∗ F
F F a
a a
Copyright © 2006 Addison-Wesley. All rights reserved. 1-67
The grammar G: E → E +T
E →T
T →T ∗F
T →F
F → (E)
F →a
is non-ambiguous:
Every string w∈ L(G ) has
a unique derivation tree
Another Ambiguous Grammar
IF_STMT
n n m n m m
Example: L = {a b c } ∪ {a b c }
S → S1 | S 2 S1 → S1c | A S 2 → aS2 | B
A → aAb | λ B → bBc | λ
Copyright © 2006 Addison-Wesley. All rights reserved. 1-71
n n n
The string a b c
has two derivation trees
S S
S1 S2
S1 c a S2
Simplification of Context Free Grammar
A Substitution Rule
Equivalent
grammar
S → aB
S → aB | ab
A → aaA
Substitute A → aaA
A → abBc B→b A → abBc | abbc
B → aA
B → aA
B→b
Copyright © 2006 Addison-Wesley. All rights reserved. 1-74
A Substitution Rule
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
Substitute
B → aA
S → aB | ab | aaA
Equivalent
A → aaA
A → abBc | abbc | abaAc
grammar
In general:
A → xBz
B → y1
Substitute
B → y1
equivalent
A → xBz | xy1z grammar
Nullable Variables
λ − production : A→λ
Example Grammar:
S → aMb
M → aMb
M →λ
Nullable variable
S → aMb
S → aMb
Substitute S → ab
M → aMb M →λ
M → aMb
M →λ
M → ab
Unit Production: A→ B
Observation:
A→ A
Is removed immediately
S → aA
A→a
A→ B
B→A
B → bb
Final grammar
S → aA | aB | aA S → aA | aB
A→a A→a
B → bb B → bb
S ⇒ A ⇒ aA ⇒ aaA ⇒ K ⇒ aa K aA ⇒ K
Copyright © 2006 Addison-Wesley. All rights reserved. 1-87
Another grammar:
S→A
A → aA
A→λ
B → bA Useless Production
Not reachable from S
w ∈ L(G )
S → aSb
S →λ Productions
Variables S→A useless
useless A → aA useless
useless B→C useless
S → aS | A | C
A→a
B → aa
C → aCb
S → aS | A | C Round 1: { A, B}
A→a S→A
B → aa
C → aCb Round 2: { A, B, S }
Keep only the variables
that produce terminal symbols: { A, B, S }
(the rest variables are useless)
S → aS | A | C
A→a S → aS | A
B → aa A→a
C → aCb B → aa
Remove useless productions
Second: Find all variables
reachable from S
S → aS | A
A→a S A B
B → aa not
reachable
Keep only the variables
reachable from S
(the rest variables are useless)
Final Grammar
S → aS | A
S → aS | A
A→a
A→a
B → aa
A → BC or A→a
S → AS S → AS
S →a S → AAS
A → SA A → SA
A→b A → aa
Chomsky Not Chomsky
Normal Form Normal Form
S → ABa
• Example:
A → aab
B → Ac
Not Chomsky
Normal Form
S → ABTa
S → ABa A → TaTaTb
A → aab B → ATc
B → Ac Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V1
S → AV1
S → ABTa
V1 → BTa
A → TaTaTb
A → TaTaTb
B → ATc
B → ATc
Ta → a
Ta → a
Tb → b
Tb → b
Tc → c
Tc → c
Introduce intermediate variable: V2
S → AV1
S → AV1
V1 → BTa
V1 → BTa
A → TaV2
A → TaTaTb
V2 → TaTb
B → ATc
B → ATc
Ta → a
Ta → a
Tb → b
Tb → b
Tc → c
Tc → c
Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
Initial grammar
V2 → TaTb
S → ABa B → ATc
A → aab Ta → a
B → Ac Tb → b
Tc → c
In general:
we can obtain:
An equivalent grammar
in Chomsky Normal Form
First remove:
Nullable variables
Unit productions
Add production Ta → a
New variable: Ta
Replace any production A → C1C2 LCn
with A → C1V1
V1 → C2V2
K
Vn−2 → Cn−1Cn
A → a V1V2 LVk k ≥0
symbol variables
S → cAB
S → abSb
A → aA | bB | b
S → aa
B→b
Context-free languages
are closed under: Union
L1 is context free
L1 ∪ L2
L2 is context free is context-free
n n
L1 = {a b } S1 → aS1b | λ
R
L2 = {ww } S 2 → aS2 a | bS 2b | λ
Union
n n R S → S1 | S 2
L = {a b } ∪ {ww }
In general:
For context-free languages L1, L2
with context-free grammars G1, G2
and start variables S1, S 2
Context-free languages
are closed under: Concatenation
L1 is context free
L1L2
L2 is context free is context-free
n n
L1 = {a b } S1 → aS1b | λ
R
L2 = {ww } S 2 → aS2 a | bS 2b | λ
Concatenation
n n R S → S1S 2
L = {a b }{ww }
In general:
Context-free languages
are closed under: Star-operation
L is context free *
L is context-free
Language Grammar
n n
L = {a b } S → aSb | λ
Star Operation
n n
L = {a b } * S1 → SS1 | λ
Copyright © 2006 Addison-Wesley. All rights reserved. 1-120
In general:
Context-free languages
are not closed under: intersection
L1 is context free
L1 ∩ L2
L2 is context free not necessarily
context-free
Copyright © 2006 Addison-Wesley. All rights reserved. 1-122
Example
n n m n m m
L1 = {a b c } L2 = {a b c }
Context-free: Context-free:
S → AC S → AB
A → aAb | λ A → aA | λ
C → cC | λ B → bBc | λ
Intersection
n n n
L1 ∩ L2 = {a b c } NOT context-free
Complement
Context-free languages
are not closed under: complement
L1 context free
L1 ∩ L2
L2 regular context-free
Summary
Non-recursively enumerable
Recursively-enumerable
Recursive
Context-sensitive
Context-free
Regular
Who is Noam Chomsky Anyway?
• Philosopher of Languages
• Professor of Linguistics at MIT
• Constructed the idea that language was not
a learned “behavior”, but that it was
cognitive and innate; versus stimulus-
response driven
• In an effort to explain these theories, he
developed the Chomsky Hierarchy
Chomsky Hierarchy