Download as pdf or txt
Download as pdf or txt
You are on page 1of 131

Introduction to

Languages and
Grammars
Alphabets and Languages

An alphabet is a finite non-empty set.


Let S and T be alphabets.
S • T = { s t | s ε S, t ε T }
(We’ll often write ST for S•T.)
λ = empty string, string of length one
S0 = {λ }
S1 = S
Sn = S(n-1) • S, n > 1
S+ = S1 U S2 U S3 U . . .
S* = S0 U S+
A language L over an alphabet S is a subset of S*.
How Many Languages Are There?

• How many languages over a particular alphabet are there?


Uncountably infinitely many!
• Then, any finite method of describing languages can not
include all of them.
• Formal language theory gives us techniques for defining
some languages over an alphabet.
Why should I care about ?

• Concepts of syntax and semantics used


widely in computer science:
• Basic compiler functions
• Development of computer languages
• Exploring the capabilities and limitations of
algorithmic problem solving
Methods for Defining Languages

• Grammar
– Rules for defining which strings over an
alphabet are in a particular language
• Automaton (plural is automata)
– A mathematical model of a computer which can
determine whether a particular string is in the
language
Definition of a Grammar

• A grammar G is a 4 tuple G = (N, Σ, P, S), where


– N is an alphabet of nonterminal symbols
– Σ is an alphabet of terminal symbols
– N and Σ are disjoint
– S is an element of N; S is the start symbol or initial
symbol of the grammar
– P is a set of productions of the form α -> β where
• α is in (N U Σ)* N (N U Σ)*
• β is in (N U Σ)*
Definition of a Language Generated by a
Grammar
We define => by
γ α δ => γ δ β if
α -> β is in P, and
γ and δ are in (N U Σ)*

=>+ is the transitive closure of =>

=>* is the reflexive transitive closure of =>

The language L generated by grammar G = (N, Σ, P, S), is defined by

L = L(G) = { x | S *=> x and x is in Σ* }


Classes of Grammars
(The Chomsky Hierarchy)
• Type 0, Phrase Structure (same as basic grammar definition)
• Type 1, Context Sensitive
– (1) α -> β where α is in (N U Σ)* N (N U Σ)*,
• β is in (N U Σ)+, and length(α) ≤ length(β)
– (2) γ A δ -> γ β δ where A is in N, β is in (N U Σ)+, and
• γ and δ are in (N U Σ)*
• Type 2, Context Free
– A -> β where A is in N, β is in (N U Σ)*
• Linear
– A-> x or A -> x B y, where A and B are in N and x and y are in Σ*
• Type 3, Regular Expressions
– (1) left linear A -> B a or A -> a, where A and B are in N and a is in Σ
– (2) right linear A -> a B or A -> a, where A and B are in N and a is in Σ
Comments on the
Chomsky Hierarchy (1)
• Definitions (1) and (2) for context sensitive are equivalent.
• Definitions (1) and (2) for regular expressions are equivalent.
• If a grammar has productions of all three of the forms described in definitions (1) and
(2) for regular expressions, then it is a linear grammar.
• Each definition of context sensitive is a restriction on the definition of phrase structure.
• Every context free grammar can be converted to a context sensitive grammar with
satisfies definition (2) which generates the same language except the language
generated by the context sensitive grammar cannot contain the empty string λ.
• The definition of linear grammar is a restriction on the definition of context free.
• The definitions of left linear and right linear are restrictions on the definition of linear.
Comments on the Chomsky Hierarchy

• Every language generated by a left linear grammar can be generated by a right linear
grammar, and every language generated by a right linear grammar can be generated by
a left linear grammar.
• Every language generated by a left linear or right linear grammar can be generated by a
linear grammar.
• Every language generated by a linear grammar can be generated by a context free
grammar.
• Let L be a language generated by a context free grammar. If L does not contain λ, then
L can be generated by a context sensitive grammar. If L contains λ, then L-{λ} can be
generated by a context sensitive grammar.
• Every language generated by a context sensitive grammar can be generated by a phrase
structure grammar.
Example : A Left Linear Grammar for Identifiers

• S -> Sa • S => a
• S -> Sb
• S -> S1 • S => S 1 => a 1
• S -> S2
• S -> a • S => S 2 => S b 2
• S -> b => S 1 b 2 => a 1 b 2
Example : A Right Linear Grammar for Identifiers

• S -> a T T -> 1 T • S => a


• S -> b T T -> 2 T
• S -> a T -> a • S => a T => a 1
• S -> b T -> b
• T -> a T T -> 1
• T -> b T T -> 2 • S => a T => a 1 T
=> a 1 b T => a 1 b 2
Example:
A Right Linear Grammar for {an bm cp | n, m, p > 0}

• S -> a A S => a A => a a A


• A -> a A => a a a A
• A -> b B
=> a a a b B
• B -> b B
• B -> c C => a a a b c C
• B -> c => a a a b c c
• C -> c C
• C -> c
Example : A Linear Grammar for { an bn | n >0 }

• S -> a S b S => a S b
• S -> a b => a a S b b
=> a a a S b b b
=> a a a a b b b b
Example : A Linear Grammar for {an bm cm dn | n > 0}
(Context Free Grammar)

• S -> a S b S => a S b
• S -> a T b => a a S b b
• T -> c T d => a a a T b b b
• T -> c d => a a a c T d b b b
=> a a a c c d d b b b
Another Example :
A Context Free Grammar for {an bn cm dm | n > 0}

• S -> R T S => R T
• R -> a R b => a R b T
• R -> a b => a a R b b T
• T -> c T d => a a a b b b T
• T -> c d => a a a b b b c T d
=> a a a b b b c c d d
A Context Free Grammar for Expressions

• S -> E • S => E => E + T


F -> (E) – => E - T + T
• E -> E + T – => T - T + T
F -> a
• E -> E - T – => F - T + T
F -> b – => a - T + T
• E -> T F -> c – => a - T * F + T
• T -> T * F F -> d – => a - F * F + T
F -> e – => a - b * F + T
• T -> T / F
– => a - b * c + T
• T -> F – => a - b * c + F
– => a - b * c - d
A Context Sensitive Grammar for {an bn cn | n >0}

• S -> a S B C S => a S B C
• S -> a B C => a a B C B C
• a B -> a b => a a b C B C
• b B -> b b => a a b B C C
• C B -> B C => a a b b C C
• b C -> b c => a a b b c C
• c C -> c c => a a b b c c
The Chomsky Hierarchy and the Block Diagram of a
Compiler

Type 3 Type 2
Int.
Source tokens tree Inter- Object
code
language Scanner mediate Code language
Parser Code
Optimizer
program Generator program
Generator

Symbol Error Error


Type 1 Table Handler messages
Manager

Symbol Table
Automata

• Turing machine (Tm)


• Linear bounded automaton (lba)
• 2-stack pushdown automaton (2pda)
• (1-stack) pushdown automaton (pda)
• 1 turn pushdown automaton
• finite state automaton (fsa)
Recursive Definition

Primitive regular expressions: ∅, λ, α


Given regular expressions r1 and r2

r1 + r2
r1 ⋅ r2
Are regular expressions
r1 *
(r1 )
Example (a + b ) ⋅ a *

L((a + b ) ⋅ a *) = L((a + b )) L(a *)


= L(a + b ) L(a *)
= ( L(a ) ∪ L(b )) ( L(a )) *
= ({a} ∪ {b}) ({a}) *
= {a, b}{λ , a, aa, aaa,...}
= {a, aa, aaa,..., b, ba, baa,...}
Example (a + b ) ⋅ a *

L((a + b ) ⋅ a *) = L((a + b )) L(a *)


= L(a + b ) L(a *)
= ( L(a ) ∪ L(b )) ( L(a )) *
= ({a} ∪ {b}) ({a}) *
= {a, b}{λ , a, aa, aaa,...}
= {a, aa, aaa,..., b, ba, baa,...}
Example

• Regular expression r = (aa ) * (bb ) * b

2n 2m
L(r ) = {a b b : n, m ≥ 0}
Example

Regular expression r = (0 + 1) * 00 (0 + 1) *

L(r ) = { all strings with at least


two consecutive 0 }
Example

• Regular expression r = (1 + 01) * (0 + λ )

L(r ) = { all strings without


two consecutive 0 }
Equivalent Regular Expressions

• Definition:

Regular expressions r1 and r2

are equivalent if L(r1 ) = L(r2 )


Example

L = { all strings without


two consecutive 0 }
r1 = (1 + 01) * (0 + λ )
r2 = (1* 011*) * (0 + λ ) + 1* (0 + λ )

r1 and r2
L(r1 ) = L(r2 ) = L
are equivalent
regular expr.
Linear Grammars

Grammars with
at most one variable at the right side
of a production

S → aSb S → Ab
Examples:
S →λ A → aAb
A→λ
A Non-Linear Grammar
Grammar G: S → SS
S →λ
S → aSb
S → bSa

L(G ) = {w : na ( w) = nb ( w)}

Number of a in string w
Another Linear Grammar

S→A
Grammar G: A → aB | λ
B → Ab

n n
L(G ) = {a b : n ≥ 0}
Right-Linear Grammars

• All productions have form: A → xB


or
A→ x

• Example:
S → abS string of
S →a terminals
Left-Linear Grammars

All productions have form:


A → Bx
or
A→ x

Example:
S → Aab string of
A → Aab | B terminals

B→a
Regular Grammars

A regular grammar is any


right-linear or left-linear grammar

Examples: G1 G2
S → abS S → Aab
S →a A → Aab | B
B→a
Observation

Regular grammars generate regular


languages
G2
Examples:
G1 S → Aab
S → abS A → Aab | B
S →a B→a

L(G1 ) = (ab) * a L(G2 ) = aab(ab) *


Closure under language operations

• Theorem. The set of regular languages is


closed under the all the following
operations. In other words if L1 and L2 are
regular, then so is:
– Union: L1 ∪ L2
– Intersection: L1 ∩ L2
– Complement: L1c = Σ* \ L1
– Difference: L1 \ L2
– Concatenation: L1L2
– Kleene star: L1*
Regular expressions versus regular
languages
• We defined a regular language to be one that is
accepted by some DFA (or NFA).
• We can prove that a language is regular by this
definition if and only if it corresponds to some
regular expression. Thus,
– 1) Given a DFA or NFA, there is some regular expression
to describe the language it accepts
– 2) Given a regular expression, we can construct a DFA or
NFA to accept the language it represents
Application: Lexical-analyzers and
lexical-analyzer generators
• Lexical analyzer:
– Input: Character string comprising a computer program in
some language
– Output: A string of symbols representing tokens –
elements of that language
• Ex in C++ or Java :
– Input: if (x == 3) y = 2;
– Output (sort of): if-token, expression-token, variable-
name-token, assignment-token, numeric-constant token,
statement-separator-token.
Lexical-analyzer generators

• Input: A list of tokens in a programming


language, described as regular expressions
• Output: A lexical analyzer for that language
• Technique: Builds an NFA recognizing the
language tokens, then converts to DFA.
Regular Expressions: Applications and
Limitations
• Regular expressions have many applications:
– Specification of syntax in programming languages
– Design of lexical analyzers for compilers
– Representation of patterns to match in search engines
– Provide an abstract way to talk about programming
problems (language correpsonds to inputs that produce
output yes in a yes/no programming problem)
• Limitations
– There are lots of reasonable languages that are not
regular – hence lots of programming problems that
can’t be solved using power of a DFA or NFA
CFL


Context-Free Languages

n n R
{a b } {ww }

Regular Languages
Context-Free Languages

Context-Free Pushdown
Grammars Automata

stack

automaton
Example
A context-free grammar G: S → aSb
S →λ

A derivation:

S ⇒ aSb ⇒ aaSbb ⇒ aabb


S → aSb
S →λ

n n
L(G ) = {a b : n ≥ 0}

Describes parentheses: (((( ))))


Example

A context-free grammar G: S → aSa


S → bSb
S →λ

A derivation:

S ⇒ aSa ⇒ abSba ⇒ abba


S → aSa
S → bSb
S →λ

R
L(G ) = {ww : w ∈{a, b}*}
Example

A context-free grammar G: S → aSb


S → SS
S →λ

A derivation:

S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
A context-free grammar G: S → aSb
S → SS
S →λ

A derivation:

S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab


Definition: Context-Free Languages

A language Lis context-free


if and only if there is a

context-free grammar G
with L = L(G )
Derivation Order

1. S → AB 2. A → aaA 4. B → Bb
3. A → λ 5. B → λ
Leftmost derivation:
1 2 3 4 5
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
1 4 5 2 3
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
S → aAB
A → bBb
B → A|λ
Leftmost derivation:
S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB
⇒ abbbbB ⇒ abbbb
Rightmost derivation:
S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb
⇒ abbBbb ⇒ abbbb
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb
S

A B

a a A B b
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb


S

A B

a a A B b

λ
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab


Derivation Tree S

A B

a a A B b

λ λ
S → AB A → aaA | λ B → Bb | λ

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab


Derivation Tree S

A B
yield

a a A B b aaλλb
= aab
λ λ
Sometimes, derivation order doesn’t matter
Leftmost:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
S
Same derivation tree
A B

a a A B b

λ λ
E → E + E | E ∗ E | (E) | a
a + a∗a

E E ⇒ E + E ⇒ a+ E ⇒ a+ E∗E
⇒ a + a∗ E ⇒ a + a*a
E + E
leftmost derivation

a E ∗ E

a a
E → E + E | E ∗ E | (E) | a
a + a∗a

E ⇒ E∗E ⇒ E + E∗E ⇒ a+ E∗E E


⇒ a + a∗E ⇒ a + a∗a
E ∗ E
leftmost derivation

E + E a

a a
E → E + E | E ∗ E | (E) | a
a + a∗a
Two derivation trees
E E

E + E E ∗ E

a E ∗ E E + E a

a a a a
The grammarE → E + E | E ∗ E | (E) | a
is ambiguous:

string a + a ∗ a has two derivation trees

E E

E + E E ∗ E

a E ∗ E E + E a

a a a a
Definition:
A context-free grammar G is ambiguous

if some string w ∈ L(G ) has:

two or more derivation trees

Copyright © 2006 Addison-Wesley. All rights reserved. 1-61


In other words:

A context-free grammar G is ambiguous

if some string w ∈ L(G ) has:

two or more leftmost derivations


(or rightmost)
2 + 2∗2 = 6 2 + 2∗2 = 8
6 8
E E
2 4 4 2
E + E E ∗ E
2 2 2 2
2 E ∗ E E + E 2

2 2 2 2
• Ambiguity is bad for programming languages

• We want to remove ambiguity

Copyright © 2006 Addison-Wesley. All rights reserved. 1-64


We fix the ambiguous grammar:
E → E + E | E ∗ E | (E) | a

New non-ambiguous grammar: E → E +T


E →T
T →T ∗F
T →F
F → (E)
F →a
E ⇒ E +T ⇒T +T ⇒ F +T ⇒ a +T ⇒ a +T ∗F
⇒ a + F ∗F ⇒ a + a∗F ⇒ a + a∗a
E a + a∗a
E → E +T
E + T
E →T
T →T ∗F T T ∗ F
T →F
F F a
F → (E)
F →a a a
Copyright © 2006 Addison-Wesley. All rights reserved. 1-66
Unique derivation tree

E a + a∗a
E + T

T T ∗ F

F F a

a a
Copyright © 2006 Addison-Wesley. All rights reserved. 1-67
The grammar G: E → E +T
E →T
T →T ∗F
T →F
F → (E)
F →a
is non-ambiguous:
Every string w∈ L(G ) has
a unique derivation tree
Another Ambiguous Grammar

IF_STMT → if EXPR then STMT


| if EXPR then STMT else STMT

Copyright © 2006 Addison-Wesley. All rights reserved. 1-69


If expr1 then if expr2 then stmt1 else stmt2
IF_STMT

if expr1 then STMT

if expr2 then stmt1 else stmt2

IF_STMT

if expr1 then STMT else stmt2

if expr2 then stmt1


Inherent Ambiguity

• Some context free languages


• have only ambiguous grammars

n n m n m m
Example: L = {a b c } ∪ {a b c }

S → S1 | S 2 S1 → S1c | A S 2 → aS2 | B
A → aAb | λ B → bBc | λ
Copyright © 2006 Addison-Wesley. All rights reserved. 1-71
n n n
The string a b c
has two derivation trees

S S

S1 S2

S1 c a S2
Simplification of Context Free Grammar
A Substitution Rule

Equivalent
grammar
S → aB
S → aB | ab
A → aaA
Substitute A → aaA
A → abBc B→b A → abBc | abbc
B → aA
B → aA
B→b
Copyright © 2006 Addison-Wesley. All rights reserved. 1-74
A Substitution Rule
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
Substitute
B → aA
S → aB | ab | aaA
Equivalent
A → aaA
A → abBc | abbc | abaAc
grammar
In general:
A → xBz

B → y1

Substitute
B → y1

equivalent
A → xBz | xy1z grammar
Nullable Variables

λ − production : A→λ

Nullable Variable: A ⇒K⇒ λ

Copyright © 2006 Addison-Wesley. All rights reserved. 1-77


Removing Nullable Variables

Example Grammar:

S → aMb
M → aMb
M →λ

Nullable variable

Copyright © 2006 Addison-Wesley. All rights reserved. 1-78


Final Grammar

S → aMb
S → aMb
Substitute S → ab
M → aMb M →λ
M → aMb
M →λ
M → ab

Copyright © 2006 Addison-Wesley. All rights reserved. 1-79


Unit-Productions

Unit Production: A→ B

(a single variable in both sides)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-80


Removing Unit Productions

Observation:

A→ A

Is removed immediately

Copyright © 2006 Addison-Wesley. All rights reserved. 1-81


Example Grammar:

S → aA
A→a
A→ B
B→A
B → bb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-82


S → aA
S → aA | aB
A→a
Substitute A→a
A→ B A→ B B → A| B
B→A
B → bb
B → bb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-83


S → aA | aB S → aA | aB
A→a Remove A→a
B → A| B B→B B→A
B → bb B → bb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-84


S → aA | aB
S → aA | aB | aA
A→a Substitute
B→A A→a
B→A
B → bb
B → bb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-85


Remove repeated productions

Final grammar
S → aA | aB | aA S → aA | aB
A→a A→a
B → bb B → bb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-86


Useless Productions
S → aSb
S →λ
S→A
A → aA Useless Production

Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ K ⇒ aa K aA ⇒ K
Copyright © 2006 Addison-Wesley. All rights reserved. 1-87
Another grammar:

S→A
A → aA
A→λ
B → bA Useless Production
Not reachable from S

Copyright © 2006 Addison-Wesley. All rights reserved. 1-88


In general: contains only
terminals
if S ⇒ K ⇒ xAy ⇒ K ⇒ w

w ∈ L(G )

then variable A is useful

otherwise, variable A is useless


A production A → x is useless
if any of its variables is useless

S → aSb
S →λ Productions
Variables S→A useless
useless A → aA useless
useless B→C useless

useless C→D useless


Removing Useless Productions
Example Grammar:

S → aS | A | C
A→a
B → aa
C → aCb

Copyright © 2006 Addison-Wesley. All rights reserved. 1-91


First: find all variables that can produce
strings with only terminals

S → aS | A | C Round 1: { A, B}
A→a S→A
B → aa
C → aCb Round 2: { A, B, S }
Keep only the variables
that produce terminal symbols: { A, B, S }
(the rest variables are useless)

S → aS | A | C
A→a S → aS | A
B → aa A→a
C → aCb B → aa
Remove useless productions
Second: Find all variables
reachable from S

Use a Dependency Graph

S → aS | A
A→a S A B
B → aa not
reachable
Keep only the variables
reachable from S
(the rest variables are useless)

Final Grammar
S → aS | A
S → aS | A
A→a
A→a
B → aa

Remove useless productions


Removing All

• Step 1: Remove Nullable Variables

• Step 2: Remove Unit-Productions

• Step 3: Remove Useless Variables

Copyright © 2006 Addison-Wesley. All rights reserved. 1-96


Chomsky Normal Form for CFG
Each productions has form:

A → BC or A→a

variable variable terminal

Copyright © 2006 Addison-Wesley. All rights reserved. 1-97


Examples:

S → AS S → AS
S →a S → AAS
A → SA A → SA
A→b A → aa
Chomsky Not Chomsky
Normal Form Normal Form

Copyright © 2006 Addison-Wesley. All rights reserved. 1-98


Convertion to Chomsky Normal Form

S → ABa
• Example:
A → aab
B → Ac

Not Chomsky
Normal Form

Copyright © 2006 Addison-Wesley. All rights reserved. 1-99


Introduce variables for terminals: Ta , Tb , Tc

S → ABTa
S → ABa A → TaTaTb
A → aab B → ATc
B → Ac Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V1

S → AV1
S → ABTa
V1 → BTa
A → TaTaTb
A → TaTaTb
B → ATc
B → ATc
Ta → a
Ta → a
Tb → b
Tb → b
Tc → c
Tc → c
Introduce intermediate variable: V2
S → AV1
S → AV1
V1 → BTa
V1 → BTa
A → TaV2
A → TaTaTb
V2 → TaTb
B → ATc
B → ATc
Ta → a
Ta → a
Tb → b
Tb → b
Tc → c
Tc → c
Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
Initial grammar
V2 → TaTb
S → ABa B → ATc
A → aab Ta → a
B → Ac Tb → b
Tc → c
In general:

From any context-free grammar


(which doesn’t produce λ )
not in Chomsky Normal Form

we can obtain:
An equivalent grammar
in Chomsky Normal Form

Copyright © 2006 Addison-Wesley. All rights reserved. 1-104


The Procedure

First remove:

Nullable variables

Unit productions

Copyright © 2006 Addison-Wesley. All rights reserved. 1-105


Then, for every symbol a:

Add production Ta → a

In productions: replace a with Ta

New variable: Ta
Replace any production A → C1C2 LCn

with A → C1V1
V1 → C2V2
K
Vn−2 → Cn−1Cn

New intermediate variables: V1, V2 , K,Vn−2


Observations

• Chomsky normal forms are good


for parsing and proving theorems

• It is very easy to find the Chomsky normal


form for any context-free grammar

Copyright © 2006 Addison-Wesley. All rights reserved. 1-108


Greinbach Normal Form

All productions have form:

A → a V1V2 LVk k ≥0

symbol variables

Copyright © 2006 Addison-Wesley. All rights reserved. 1-109


Examples:

S → cAB
S → abSb
A → aA | bB | b
S → aa
B→b

Greinbach Not Greinbach


Normal Form Normal Form

Copyright © 2006 Addison-Wesley. All rights reserved. 1-110


Observations

• Greinbach normal forms are very good


for parsing

• It is hard to find the Greinbach normal


form of any context-free grammar

Copyright © 2006 Addison-Wesley. All rights reserved. 1-111


Properties of CFL

Copyright © 2006 Addison-Wesley. All rights reserved. 1-112


Union

Context-free languages
are closed under: Union

L1 is context free
L1 ∪ L2
L2 is context free is context-free

Copyright © 2006 Addison-Wesley. All rights reserved. 1-113


Example
Language Grammar

n n
L1 = {a b } S1 → aS1b | λ

R
L2 = {ww } S 2 → aS2 a | bS 2b | λ

Union

n n R S → S1 | S 2
L = {a b } ∪ {ww }
In general:
For context-free languages L1, L2
with context-free grammars G1, G2
and start variables S1, S 2

The grammar of the union L1 ∪ L2


has new start variable S
and additional production S → S1 | S 2
Concatenation

Context-free languages
are closed under: Concatenation

L1 is context free
L1L2
L2 is context free is context-free

Copyright © 2006 Addison-Wesley. All rights reserved. 1-116


Example
Language Grammar

n n
L1 = {a b } S1 → aS1b | λ

R
L2 = {ww } S 2 → aS2 a | bS 2b | λ

Concatenation

n n R S → S1S 2
L = {a b }{ww }
In general:

For context-free languages L1, L2


with context-free grammars G1, G2
and start variables S1, S 2

The grammar of the concatenation L1L2


has new start variable S
and additional production S → S1S 2
Star Operation

Context-free languages
are closed under: Star-operation

L is context free *
L is context-free

Copyright © 2006 Addison-Wesley. All rights reserved. 1-119


Example

Language Grammar

n n
L = {a b } S → aSb | λ

Star Operation

n n
L = {a b } * S1 → SS1 | λ
Copyright © 2006 Addison-Wesley. All rights reserved. 1-120
In general:

For context-free language L


with context-free grammar G
and start variable
S

The grammar of the star operation L *


has new start variable S1
and additional production S1 → SS1 | λ
Intersection

Context-free languages
are not closed under: intersection

L1 is context free
L1 ∩ L2
L2 is context free not necessarily
context-free
Copyright © 2006 Addison-Wesley. All rights reserved. 1-122
Example
n n m n m m
L1 = {a b c } L2 = {a b c }
Context-free: Context-free:
S → AC S → AB
A → aAb | λ A → aA | λ
C → cC | λ B → bBc | λ
Intersection
n n n
L1 ∩ L2 = {a b c } NOT context-free
Complement

Context-free languages
are not closed under: complement

L is context free L not necessarily


context-free

Copyright © 2006 Addison-Wesley. All rights reserved. 1-124


Example
n n m n m m
L1 = {a b c } L2 = {a b c }
Context-free: Context-free:
S → AC S → AB
A → aAb | λ A → aA | λ
C → cC | λ B → bBc | λ
Complement
n n n
L1 ∪ L2 = L1 ∩ L2 = {a b c }
NOT context-free
The intersection of
a context-free language and
a regular language
is a context-free language

L1 context free
L1 ∩ L2
L2 regular context-free
Summary

Copyright © 2006 Addison-Wesley. All rights reserved. 1-127


The Chomsky Hierarchy

Non-recursively enumerable

Recursively-enumerable
Recursive

Context-sensitive

Context-free

Regular
Who is Noam Chomsky Anyway?

• Philosopher of Languages
• Professor of Linguistics at MIT
• Constructed the idea that language was not
a learned “behavior”, but that it was
cognitive and innate; versus stimulus-
response driven
• In an effort to explain these theories, he
developed the Chomsky Hierarchy
Chomsky Hierarchy

Language Grammar Machine Example


Regular Grammar Deterministic or
Regular • Right-linear Nondeterministic
grammar a*
Language Finite-state
• Left-linear acceptor
grammar

Context-free Context-free Nondeterministic anbn


Language grammar Pushdown
automaton

Context- Context-sensitive Linear-bounded anbncn


sensitive grammar automaton

Recursively Unrestricted Turing machine Any computable


enumerable grammar function
Chomsky Hierarchy

• Comprises four types of languages and


their associated grammars and machines.
• Type 3: Regular Languages
• Type 2: Context-Free Languages
• Type 1: Context-Sensitive Languages
• Type 0: Recursively Enumerable Languages
• These languages form a strict hierarchy

You might also like