
Grammar

◼ A grammar G is a quadruple (V, Σ, R, S), where:


◼ V is the rule alphabet, which contains nonterminals
(symbols that are used in the grammar but that do not
appear in strings in the language) and terminals
(symbols that can appear in strings generated by G),
◼ Σ (the set of terminals) is a subset of V,
◼ R (the set of rules) is a finite set of rules of the form
α→β,
◼ S (the start symbol) is a nonterminal.
Chomsky Hierarchy

◼ Types of Grammar:
1. Type 0 grammar (Phrase Structured Grammar)
2. Type 1 grammar (Context Sensitive Grammar)
3. Type 2 grammar (Context Free Grammar)
4. Type 3 grammar (Regular Grammar)

Phrase Structured Grammar
◼ A Phrase Structured Grammar / Unrestricted Grammar / Type 0 Grammar G
is a quadruple (V, Σ, R, S), where:
◼ V is the rule alphabet, which contains nonterminals and terminals,
◼ Σ (the set of terminals) is a subset of V,
◼ R (the set of rules) is a finite set of rules of the form α→β,
◼ S (the start symbol) is a nonterminal.
◼ Here, all rules in R must be of the form:
◼ α→β,
◼ where α ∈ V+ and β ∈ V*.
◼ E.g.: S → aAb | ε
aA → bAA
bA → a
◼ Language: Recursively Enumerable Language
◼ Machine: Turing Machine
◼ The most powerful class of grammars.
Context Sensitive Grammar
◼ A Context Sensitive Grammar / Type 1 Grammar G is a quadruple (V, Σ, R, S), where:
◼ V is the rule alphabet, which contains nonterminals and terminals,
◼ Σ (the set of terminals) is a subset of V,
◼ R (the set of rules) is a finite set of rules of the form α→β,
◼ S (the start symbol) is a nonterminal.
◼ In a Context Sensitive Grammar, the rules are restricted as follows:
◼ There is a restriction on the length of β: β must be at least as long as α,
i.e., |β| ≥ |α|.
◼ α and β ∈ V+, i.e., ε cannot appear as the LHS or RHS of any rule. It is an
ε-free grammar.
◼ Machine: Linear Bounded Automaton
◼ Language: Context Sensitive Language
◼ E.g.: S → aAb
aA → bAA
bA → aa
Context Free Grammar
◼ A Context Free Grammar/ Type 2 Grammar G is a quadruple (V, Σ, R, S),
where:
◼ V is the rule alphabet, which contains non terminals and terminals,
◼ Σ (the set of terminals) is a subset of V,
◼ R (the set of rules) is a finite set of rules of the form A→α,
◼ S (the start symbol) is a nonterminal.
◼ In a context free grammar, all rules in R must be of the form:
◼ A → α, where
◼ A is a single nonterminal and α ∈ V*.
◼ Machine: Push Down Automaton
◼ Language: Context Free Language
◼ E.g.: S → aB / bA / ε
A → aA / b
B → bB / a / ε
Regular Grammar
◼ A regular grammar G is a quadruple (V, Σ, R, S), where:
◼ V is the rule alphabet, which contains nonterminals (symbols that are used in
the grammar but that do not appear in strings in the language) and terminals
(symbols that can appear in strings generated by G),
◼ Σ (the set of terminals) is a subset of V,
◼ R (the set of rules) is a finite set of rules of the form α→β,
◼ S (the start symbol) is a nonterminal.
◼ In a regular grammar, all rules in R must:
◼ have a left-hand side that is a single nonterminal, and
◼ have a right-hand side that is ε or a single terminal or a single terminal followed by
a single nonterminal.
◼ So S → a, S → ε, and T → aS are legal rules in a regular grammar.
◼ Machine: Finite Automata
◼ Language: Regular language
Finite State Machine to
Regular Grammar
◼ Procedure:
◼ Nonterminals (V - Σ): the states of the DFSM
◼ Σ (terminals): the alphabet of the DFSM
◼ S = q0, i.e., the start state of the DFSM is the start
symbol of the grammar
◼ Rules:
◼ If δ(qi,a)=qj, then introduce the rule qi→aqj
◼ If q ∈ F, i.e., if q is a final state of the FSM, then
introduce the rule q→ε
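
The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name fsm_to_grammar, the dictionary encoding of δ, and the use of single-character state names are assumptions made here, not part of the slides. The empty string stands for ε.

```python
def fsm_to_grammar(states, delta, start, finals):
    """Turn a DFSM into regular-grammar rules.

    delta maps (state, symbol) -> state. State names are assumed to be
    single characters so that "a" + "A" reads as the right-hand side aA.
    Returns (start_symbol, rules), where rules maps each nonterminal to a
    list of right-hand sides and "" stands for ε.
    """
    rules = {q: [] for q in states}
    for (q, a), r in sorted(delta.items()):
        rules[q].append(a + r)      # δ(q, a) = r   gives the rule  q → a r
    for q in sorted(finals):
        rules[q].append("")         # q ∈ F         gives the rule  q → ε
    return start, rules


# DFSM for "at least one a": δ(S,a)=A, δ(A,a)=A, and A is final.
start, rules = fsm_to_grammar(
    {"S", "A"}, {("S", "a"): "A", ("A", "a"): "A"}, "S", {"A"}
)
for lhs in ("S", "A"):
    print(lhs, "→", " / ".join(rhs or "ε" for rhs in rules[lhs]))
# S → aA
# A → aA / ε
```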
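Obtain a grammar to generate
strings of any number of a’s
(DFSM diagram: a single state S, which is both the start state and a final state, with a self-loop on a.)
◼ Transitions and the rules they introduce:
◼ S is a final state: S→ε
◼ δ(S,a)=S: S→aS
So, the grammar is:
S→aS / ε
or, written as separate rules:
S→aS
S→ε
Language generated: L = {a^n : n ≥ 0}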
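Obtain a grammar to generate
strings of at least one a.
(DFSM diagram: the start state S goes to A on a; A is a final state with a self-loop on a.)
◼ Transitions and the rules they introduce:
◼ A is a final state: A→ε
◼ δ(S,a)=A: S→aA
◼ δ(A,a)=A: A→aA
So, the grammar is:
S→aA
A→aA / ε
Language generated: L = {a^n : n ≥ 1}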
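Obtain a grammar to generate
strings of any number of a’s and b’s.
(DFSM diagram: a single state S, which is both the start state and a final state, with a self-loop on a and b.)
◼ Transitions and the rules they introduce:
◼ S is a final state: S→ε
◼ δ(S,a)=S: S→aS
◼ δ(S,b)=S: S→bS
So, the grammar is:
S→aS / bS / ε
Language generated: L = {(a+b)^n : n ≥ 0}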
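Obtain a grammar to generate
strings of at least two a’s
(DFSM diagram: the start state S goes to A on a; A goes to B on a; B is a final state with a self-loop on a.)
◼ Transitions and the rules they introduce:
◼ B is a final state: B→ε
◼ δ(S,a)=A: S→aA
◼ δ(A,a)=B: A→aB
◼ δ(B,a)=B: B→aB
So, the grammar is:
S→aA
A→aB
B→aB / ε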
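Obtain a grammar to generate
strings whose number of a’s is a multiple of three
(DFSM diagram: states S, A, and B in a cycle on a: S → A → B → S; S is the start state and the final state.)
◼ S→aA / ε
◼ A→aB
◼ B→aS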
◼ Transitions and the rules they introduce:
◼ δ(S,a)=A: S→aA
◼ δ(A,a)=S: A→aS
◼ S is a final state: S→ε
G=(V, Σ, R, S)
V={S,A,a}
Σ={a}
S is the start symbol
R={S→aA / ε,
A→aS}
◼ Obtain a grammar to accept the
language L = {w : |w| mod 3 > 0, w ∈ {a}*}

(DFSM diagram: states S, A, and B in a cycle on a: S → A → B → S.)
Show a regular grammar for each of the
following languages:
1. {w ∈ {a, b}* : w contains an even number of a’s and an odd number of b’s}.

2. {w ∈ {a, b}* : w does not end in aa}.

3. {w ∈ {a, b}* : w does not contain the substring aabb}.

grammartofsm(G: regular grammar) =
1. Create in M a separate state for each nonterminal in V.
2. Make the state corresponding to S the start state.
3. If there are any rules in R of the form X → w, for some w ∈ Σ, then create an
additional state labeled #.
4. For each rule of the form X → w Y, add a transition from X to Y labeled w.
5. For each rule of the form X → w, add a transition from X to # labeled w.
6. For each rule of the form X → ε, mark state X as accepting.
7. Mark state # as accepting.
8. If M is incomplete (i.e., there are some (state, input) pairs for which no
transition is defined), M requires a dead state. Add a new state D. For every (q,
i) pair for which no transition has already been defined, create a transition from
q to D labeled i. For every i in Σ, create a transition from D to D labeled i.
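
As a rough illustration, steps 1 to 7 of grammartofsm can be written in Python as below. The names grammar_to_fsm and accepts, and the encoding of rules as (left-hand side, right-hand side) string pairs, are assumptions of this sketch. Step 8 (the dead state) is not built explicitly: a missing transition is simply treated as a rejecting dead end, which accepts the same strings.

```python
def grammar_to_fsm(rules, start):
    """rules: list of (lhs, rhs) pairs, where rhs is "" (ε), a single
    terminal such as "a", or a terminal followed by a nonterminal such as "aY".

    Returns (transitions, start, accepting); transitions maps
    (state, symbol) -> set of states, so the machine may be nondeterministic.
    """
    transitions = {}
    accepting = set()
    for lhs, rhs in rules:
        if rhs == "":                   # X → ε : mark X as accepting
            accepting.add(lhs)
        elif len(rhs) == 1:             # X → w : transition from X to # on w
            transitions.setdefault((lhs, rhs), set()).add("#")
            accepting.add("#")          # # is accepting
        else:                           # X → wY : transition from X to Y on w
            transitions.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return transitions, start, accepting


def accepts(fsm, w):
    """Simulate the (possibly nondeterministic) machine on the string w."""
    transitions, start, accepting = fsm
    current = {start}
    for ch in w:
        current = set().union(*(transitions.get((q, ch), set()) for q in current))
    return bool(current & accepting)


# A regular grammar for the strings over {a, b} that end in aaaa.
rules = [("S", "aS"), ("S", "bS"), ("S", "aB"), ("B", "aC"), ("C", "aD"), ("D", "a")]
m = grammar_to_fsm(rules, "S")
print(accepts(m, "baaaa"))   # True
print(accepts(m, "aaab"))    # False
```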

S→aB
◼ (a+b) ⇒ a, b
◼ (a+b)* ⇒ ε, a, b, ab, ba, aab, bba, aba, bab, aaaa, bbbb, …
◼ (a.b)* ⇒ ε, ab, abab, ababab, …
◼ Ba,bbba,aaaa,bbbb

Example 7.2 Strings that End with aaaa

◼ Let L = {w ∈ {a, b}* : w ends with the pattern aaaa}. Alternatively, L = (a ∪ b)* aaaa.
◼ The following regular grammar defines L (each rule is shown with the transition it induces):
S → aS    δ(S,a)=S    /* An arbitrary number of a’s and b’s can be generated
S → bS    δ(S,b)=S    /* before the pattern starts.
S → aB    δ(S,a)=B    /* Generate the first a of the pattern.
B → aC    δ(B,a)=C    /* Generate the second a of the pattern.
C → aD    δ(C,a)=D    /* Generate the third a of the pattern.
D → a     δ(D,a)=#    /* Generate the last a of the pattern and quit.

Contd….
Applying grammartofsm to this grammar, we get:

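Notice that the machine that grammartofsm builds is not necessarily deterministic.

(FSM diagram: S has a self-loop on a/b; then S → B → C → D → # along transitions labeled a; # is the accepting state.)

δ(S,a)=S
δ(S,b)=S
δ(S,a)=B /* Generate the first a of the pattern.
δ(B,a)=C /* Generate the second a of the pattern.
δ(C,a)=D /* Generate the third a of the pattern.
δ(D,a)=# /* Generate the last a of the pattern and quit.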
Example 7.3 The Missing
Letter Language
◼ Let Σ = {a, b, c}. L_Missing = {w : there is a symbol a_i ∈ Σ not appearing in w}
◼ The job of S is to generate some string in L_Missing. It does that by choosing a
first character of the string and then choosing which other character will be
missing.
◼ The job of A is to generate all strings that do not contain any a’s.
◼ The job of B is to generate all strings that do not contain any b’s. And the job
of C is to generate all strings that do not contain any c’s.

S → ε
S → aB    δ(S,a)=B
S → aC    δ(S,a)=C
S → bA    δ(S,b)=A
S → bC    δ(S,b)=C
S → cA    δ(S,c)=A
S → cB    δ(S,c)=B
A → bA    A → cA    A → ε
B → aB    B → cB    B → ε
C → aC    C → bC    C → ε
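(FSM diagram: from S, b or c leads to A, a or c leads to B, and a or b leads to C; A has a self-loop on b,c, B has a self-loop on a,c, and C has a self-loop on a,b.)
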
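(FSM diagram: S goes to T on a; T has a self-loop on b; T goes to W on a and W returns to T on a; T goes to the accepting state # on a.)

Regular Expression:
a(b ∪ aa)*a
DFSM:→
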
L = {w ∈ {a, b}* : every a in w is immediately followed by at least one b}.

a) Write a regular expression that describes L.
(ab ∪ b)*

b) Write a regular grammar that generates L.
S → bS
S → aT
T → bS
S → ε

c) Construct an FSM that accepts L.
(FSM diagram: S is the start state and the accepting state, with a self-loop on b; S goes to T on a; T returns to S on b.)
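
One way to gain confidence that the regular expression and the machine describe the same language is to compare them on all short strings. The sketch below does this in Python; the function name fsm_accepts and the length bound of 6 are choices made here for illustration, and the check is a sanity test rather than a proof.

```python
import re
from itertools import product


def fsm_accepts(w):
    """FSM read off the grammar: S --b--> S, S --a--> T, T --b--> S.
    S is both the start state and the only accepting state."""
    state = "S"
    for ch in w:
        if state == "S":
            state = "S" if ch == "b" else "T"
        elif ch == "b":          # in T, the only legal move is b, back to S
            state = "S"
        else:
            return False         # an a that is not followed by b
    return state == "S"


pattern = re.compile(r"(?:ab|b)*")      # Python syntax for (ab ∪ b)*
mismatches = [
    "".join(t)
    for n in range(7)
    for t in product("ab", repeat=n)
    if fsm_accepts("".join(t)) != bool(pattern.fullmatch("".join(t)))
]
print(mismatches)   # []
```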

b) {w ∈ {a, b}* : w does not end in aa}.

◼ Regular Grammar
S → aA | bB | ε
A → aC | bB | ε
B → aA | bB | ε
C → aC | bB

(FSM diagram: states S, A, B, and C with the transitions given by the rules above; S, A, and B are accepting.)

Properties of Regular
Languages
Reading: Chapter 4

Topics
1) How to prove whether a given language is regular or not?
   a) The Pumping Lemma (Pumping Theorem)
   b) Some examples proving that a language is not regular
2) Closure properties of regular languages

Some languages are not
regular
When is a language regular?
If we are able to construct one of the
following for it: a DFSM, an NFSM, an ε-NFSM, or a regular
expression

When is it not?
If we can show that no FSM can be built for the
language

How to prove languages are
not regular?
What if we cannot come up with any FSM?
A) Can it be a language that is not regular?
B) Or is it that we tried the wrong approaches?

How do we decisively prove that a language is not regular?

“The hardest thing of all is to find a black cat in a dark room, especially if there is no cat!” -Confucius
The Pumping Theorem for
Regular Languages
◼ Theorem: If L is a regular language, then:
◼ ∃k ≥ 1 (∀ strings w ∈ L, where |w| ≥ k (∃ x, y, z (w = xyz, |xy| ≤ k,
y ≠ ε, and ∀ q ≥ 0 (xy^q z ∈ L)))).
◼ Proof: The proof is the argument that we gave above: If L is regular
then it is accepted by some DFSM M = (K, Σ, δ, s, A). Let k be |K|.
Let w be any string in L of length k or greater. By Theorem 8.5, to
accept w, M must traverse some loop at least once. We can carve w
up and assign the name y to the first substring to drive M through a
loop. Then x is the part of w that precedes y and z is the part of w that
follows y.

The Pumping Theorem for
Regular Languages
◼ We show that each of the last three conditions must then hold:
1. |xy| ≤ k : M must not only traverse a loop eventually when reading w, it must
do so for the first time by at least the time it has read k characters. It can read
k-1 characters without revisiting any states. But the kth character must, if no
earlier character already has, take M to a state it has visited before. Whatever
character does that is the last in one pass through some loop.
2. y ≠ ε: since M is deterministic, there are no loops that can be traversed by ε.
3. ∀ q ≥ 0 (xy^q z ∈ L): y can be pumped out once (which is what happens if q = 0)
or in any number of times (which happens if q is greater than 1) and the
resulting string must be in L since it will be accepted by M.
◼ It is possible that we could chop y out more than once and still generate a
string in L, but without knowing how much longer w is than k, we don’t know
any more than that it can be chopped out once.
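
The proof is constructive, so the split w = xyz can be computed from a DFSM by recording when each state is first visited. The Python sketch below follows that idea. The function names and the example machine (an even number of a’s, with states E and O) are assumptions introduced for illustration.

```python
def run(delta, start, w):
    """Return the state a DFSM reaches after reading w from start."""
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state


def pump_split(delta, start, w):
    """Split w into (x, y, z), where y drives the DFSM around the first loop it closes.

    delta maps (state, symbol) -> state. Returns None if no state repeats
    while reading w, which can only happen when |w| is less than the number of states.
    """
    seen = {start: 0}            # state -> number of characters read on first visit
    state = start
    for i, ch in enumerate(w, start=1):
        state = delta[(state, ch)]
        if state in seen:        # first revisit: the loop is w[seen[state]:i]
            j = seen[state]
            return w[:j], w[j:i], w[i:]
        seen[state] = i
    return None


# Hypothetical DFSM: strings over {a, b} with an even number of a's.
delta = {("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"}
x, y, z = pump_split(delta, "E", "abab")
print((x, y, z))                                                     # ('a', 'b', 'ab')
print(all(run(delta, "E", x + y * q + z) == "E" for q in range(5)))  # True
```

Here k = 2 (two states), |xy| = 2 ≤ k, y is non-empty, and every pumped string still ends in the accepting state E.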

The Pumping Theorem for
Regular Languages

◼ The Pumping Theorem tells us something that is true of every regular language.
◼ Generally, if we already know that a language is regular, we won’t
particularly care about what the Pumping Theorem tells us about it. But
suppose that we are interested in some language L and we want to know
whether or not it is regular.
◼ If we could show that the claims made in the Pumping Theorem are not true
of L, then we would know that L is not regular.
◼ In particular, we will use it to construct proofs by contradiction.
◼ We will say, “If L were regular, then it would possess certain properties.
But it does not possess those properties. Therefore, it is not regular.”

Example 8.8 A^nB^n is not
Regular
◼ Let L be A^nB^n = {a^n b^n : n ≥ 0}. We can use the Pumping Theorem to show that L is
not regular.
◼ If it were, then there would exist some k such that any string w ∈ L, where |w| ≥ k, must
satisfy the conditions of the theorem. We show one string w that does not.
◼ Let w = a^k b^k. Since |w| = 2k, w is long enough and it is in L, so it must satisfy the
conditions of the Pumping Theorem. So there must exist x, y, and z, such that w =
xyz, |xy| ≤ k, y ≠ ε, and ∀ q ≥ 0 (xy^q z ∈ L).
◼ But we show that no such x, y, and z exist. Since we must guarantee that |xy| ≤ k, y
must occur within the first k characters and so y = a^p for some p.
◼ Since we must guarantee that y ≠ ε, p must be greater than 0. Let q = 2. (In other
words, we pump in one extra copy of y.) The resulting string is a^(k+p) b^k.
◼ The last condition of the Pumping Theorem states that this string must be in L, but
it is not since it has more a’s than b’s.
◼ Thus there exists at least one long string in L that fails to satisfy the conditions of
the Pumping Theorem.
◼ So L = A^nB^n is not regular.
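
For small values of k, the argument can be checked by brute force: try every split with |xy| ≤ k and y ≠ ε and see whether any of them survives pumping. The Python sketch below does this; the helper names and the bound max_q = 3 are assumptions of the sketch. Checking only a few values of q is enough to refute a split, but a finite search like this illustrates the proof for the tested k only and does not replace it.

```python
def in_anbn(s):
    """Membership test for {a^n b^n : n ≥ 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def survives_pumping(w, k, in_lang, max_q=3):
    """True if some split w = xyz with |xy| ≤ k and y ≠ ε keeps
    x y^q z in the language for every q in 0..max_q."""
    for i in range(k + 1):                # i = |x|
        for j in range(i + 1, k + 1):     # j = |xy|, so y = w[i:j] is non-empty
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_lang(x + y * q + z) for q in range(max_q + 1)):
                return True
    return False


for k in range(1, 6):
    w = "a" * k + "b" * k
    print(k, survives_pumping(w, k, in_anbn))   # False for each k tried

# For contrast, a regular language (a*) does have a surviving split.
print(survives_pumping("aaaa", 2, lambda s: set(s) <= {"a"}))   # True
```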
L = {a^n b^n}
◼ w = a…a b…b, with n a’s followed by n b’s, so |w| = 2n
◼ w = xyz
◼ |xy| ≤ k ≤ |w|; take k = n,
e.g., xy = a^n and z = b^n
◼ Assume y = a^p (p > 0),
then x = a^(n-p) and
w = a^(n-p) a^p b^n = a^n b^n
◼ So, let n = 4 (w = a^4 b^4) and p = 1; then we get w = a^3 a^1 b^4
◼ According to the pumping lemma, if L is regular then xy^q z ∈ L
for every q ≥ 0.
◼ Now let q = 2:
◼ xy^2 z = a^3 (a^1)^2 b^4 = a^5 b^4
In general, xy^q z = a^(n+(q-1)p) b^n ∉ L for q ≠ 1
The Even Palindrome
Language is Not Regular
◼ Let L be PalEven = {ww^R : w ∈ {a, b}*}. PalEven is the language of even-length
palindromes of a’s and b’s.
◼ We can use the Pumping Theorem to show that PalEven is not regular.
◼ If it were, then there would exist some k such that any string w ∈ L, where |w| ≥ k, must
satisfy the conditions of the theorem. We show one string w that does not.
◼ We will choose w so that we only have to consider one case for where y could fall.
◼ Let w = a^k b^k b^k a^k.
◼ That is, w = uu^R with u = a^k b^k and u^R = b^k a^k.
◼ Since |w| = 4k and w is in L, w must satisfy the conditions of the Pumping Theorem.
So there must exist x, y, and z, such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀ q ≥ 0 (xy^q z ∈ L).
Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p.
◼ Since y ≠ ε, p must be greater than 0. Let q = 2. The resulting string is a^(k+p) b^k b^k a^k. If p
is odd, then this string is not in PalEven because all strings in PalEven have even
length. If p is even then it is at least 2, so the first half of the string has more a’s than
the second half does, so it is not in PalEven.
◼ So L = PalEven is not regular.
◼ w = a…a b…b b…b a…a, four blocks of length k each, so |w| = 4k
◼ w = xyz,
e.g., xy = a^k and z = b^k b^k a^k
◼ Assume y = a^p (p > 0),
then x = a^(k-p) and
w = a^(k-p) a^p b^k b^k a^k = a^k b^k b^k a^k
◼ So, let k = 4 and p = 2; then we get
w = a^2 a^2 b^4 b^4 a^4
◼ According to the pumping lemma, if L is regular then
xy^q z ∈ L for every q ≥ 0.
◼ Now let q = 3:
◼ xy^3 z = a^2 (a^2)^3 b^4 b^4 a^4 = a^8 b^4 b^4 a^4
In general, xy^q z = a^(k+(q-1)p) b^k b^k a^k ∉ L for q ≠ 1
Example 8.12 The Language with
More a’s Than b’s is Not Regular
◼ Let L = {a^n b^m : n > m}. We can use the Pumping Theorem to show that L is not regular. If
it were, then there would exist some k such that any string w ∈ L, where |w| ≥ k, must satisfy
the conditions of the theorem. We show one string w that does not.
1. Let w = a^(k+1) b^k. Since |w| = 2k+1 and w is in L, w must satisfy the conditions of the
Pumping Theorem.
2. So there must exist x, y, and z, such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀ q ≥ 0 (xy^q z ∈ L). Since
|xy| ≤ k, y must occur within the first k characters and so y = a^p for some p.
3. Since y ≠ ε, p must be greater than 0. There are already more a’s than b’s, as required by
the definition of L.
4. If we pump in, there will be even more a’s and the resulting string will still be in L. But
we can set q to 0 (and so pump out).
5. The resulting string is then a^(k+1-p) b^k. Since p > 0, k+1-p ≤ k, so the resulting string no
longer has more a’s than b’s and so is not in L.
6. There exists at least one long string in L that fails to satisfy the conditions of the Pumping
Theorem.
7. So L is not regular.
The Pumping Lemma for
Regular Languages
What is it?
The Pumping Lemma is a property
of all regular languages.
How is it used?
A technique that is used to show
that a given language is not regular
Pumping Lemma for Regular
Languages
Let L be a regular language.

Then there exists some constant N such that for
every string w ∈ L s.t. |w| ≥ N, there exists a
way to break w into three parts, w = xyz,
such that:
1. y ≠ ε
2. |xy| ≤ N
3. For all k ≥ 0, all strings of the form xy^k z ∈ L
This property should hold for all regular languages.

Definition: N is called the “Pumping Lemma Constant”


The Purpose of the Pumping
Lemma for RL
◼ To prove that some languages cannot
be regular.

Closure properties of Regular
Languages

Closure properties for Regular
Languages (RL)
◼ Closure property (this is different from Kleene closure):
◼ If regular languages are combined using an
operator, then the resulting language is also regular
◼ Regular languages are closed under:
◼ Union, intersection, complement
◼ Difference
◼ Reversal
◼ Kleene closure
◼ Concatenation
◼ Homomorphism
◼ Inverse homomorphism
Now, let’s prove all of this!
RLs are closed under union
◼ IF L and M are two RLs THEN:

➢ they have corresponding regular expressions, R and S respectively

➢ (L U M) can be represented using the regular expression R+S

➢ Therefore, (L U M) is also regular

How can this be proved using FSMs?
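RLs are closed under
complementation
◼ If L is an RL over ∑, then its complement is ¬L = ∑* - L
➢ To show that ¬L is also regular, make the following
construction: convert every final state into a non-final state, and
every non-final state into a final state

(Diagram: the DFSM for L, with start state q0, an intermediate state qi, and final states qF1, qF2, …, qFk, shown next to the DFSM for ¬L, which has the same states and transitions with final and non-final states swapped.)

Assumes q0 is a non-final state. If not, do the opposite.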
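
A minimal Python sketch of this construction follows. The tuple encoding (states, delta, start, finals) and the example machine are assumptions made for illustration. The sketch also assumes that delta is total, i.e., defined for every (state, symbol) pair; with missing transitions, swapping the final states would not give the complement.

```python
def accepts(dfsm, w):
    """Run a DFSM given as (states, delta, start, finals) on the string w."""
    states, delta, state, finals = dfsm
    for ch in w:
        state = delta[(state, ch)]
    return state in finals


def complement(dfsm):
    """Swap final and non-final states; everything else is unchanged."""
    states, delta, start, finals = dfsm
    return states, delta, start, set(states) - set(finals)


# Hypothetical DFSM over {a, b} accepting the strings that end in a.
ends_in_a = (
    {"q0", "q1"},
    {("q0", "a"): "q1", ("q0", "b"): "q0", ("q1", "a"): "q1", ("q1", "b"): "q0"},
    "q0",
    {"q1"},
)
not_ends_in_a = complement(ends_in_a)
print(accepts(ends_in_a, "ba"), accepts(not_ends_in_a, "ba"))   # True False
print(accepts(ends_in_a, ""), accepts(not_ends_in_a, ""))       # False True
```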


RLs are closed under
intersection
◼ A quick, indirect way to prove:
◼ By DeMorgan’s law:
◼ L ∩ M = ¬(¬L U ¬M), where ¬ denotes complement
◼ Since we know RLs are closed under union
and complementation, they are also closed
under intersection
◼ A more direct way would be to construct a
finite automaton for L ∩ M
DFSM construction for L ∩ M
◼ A_L = DFSM for L = {Q_L, ∑, q_L, F_L, δ_L}
◼ A_M = DFSM for M = {Q_M, ∑, q_M, F_M, δ_M}
◼ Build A_(L ∩ M) = {Q_L × Q_M, ∑, (q_L, q_M), F_L × F_M, δ}
such that:
◼ δ((p,q),a) = (δ_L(p,a), δ_M(q,a)), where p in Q_L, and q
in Q_M
◼ This construction ensures that a string w will
be accepted if and only if w reaches an
accepting state in both input DFSMs.
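
The product construction translates almost directly into code. The Python sketch below is one possible rendering; the tuple encoding (states, delta, start, finals), the function names, and the two example machines are assumptions of the sketch, and both input machines are assumed to have total transition functions over the same alphabet.

```python
from itertools import product


def intersect(dfsm_l, dfsm_m, alphabet):
    """Product DFSM: states are pairs, and both components move together."""
    (ql, dl, sl, fl), (qm, dm, sm, fm) = dfsm_l, dfsm_m
    states = set(product(ql, qm))                       # Q_L × Q_M
    delta = {
        ((p, q), a): (dl[(p, a)], dm[(q, a)])           # δ((p,q),a) = (δ_L(p,a), δ_M(q,a))
        for (p, q) in states
        for a in alphabet
    }
    finals = set(product(fl, fm))                       # F_L × F_M
    return states, delta, (sl, sm), finals


def accepts(dfsm, w):
    _, delta, state, finals = dfsm
    for ch in w:
        state = delta[(state, ch)]
    return state in finals


# Hypothetical inputs: an even number of a's, and strings ending in b.
even_a = ({"E", "O"},
          {("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"},
          "E", {"E"})
ends_b = ({0, 1},
          {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
          0, {1})
both = intersect(even_a, ends_b, "ab")
print(accepts(both, "aab"))   # True
print(accepts(both, "ab"))    # False
```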

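DFSM construction for L ∩ M
(Diagram: the DFSM for L, with states q0, qi, qj and final states qF1, qF2, and the DFSM for M, with states p0, pi, pj and final states pF1, pF2, each showing a transition labeled a; below them, the DFSM for L ∩ M, whose states are pairs such as (q0, p0), (qi, pi), (qj, pj), with final state (qF1, pF1).)
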
RLs are closed under set
difference
◼ We observe:
◼ L - M = L ∩ ¬M, where ¬M is the complement of M
◼ RLs are closed under intersection and under complementation
◼ Therefore, L - M is also regular

RLs are closed under reversal
Reversal of a string w is denoted by w^R
◼ E.g., w=00111, w^R=11100
Reversal of a language:
◼ L^R = The language generated by
reversing all strings in L

Theorem: If L is regular then L^R is also regular
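ε-NFSM Construction for L^R

(Diagram: the DFSM for L, with start state q0, states qi and qj joined by a transition on a, and final states qF1, qF2, …, qFk, is converted into a new ε-NFSM for L^R.)

◼ Reverse all transitions
◼ Convert the old set of final states into non-final states
◼ Make the old start state the only new final state
◼ Add a new start state q’0 with ε-transitions to each of the old final states

What to do if q0 was one of the final states in the input DFSM?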
If L is regular, L^R is regular (proof
using regular expressions)
◼ Let E be a regular expression for L
◼ Given E, how to build E^R?
◼ Basis: If E = ε, Ø, or a, then E^R = E
◼ Induction: Every part of E (refer to the part as “F”)
can be in only one of the three following forms:
1. F = F1+F2
◼ F^R = F1^R + F2^R
2. F = F1F2
◼ F^R = F2^R F1^R
3. F = (F1)*
◼ F^R = (F1^R)*
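
The inductive definition maps naturally onto a recursive function over a syntax tree. In the Python sketch below, regular expressions are encoded as nested tuples; this encoding and the names reverse and show are assumptions chosen for illustration.

```python
def reverse(e):
    """Build E^R from E. Expressions are tuples:
    ("sym", "a"), ("eps",), ("empty",), ("union", F1, F2),
    ("concat", F1, F2), ("star", F1)."""
    kind = e[0]
    if kind in ("sym", "eps", "empty"):      # basis: E^R = E
        return e
    if kind == "union":                      # (F1+F2)^R = F1^R + F2^R
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "concat":                     # (F1 F2)^R = F2^R F1^R
        return ("concat", reverse(e[2]), reverse(e[1]))
    if kind == "star":                       # (F1*)^R = (F1^R)*
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown node kind: {kind}")


def show(e):
    """Render an expression tree in the slides' notation."""
    kind = e[0]
    if kind == "sym":
        return e[1]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "Ø"
    if kind == "union":
        return "(" + show(e[1]) + "+" + show(e[2]) + ")"
    if kind == "concat":
        return show(e[1]) + show(e[2])
    inner = show(e[1])                       # star
    return (inner if e[1][0] in ("sym", "union") else "(" + inner + ")") + "*"


zero, one = ("sym", "0"), ("sym", "1")
E = ("concat", ("star", ("union", zero, one)), ("concat", zero, one))
print(show(E))             # (0+1)*01
print(show(reverse(E)))    # 10(0+1)*
```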

Homomorphisms
◼ Substitute each symbol in ∑ (main alphabet)
by a corresponding string over T (another
alphabet)
◼ h: ∑ ---> T*
◼ Example:
◼ Let ∑={0,1} and T={a,b}
◼ Let a homomorphic function h on ∑ be:
◼ h(0)=ab, h(1)=ε
◼ If w=10110, then h(w) = ε ab ε ε ab = abab
◼ In general,
◼ h(w) = h(a1) h(a2) … h(an), where w = a1 a2 … an
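
Applying a homomorphism to a string is a symbol-by-symbol substitution. A small Python sketch of the example above is shown below; the function name apply_h and the dictionary encoding of h are assumptions made here, with the empty string standing for ε.

```python
def apply_h(h, w):
    """h(w) = h(a1) h(a2) ... h(an): substitute each symbol and concatenate."""
    return "".join(h[ch] for ch in w)


h = {"0": "ab", "1": ""}          # h(0) = ab, h(1) = ε
print(apply_h(h, "10110"))        # abab

# h distributes over concatenation: h(uv) = h(u) h(v).
u, v = "101", "10"
print(apply_h(h, u + v) == apply_h(h, u) + apply_h(h, v))   # True
```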
Given a DFSM for L, how to convert it into an FSM for h(L)?

FSM Construction for h(L)

(Diagram: the DFSM for L, with start state q0, states qi and qj joined by an edge labeled a, and final states qF1, qF2, …, qFk. In the new FSM, every edge labeled “a” is replaced by a path labeled h(a).)

- Build a new FSM that simulates h(a) for every transition on a symbol a in
the above DFSM
- The resulting FSM may or may not be a DFSM, but will be an FSM for h(L)
Given a DFSM for M, how to convert it into an FSM for h^-1(M)?

Inverse homomorphism
◼ Let h: ∑ ---> T*
◼ Let M be a language over alphabet T

◼ h^-1(M) = {w | w ∈ ∑* s.t. h(w) ∈ M}, i.e., the set of strings in ∑*
whose homomorphic translation results in strings of M

Claim: If M is regular, then so is h^-1(M)

◼ Proof:
◼ Let A be a DFSM for M
◼ Construct another DFSM A’ which encodes h^-1(M)
◼ A’ is an exact replica of A, except that its transition
function is s.t. for any input symbol a in ∑, A’
will simulate h(a) in A.
◼ δ’(p,a) = δ(p,h(a)), where δ’ is the transition function of A’ and δ is extended to read the whole string h(a)
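
The construction in the proof can be sketched in Python as follows. The tuple encoding (states, delta, start, finals), the function names, and the example machine (strings over {a, b} ending in b) are assumptions introduced for illustration. The final check compares A’ on w with A on h(w) for all short strings, which is a sanity test rather than a proof.

```python
from itertools import product


def run(delta, state, s):
    """Extended transition function: the state reached from `state` after reading s."""
    for ch in s:
        state = delta[(state, ch)]
    return state


def inverse_h_dfsm(dfsm, h):
    """Build A' with the same states as A and δ'(p, a) = δ(p, h(a)) for each a in ∑."""
    states, delta, start, finals = dfsm
    new_delta = {(p, a): run(delta, p, h[a]) for p in states for a in h}
    return states, new_delta, start, finals


def accepts(dfsm, w):
    _, delta, start, finals = dfsm
    return run(delta, start, w) in finals


# Hypothetical DFSM A for M = strings over {a, b} that end in b.
ends_in_b = ({"e", "f"},
             {("e", "a"): "e", ("e", "b"): "f", ("f", "a"): "e", ("f", "b"): "f"},
             "e", {"f"})
h = {"0": "ab", "1": ""}                      # h(0) = ab, h(1) = ε
a_prime = inverse_h_dfsm(ends_in_b, h)


def image(w):
    return "".join(h[c] for c in w)


ok = all(
    accepts(a_prime, "".join(t)) == accepts(ends_in_b, image("".join(t)))
    for n in range(5)
    for t in product("01", repeat=n)
)
print(ok)                          # True
print(accepts(a_prime, "101"))     # True, since h(101) = ab ends in b
print(accepts(a_prime, "11"))      # False, since h(11) = ε
```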
