Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 19

FORMAL LANGUAGE AND AUTOMATA THEORY

Debre Markos University Burie Campus


Formal Language and Automata Theory

CHAPTER ONE: Introduction

Formal language: is an abstraction of the general characteristics of


programming language. It consists a set of symbols and rules of
formation by which these symbols can be combined into entities
called sentences.

Automata: is a construct that possesses all the indispensable


features of a digital computer. Automata accepts inputs, produce
outputs, may have temporary storage and can make decision in the
input in to the output.

1.1 BASICS
1.1.1 Sets
A “set” is a collection of objects. For example, the collection of
four letters a, b, c and d is a set, which is written as
L = { a, b, c, d }
The objects comprising a set are called its “elements” or
“members”.
A set having only one element is called a “singleton”. A set
with no element at all is called the “empty set”, which is denoted
by 
It is essential to have a criterion for determining, for any given
thing, whether it is or is not a member of the given set. This
criterion is called the “Membership criterion” of the set.
There are two common ways to indicate the members of a set:
(i) List all the elements, e.g, {a, e, i, o, u}.
(ii) Provide some kind of an algorithm or a rule, such as a
grammar.

Let us now take a look at the notation that is being used to denote
sets.
(a) To indicate that x is a member of the set S, we write x 
S.
(b) If every element of set A is also an element of set B, we
say that A is a “subset” of B, and write A  B.
(c) If every element of set A is also an element of set B, but

Compiled by Yeshanew A.
pg. 1
FORMAL LANGUAGE AND AUTOMATA THEORY

B also has some elements not contained in A, we say that


A is a “proper
subset” of B and write A  B.
(d) We denote the “empty set” as { } or .

The set operations are as described below.

(a) Union
The “union” of two sets is the set that has objects that are elements
of at least one of the two given sets, and possibly both
That is, the union of sets A and B, written A  B, is a set that
contains everything in A, or in B, or in both.
A  B  {x :x  A or x  B}
Example: A = {1, 3, 9} B = {3, 5}
Therefore,A  B  {1, 3, 5, 9}

(b) Intersection
The “intersection” of sets A and B, written A  B, is a set that
contains exactly those elements that are in both A and B.
A  B  {x : x  A and x  B}
Example: Given A = {1, 3, 9}, B = {3, 5}, C = {a, b, c}
A  B = {3}
AC={}

(c) Set Difference


The “set difference” of set A and set B, written as A–B, is the set
that contains everything that is in A but not in B.
A  B  {x : x  A and x  B}
Given A  {1, 3, 9}, B  {3, 5}
A  B  {1, 9}

(d) Complement
The “complement” of set A, written as A is the set containing
everything that is not in A.

Properties of set operations


Some of the properties of the set operations follow from their
definitions. The following laws hold for the three given sets A, B
and C.

Idempotency : AAA
AAA
Commutativity : ABBA
Compiled by Yeshanew A.
pg. 2
FORMAL LANGUAGE AND AUTOMATA THEORY

ABBA

Associativity : ( A  B )  C  A  (B  C )
( A  B )  C  A  (B  C )

Distributivity :( A  B )  C  ( A  C )  ( B  C )
(AB)C(AC)(BC)
Absorption : (AB)AA
(AB)AA
DeMorgan’s Laws: A  (B  C )  ( A  B )  ( A  C )
A  (B  C )  ( A  B )  ( A  C )

Ì Example 1.1.1: Show that A  (B  C )  ( A  B )  ( A  C ).

S Olution
x  A  (B  C )  x  A and xBC
 x  A and x  B and x C
 (x  A and x  B ) and (x  A and x C )
Þ x  A  B and x  A  C
Þ x(AB)(AC)

Therefore A  (B  C )  ( A  B )  ( A  C ) (1)
Conversely,
x  ( A  B )  ( A  C )  x  A  B and x  A  C
 (x  A and x  B ) and (x  A and x C )
 x  A and (x  B and x C )
Þ x  A and x  B  C
Þ x  A  (B  C )
Therefore, ( A  B )  ( A  C )  A  (B  C ).
Hence A  (B  C )  ( A  B )  ( A  C ).

Ì Example 1.1.2: Given sets A and B are the subsets of a universal set U,
prove that
(a) A  B  A  B 
(b) A  B  A, if and only if A  B 
(c) A  B , if and only if A  B.

Solution
(a) Let x  A  B. Then
x  A  B  x  A and x  B
Þ x  A and x  B
Þ x  A  B

A  B  A  B (1)

Compiled by Yeshanew A.
pg. 3
Conversely, Let x  A  B. Then
x  A  B x A and x  B
 x  A and x  B (2)
xAB
Hence from (1) and (2)
A  B  A  B
(b) We have A  B . Then
A(AB)(AB)
Þ A  A  B  since A  B 
Þ A  A  B.
Again we have A  B  A. Then
A(AB)(AB)
Þ A  B  A  A Since A  B  A
Þ A  B .
(c) We have A  B. Then
ABA
ABA(AB)
Þ A  B  A  A Since A  B  A
Þ A  B .
If A  B , then
ABA(AB)
Þ A  B  A 
Þ ABA
Þ A  B.

Ì Example 1.1.3: Given three sets A, B and C, prove that


A  (B  C )  ( A  B )  C.

Solution
(i) Let us show that
A  (B  C )  ( A  B )  C
x  A  (B  C )
 x  A or x  (B  C ), by definition of union
 x  A or (x  B or x C )
 (x  A or x  B ) or x C

Compiled by Yeshanew A.
pg. 4
 x  ( A  B ) or x C
 x  ( A  B )  C.
Therefore we have
A  (B  C )  ( A  B )  C (1)
(ii) Let us now show that
( A  B )  C  A  (B  C ).
Assume that y is any element of the set ( A  B )  C
y(AB)C
 y  ( A  B ) or y C
 ( y  A or y  B ) or y C
 y  A or ( y  B or y C )
 y  A  (B  C )
Therefore we have
( A  B )  C  A  (B  C ) (2)
From (1) and (2), we have
A  (B  C )  ( A  B )  C

Ì Example 1.1.4: Prove that the intersection of sets is associative i.e., if


A, B and C are three sets, then
A  (B  C )  ( A  B )  C.

S olution
Let us prove that A  (B  C )  ( A  B )  C
Let x be an element such that
x  A  (B  C )
 x  A and x  (B  C )
 x  A and (x  B and x C )
Þ (x  A and x  B ) and x C
Þ x  ( A  B ) and x C
Þ x(AB)C

Therefore A  (B  C )  ( A  B )  C (1)
Let us prove that ( A  B )  C  A  (B  C ).
Let us assume the element y  ( A  B )  C
 y  ( A  B ) and y C

Compiled by Yeshanew A.
pg. 5
 ( y  A and y  B ) and y C
 y  A and y  B and y C
Þ y  A and y  ( B  C )
Þ yA(BC)

Therefore we have( A  B )  C  A  (B  C ) (2)

From (1) and (2), we have


A  (B  C )  ( A  B )  C

Ì Example 1.1.5: For any two sets A and B, prove the DeMorgan’s Laws
(a) ( A  B ) A B
(b) ( A  B ) A B

Solution
(a) x  ( A  B ) x  A  B
Û x  A and x  B
Û x  A and x  B
Û x  A B
(b) y  ( A  B ) y  A  B
 either y  A or y  B
 either y  A or y  B
 y  A B
Hence we have ( A  B ) A B.

Additional Terminology
(a)Disjoint Sets. If A and B have no common element, that is, A 
B , then the sets A and B are said to be disjoint.
(b)Cardinality. The “Cardinality” of a set A, written |A|, is the
number of elements in set A.
C. Power set. The “powerset” of a set A, written 2A, is the set of all
subsets of A; i.e., a set containing ‘n’ elements has a powerset
containing 2n elements.
(c) Cartesian Product. Let A and B be two sets. Then the set of all
ordered pairs (x, y) where x  A and y  B is called the “Cartesian
Product” of the sets A and B and is denoted by A  B, i.e.
A  B  (x, y) : x  A and y B
 
Example 1.1.6:Given= {1, 2, 3} determine (A)
(power set ofA).
Solution
As the set A = {1, 2, 3} has 3 elements the powerset P(A) will have

Compiled by Yeshanew A.
pg. 6
23 = 8 elements. P ( A)  , {1}, {2}, {3}, {1, 2}, {2, 3}, {3,1},
{1, 2, 3}

Ì Example 1.1.7: Given A= {1,2}, B = {x, y, z} and C = {3,


4}, find
A  B  C and n ( A  B  C ).

Solution
3 (1, x, 3)
x
4 (1, x, 4)
1 3 (1, y, 3)
y
4 (1, y, 4)
3 (1, z, 3)
z
4 (1, z, 4)
3 (2, x, 3)
x
4 (2, x, 4)
3 (2, y, 3)
y
2 4 (2, y, 4)
3 (2, z, 3)
z
4 (2, z, 4)

n( A  B  C )  n( A)  n(B )  n(C )  (2) (3) (2)  12..

1.1.2 Relations and Functions


Definition of Relation: A relation on sets S and T is a set of ordered
pairs (s, t), where
(a) s  S (s is a member of S )
(b) t T
(c) S and T need not be different
(d) The set of all first elements in the “domain” of the
relation, and
(e) The set of all second elements is the “range” of the
relation.

Example:
A w
B x
c
y
D
e z

Set S Set T

Fig. 1 Sets S and T are disjoint


Suppose S is the set {a, b, c, d, e} and set T is {w, x, y, z}.

Compiled by Yeshanew A.
pg. 7
Then a relation on S and T is
R  (a, y), (c, w), (c, z), (d, y)
The four ordered pairs in the relation is represented as shown in
Fig. 2.

A w
B x
c
y
D
e z

Fig. 2 Relation R = {(a, y), (c, w), (c, w), (c, z), (d,
y)}

Equivalence Relation
A subset R of A  A is called an equivalence relation on A if R
satisfies the following conditions:
(i) (a, a)  R for all a  A (R is reflexive)
(ii) If (a, b)  R, then (b, a)  R, then (a, b)  R (R is
symmetric)
(iii) If (a, b)  R and (b, c)  R, then (a, c)  R (R is
transitive)

Functions
Suppose every element of S occurs exactly once as the first element
of an ordered pair. In Fig shown, every element of S has exactly
one arrow arising from it. This kind of relation is called a
“function”.

a w
b x
c
y
d
e z

Fig 3. A Function
A function is otherwise known as “Mapping”. A function is said to
map an element in its domain to an element in its range.
Every element in S in the domain, i.e., every element of S is
mapped to some elements in the range. No element in the domain
maps to more than one element in the range.

Functions as relations
A function f : A  B is a relation from A to B i.e., a subset of A 

Compiled by Yeshanew A.
pg. 8
B, such that each a  A belongs to a unique ordered pair (a, b) in f.

1.1.3 Graphs and Trees


Graphs
A graph G consists of a finite set V of objects called “Vertices”, a
finite set E of objects called “Edges”, and a function  that assigns
to each edge a subset {v, w}, where v and w are vertices (and may
be the same).
Therefore we write G  (V , E,  ).
Example: Given V = {1, 2, 3, 4} and E = {e1, e2, e3, e4}
 is defined by
g (e1 ) {1, 2}
g (e2 )  {4, 3}
g (e3 )  {1, 3}
g (e4 )  {2, 4}
Then G  (V , E,  ) is a graph shown below.

1 2

4 3

Loop: A graph may have an edge from a vertex to itself, such an


edge is called a “loop”.

(A “walk” is a sequence of edges, where the finish vertex of each


edge is the start vertex of the next edge).
The length of a walk is the total number of edge traversed in going
form the initial vertex to the final one. A walk in with no edge is
repeated is said to be a path.
A path is simple if no vertex is repeated. A walk from vi to itself
with no repeated edge is called cycle with base vi. If no vertices
other than the base are repeated in a cycle, then it is said to be
simple.
An edge from a vertex to itself is called a loop.
Tree: A graph is said to be a “Tree” if it is connected and has no
simple cycles. (A “path” is a cycle if it starts and ends in the
same node.
A “simple cycle” : is one that does not repeat any nodes except for
the first and last).

Compiled by Yeshanew A.
pg. 9
Fig. 4 A Tree
Directed Graph: The graph is said to be a “directed graph” if it has
arrows in stead of lines.
Outdegree: The number of arrows pointing from a particular node
is the “outdegree” of that node.
Indegree: The number of arrows pointing to a particular node is the
“indegree”.

A B

F C

E D

Fig 5. Directed Graph

Compiled by Yeshanew A.
pg. 10
Directed graphs (as shown in fig.) are an easy way of depicting
binary relation

1.1.4 Strings and Languages


The mathematical study of the “Theory of Computation” begins by
understanding the Mathematics of strings of symbols.
Alphabet: It is defined as a finite set of symbols.
Example: Roman alphabet {a, b, ...... z}.
“Binary Alphabet” {0, 1} is pertinent to the theory of computation.

String: A “string” over an alphabet is a finite sequence of symbols


from that alphabet, which is usually written next to one another and
not separated by commas.
(i) If  a  {0,1} then 001001 is a string over  a .
(ii) If  b  {a, …, z) then axyrpqstcd is a string over  b .

Length of String: The “length” of a string is its length as a


sequence. The length of a string w is written as |w|.
Example: |10011| = 5

Empty String: The string of zero length is called the “empty


string”. This is denoted by .
The empty string plays the role of 0 in a number system.

Reverse String: If w  w1 w2 … wn where each wi , the reverse of


w is wn wn1 … w1 .

Substring: z is a substring of w if z appears consecutively


within w. As an example, ‘put’ is a substring of
‘Computer’.

Concatenation: Assume a string x of length m and string y of


length n, the concatenation of x and y is written xy, which is the
string obtained by appending y to the end of x, as in x1 x2 …xk xm y1
y2 …yk yn .
Suffix: If w = xv for some x, then v is a suffix of w.

Prefix: If w = vy for some y, then v is a prefix of w.

Lexicographic ordering: The Lexicographic ordering of strings is


the same as the dictionary ordering, except that shorter strings
precede longer string
The lexicographic ordering of all strings over the alphabet {0,
1} is (, 0, 1, 00, 01, 10, 11, 000, … ).

Language: Any set of strings over an alphabet  is called a


language.
Compiled by Yeshanew A.
pg. 11
The set of all strings, including the empty string over an alphabet 
is denoted as *.
Infinite languages L are denoted as L  w * :w has property P
Examples:
(a) L1  w {01}* : w has an equal number of 0' s and 1'
s
(b) L2  w * : w  wR  where wR is the reverse string of
w.

Concatenation of Languages: If L1 and L2 are languages over ,


their concatenation is L = L1  L2, or simply L = L1L2, where
L  w * :w  x  y for some x  L1 , and y  L2
Example: Given  {0,1}
L1  w * : w has an even number of 0' s
L2  w:w starts with a 0 and the rest of the symbols are 1'
s
Then L1 L2  w:w has an odd number of 0' s

Kleene Star: Another language operation is the “Kleene Star” of a


language L, which is denoted by L*.
L* is the set of all strings obtained by concatenating zero or more
strings from L.
L* = W  * :w=w1.w2.……Wk for some k≥0 and some w1,w2,..  L

Example: If L = {01, 1, 100} then 110001110011  L* , since


110001110011 = 1 100 01 1 100  1 1, each of these strings
is in L.
5. Example 1.1.7: Given  {a, b} obtain *.
(b) Give an example of a finite language in .
(c) Given L  a n b n : n  0, check if the strings aabb,
aaaabbbb, abb are in the language L.
Solution
 {a , b}
Therefore we have *  {, a , b, aa , ab, ba, bb, aaa, }
(a) {a, aa, aab} is an example of a finite language in .
(i) aa bb  a string in L. (n = 2)
(ii) aaaa bbbb  a string in L. (n = 4)
Ì Example 1.1.8: Given L  a n b n : n
2 R
 0 obtain (a) L (b) L .

Solution: Given L  a n b n :n  0
(a) L2  a n b n a m b m :n  0, m  0

Compiled by Yeshanew A.
pg. 12
where n and m are unrelated.
For example, the string aabbaaabbb is in L2.
(b) Reverse of L is given by
LR  b n a n :n  0
(iii) abb  not a string in L (since there is no n satisfying
this).

Ì Example 1.1.9: Let L = {ab, aa, baa}. Which of the


following strings are in L*.
(a) abaabaaabaa
(b) aaaabaaaa
(c) baaaaabaaaab
(d) baaaaabaa
Solution
Please note that L* is the “star-closure” of a language L, given by
L*  L0  L1  L2 …
and L+ is the “positive closure” defined by
L  L1  L2 …
(a) ab aa baa ab aa  This string is in L*
(b) aa aa baa aa  This string is in L*
(c) baa aa ab aa aa b or baa aa ab aa a ab

This string is not in L*.
*
(d) baa aa ab aa  This string is in L .
n n1
Ì Example 1.1.10: Given L  a b : n  0. It
is true that L = L* for the given language L?

Solution
We know L*  L0  L1  L2 …
and L  L1  L2  L3 …
Now for given L  a n b n1 :n  0, we have
L0  a 0 b 0 1  b
L1  a 1 b1 1  ab2
L2  a 2 b 2 1  a 2 b3 .

Therefore, we have L*  L0  L1  L2 …
 b  ab 2  a 2 b3 …
Hence it is true that L = L*.

1.1.7 Introduction to Grammar

Grammar is a mechanism to describe the languages.


A grammar (G) is defined as a quadruple
G = (V, T, S, P)
where
V = Finite set of objects called VARIABLES
T = Finite set of objects called TERMINAL SYMBOLS

Compiled by Yeshanew A.
pg. 13
S V = Start variables
P = Finite set of Productions.
A production rule P is of the form
xy
Given a string w, of the form w = uxv, we can use the production
rule x  y and obtain a new string z = uyv

The set of all strings obtained by using Production rules is the


“Language” generated by the Grammar.
If the grammar G = (V, T, S, P) then
* *
L(G)  {w T : S  w}
If W  L(G), then the sequence
S  w1  w 2  w3 …  wn  w
is a “derivation” of the sentence w.
The string S, w1 , w2 , …wn , which contain variables as well as
terminals, are called “SENTENTIAL FORMS” of the derivation.

Ì Example 1.1.11: Given a Grammar G  {S }, {a ,


b}, S, P  with P defined as
S  aSb,
S 
(i) Obtain a sentence in language generated by G and the
sentential form
(ii)Obtain the language L(G).
Solution
S  aSb
Þ aaSbb
Þ aabb
Therefore we have S * aabb .

(i) Sentence in the language generated by G =
aabb. Sentential form = aaSbb.
(ii) The rule S  aSb is recursive.
All sentential forms will have the forms
wi  a i S bi
Applying the production rule S  aSb, we get
a i Sb i  a i 1 S bi1
This is true for all i.
In order to get a sentence we apply S .

Compiled by Yeshanew A.
pg. 14
Therefore  n n
L(G)  a b ; n  0 .
Example 1.1.12:Obtain a Grammar which generates the
language
L a n b n1 :n  0
Solution
With L  a n b n :n  0, the grammar
G  {S }, {a, b}, S, P 

with production rules S  aSb, S .


Therefore L  a n b n1 :n  0 is obtained by generating an
extra b. This is done with a production rule
S  Ab.
Hence the grammar G is given by:
G  {S, A}{a, b}, S, P  with production rules given by
S  Ab,
A
aAb A


Ì Example 1.1.13: Obtain the language L produced by G with


production rules
S  SS,
S 
S  aSb
S  bSa

Solution
It is known from the given production rules that G has equal
number of a’s and b’s.
If w starts with an ‘a’ and ends with a ‘b’, then w  L has the
form w  a w1 b where w1  L.
If w starts with a ‘b’ and ends with an ‘a’ then w  L has the
form w  b w1 a where w1  L.

Compiled by Yeshanew A.
pg. 15
As a string in L can begin and end with the same symbol, the string
shoud be of the form
w = w1 w2
where w1 and w2 are in L, produced by S
 SS. This generates the language
L  w : na (w)  nb (w)
where na(w) and nb(w) denotes the number of a’s and number of b’s
in the string w, respectively.
Ì Example 1.1.14: Given G1  {A, S }, {a, b}, S, P1  with P1
defined by the production rules
S  aAb| 
A  aAb| 
show that L(G1 )  a n b n : n  0.
Also show that G1 is equivalent to G where G  {S }, {a, b}, S, P 
where P is given by
S  aSb
S .

Solution
Given P1 as
S  aAb
S 
S  aAb
A .
S  produces a string with zero length. (n = 0)
S  aAb S  aAb
 ab  aaAbb
 ab  aabb
 a 2 b2and so on
Therefore L(G1) = {an bn : n  0}.
Given G  {S }, {a , b}, S, P  where P is S 
aSb, S . The rule S  aSb is recursive.
All sentential forms will have the forms
wi  a i S bi

Compiled by Yeshanew A.
pg. 16
Applying the production rule S  aSb, we get
a i S b i  a i 1 S bi1

This is true for all i. In order to get a sentence we apply S .


Therefore we get
* n n n n
Sa Sb a b
Hence L(G)  a n b n : n  0 
Hence G1 is equivalent to G as both the grammars are given by
a

n n
b
:n0 .

Example 1.1.15:Given a grammarGdefined by the


production rules
S  AB
A  Aa
B  Bb
Aa
B  b.
Show that the word w  a 2 b4  L(G),
where L is a language determined by G.
Solution
S  AB
Þ AaB
Þ aaB
Þ aaBb
Þ aaBbb
Þ aaBbbb
Þ aabbbb
Þ a 2 b4
Hence the word w  a 2 b4  L(G).

Ì Example 1.1.16: Find grammars for  {a , b} that


generate the sets of
all strings with exactly one ‘a’
all strings with at least one ‘a’
all strings with no more than three a’s.

Compiled by Yeshanew A.
pg. 17
Solution
(a) Given  {a , b}
We are able to write the grammar G which produces all
strings with exactly one ‘a’ whose production rules are
S  aSb
S  Sb
S 
(b) For all strings with at least one ‘a’: Production rules of
Grammar C are
S  aSb
S  bSa
S 

(c) For all strings with no more than three a’s


L a n
b m n  3, m  0 
with production rules
A  aSb
S  aBb
B  aCb
C  bC
C  b | .

Ì Example 1.1.17: Give a simple description of the language


generated by the grammar with productions
( a ) S  aA, ( b ) S  Aa,
A  bS , AB
S  B  Aa.

Soution
(a) For the given production rules
S  aA
A  bS
S 
we have the language L given by
L a n b n n 1 
(b) For the given production rules
S  Aa

AB
B  Aa

There is no language L produced as there is no proper termination

Compiled by Yeshanew A.
pg. 18
1.1.8 Types of Grammar:

1. Type -0: Unrestricted grammar


2. Type -1: Context-sensitive grammar
3. Type-2: Known as context free grammar
4. Type-3- Known as regular language

Fig 6. Chomsky hierarchy of grammar.

Fig 7: The language Hierarchy

Compiled by Yeshanew A. pg. 19

You might also like