Theory of Computation

Shakir Al Faraji
Computer Science Dept.,
Petra University
Amman - Jordan.
email: shussain@uop.edu.jo
Thank you,
Dr. Shakir Al Faraji

IMPORTANT NOTES

Students
This presentation is designed to be used in
class as part of a guided discovery
sequence. It is not self-explanatory! Please
use it only for revision purposes after having
taken the class. Simply going through the
slides will teach you nothing. You must be
actively thinking, doing and questioning to
learn!

Course Strategy
Be warned: this is not a course that spoon-feeds students.
Students are expected to be investigative and resourceful.
Reading books and researching topics independently are expected.

Material
There is a book:
Hopcroft, Motwani, & Ullman, 3rd Edition (2007), Addison Wesley
These were the lecture notes.
Well, apart from the slides.

Regular Expression

Definition
A regular expression, or RE,
describes strings of characters
(words or phrases or any
arbitrary text). It's a pattern that
matches certain strings and
doesn't match others. A regular expression is a sequence of
characters that specifies a pattern.
OR: language-defining symbols.

Definition-Cont
Regular expressions are used to
generate patterns of strings. A
regular expression is an algebraic
formula whose value is a pattern
consisting of a set of strings,
called the language of the
expression.

Operands in a regular
expression
Operands in a regular expression can be:
characters from the alphabet over which
the regular expression is defined.
variables whose values are any pattern
defined by a regular expression.
epsilon which denotes the empty string
containing no characters.
null which denotes the empty set of
strings.

Operators used in
regular expressions
Union: If R1 and R2 are regular
expressions, then R1 | R2 (also written as
R1 U R2 or R1 + R2) is also a regular
expression.
L(R1|R2) = L(R1) U L(R2).
Concatenation: If R1 and R2 are regular
expressions, then R1R2 (also written as
R1.R2) is also a regular expression.
L(R1R2) = L(R1) concatenated with L(R2).

Operators used in
regular expressions
Kleene closure: If R1 is a regular
expression, then R1* (the Kleene closure
of R1) is also a regular expression.
L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...
Closure has the highest precedence, followed by concatenation,
followed by union.
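The three operators can be sketched as operations on Python sets of strings. This is only an illustration: the function names are hypothetical, and because L(R1*) is infinite the closure is cut off at a chosen maximum length.

```python
def union(l1, l2):
    """L(R1 | R2) = L(R1) U L(R2)."""
    return l1 | l2

def concat(l1, l2):
    """L(R1R2): every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}

def star(l1, max_len):
    """Bounded Kleene closure: the strings of L(R1*) with at most max_len characters."""
    result = {""}      # epsilon is always in the closure
    frontier = {""}
    while frontier:
        # Extend the newest words by one more factor from l1.
        frontier = {w for w in concat(frontier, l1)
                    if len(w) <= max_len and w not in result}
        result |= frontier
    return result

# Precedence: a | bc* is read as a | (b(c*)).
lang = union({"a"}, concat({"b"}, star({"c"}, 2)))
print(sorted(lang))  # ['a', 'b', 'bc', 'bcc']
```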

Examples
The set of strings over {0,1} that end in
3 consecutive 1's.
(0 | 1)* 111
OR (0 + 1)* 111
The set of strings over {0,1} that have
at least one 1.
0* 1 (0 + 1) *

Examples-Cont.
The set of strings over {0,1} that have at
most one 1.
0* | 0* 1 0*
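One way to sanity-check the three binary examples is to compare each RE with the property it is meant to describe on every short string. The sketch below assumes Python's `re` module, where union is written `|`; the helper `words` and the length bound 8 are arbitrary choices, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# (pattern in Python syntax, property the slide claims it describes)
examples = [
    (r"(0|1)*111", lambda w: w.endswith("111")),   # ends in three 1's
    (r"0*1(0|1)*", lambda w: "1" in w),            # at least one 1
    (r"0*|0*10*",  lambda w: w.count("1") <= 1),   # at most one 1
]

for pattern, claim in examples:
    regex = re.compile(pattern)
    agree = all(bool(regex.fullmatch(w)) == claim(w) for w in words("01", 8))
    print(pattern, agree)
```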
The set of strings over {A..Z,a..z} that
contain the word "main".
Let <letter> = A | B | ... | Z | a | b | ... | z
<letter>* main <letter>*

Examples-Cont.
The set of strings over {A..Z,a..z} that
contain at least 3 x's.
<letter>* x <letter>* x <letter>* x <letter>*

Examples-Cont.
The set of identifiers in Pascal.
Let <letter> = A | B | ... | Z | a | b | ... | z
Let <digit> = 0 | 1 | 2 | 3 | ... | 9
<letter> (<letter> | <digit>)*

Examples-Cont.
The set of real numbers in Pascal.
Let <digit> = 0 | 1 | 2 | 3 | ... | 9
Let <sign> = '+' | '-' | epsilon
Let <exp> = 'E' <sign> <digit> <digit>* | epsilon
Let <decimal> = '.' <digit> <digit>* | epsilon

<digit> <digit>* <decimal> <exp>
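The Pascal identifier and real-number definitions can be transcribed almost symbol for symbol into Python's `re` syntax. This is a sketch of the definitions as written on the slides, not of the full Pascal language rules; the names `IDENTIFIER`, `REAL`, `is_identifier` and `is_real` are made up for the illustration.

```python
import re

# <letter> (<letter> | <digit>)*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# <digit> <digit>* <decimal> <exp>
#   <decimal> = '.' <digit> <digit>* | epsilon         ->  (\.[0-9]+)?
#   <exp>     = 'E' <sign> <digit> <digit>* | epsilon  ->  (E[+-]?[0-9]+)?
REAL = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_real(text):
    return REAL.fullmatch(text) is not None

for text in ["x1", "1x", "3.14", "2.5E-3", "42", ".5", "1."]:
    print(f"{text!r}: identifier={is_identifier(text)}, real={is_real(text)}")
```

Because both <decimal> and <exp> may be epsilon, a plain digit string such as 42 also satisfies the real-number definition as it is written here.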

Examples-Cont.
Consider Σ = { a }
L is the language in which every word
has odd length
a (aa)*

Examples-Cont.
Consider Σ = { a, b }
L is the language in which every word
must start with the letter b
b (a+b)*

Examples-Cont.
Consider Σ = { a, b }
L is the language in which every word
must start with the letter b and
end with the letter a

b (a+b)* a

Examples-Cont.
Consider Σ = { a, b, c }
L = { a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, ... }

L = language((a+c) b*)

Examples-Cont.
Consider (a+b)*a(a+b)*
L = language of all words over the alphabet
Σ = { a, b } that have at least one a in them
L = { a, aa, ab, ba, aaa, aab, aba, abb, baa, bab, bba, ... }

Examples-Cont.
Consider the following RE
(a+b)* a(a+b)* a(a+b)*
L = language of all words over the alphabet
Σ = { a, b } that have at least two
a's in them

Examples-Cont.
(a+b)* a(a+b)* a(a+b)* = b*ab*a(a+b)* ?
(a+b)* a(a+b)* a(a+b)* = (a+b)*ab*ab* ?
(a+b)* a(a+b)* a(a+b)* = b*a(a+b)*ab* ?
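The three questions can be explored by brute force: list every word over { a, b } up to some length and compare which words each RE accepts. The sketch below assumes Python's `re` syntax (`|` in place of `+`); the helper `language` and the bound 8 are arbitrary, and equality on a bounded set is supporting evidence only.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=8):
    """Set of words up to max_len that the pattern matches in full."""
    regex = re.compile(pattern)
    return {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
        if regex.fullmatch("".join(letters))
    }

base = language(r"(a|b)*a(a|b)*a(a|b)*")
candidates = [r"b*ab*a(a|b)*", r"(a|b)*ab*ab*", r"b*a(a|b)*ab*"]
for candidate in candidates:
    print(candidate, language(candidate) == base)
```

Each candidate pins down a different pair of a's: the first two, the last two, or the first and the last. All three print True on this bounded check.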

Examples-Cont.
(a+b)* a(a+b)* b(a+b)*
Is this the language of all words that have at
least one a and at least one b?

Examples-Cont.
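(a+b)* a(a+b)* b(a+b)*
What about the word ba? It has an a and a b, but this RE
requires some a to come before some b, so ba is not matched.
To cover all words with at least one a and at least one b, the RE must be
(a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*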

Examples-Cont.
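Is
(a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*
the same as
(a+b)* a(a+b)* b(a+b)* + bb*aa* ?
Yes: the only words with both letters that the first term misses are
those in which every b comes before every a, and those are exactly bb*aa*.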
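A bounded brute-force check of these REs, assuming Python's `re` syntax with `|` for union. The names `accepts`, `a_then_b` and `b_then_a` are hypothetical, and only words of length up to 8 are examined.

```python
import re
from itertools import product

WORDS = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]

def accepts(pattern):
    """Words (length <= 8) that the pattern matches in full."""
    regex = re.compile(pattern)
    return {w for w in WORDS if regex.fullmatch(w)}

a_then_b = r"(a|b)*a(a|b)*b(a|b)*"
b_then_a = r"(a|b)*b(a|b)*a(a|b)*"
both_letters = {w for w in WORDS if "a" in w and "b" in w}

print("ba" in accepts(a_then_b))                          # False
print(accepts(f"{a_then_b}|{b_then_a}") == both_letters)  # True
print(accepts(f"{a_then_b}|bb*aa*") == both_letters)      # True
```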

Examples-Cont.
b* + ab*
(epsilon + a) b*
b* + ab* = (epsilon + a) b*

More on RE.
Definition
If S and T are sets of strings of letters, we
define the product set of strings of letters to
be
ST = { all combinations of a string from S
concatenated with a string from T }

More on RE - Cont.
If S = { a, aa, aaa } , T = { bb, bbb }
then
ST = { abb, abbb, aabb, aabbb, aaabb, aaabbb }
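The product set is a one-line set comprehension in Python. The function name `product_set` is made up for this sketch; the second call uses sets that are not on the slide, chosen to show that different pairs can yield the same word.

```python
def product_set(s, t):
    """ST: each string of s followed by each string of t."""
    return {x + y for x in s for y in t}

S = {"a", "aa", "aaa"}
T = {"bb", "bbb"}
ST = product_set(S, T)
print(sorted(ST, key=lambda w: (len(w), w)))
# ['abb', 'aabb', 'abbb', 'aaabb', 'aabbb', 'aaabbb']

# Different pairs can produce the same word, so |ST| may be below |S| * |T|.
print(sorted(product_set({"a", "aa"}, {"a", "aa"})))  # ['aa', 'aaa', 'aaaa']
```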

Languages Associated
with RE.
Definition

The language associated with an RE that is just a single
letter is that one-letter word alone, and the language
associated with epsilon is just {epsilon}, a one-word language.
If r1 is an RE associated with the language L1 and r2
is an RE associated with the language L2, then:
the RE (r1)(r2) is associated with the product L1L2;
the RE (r1+r2) is associated with the language formed by
the union of the sets L1 and L2;
the RE (r1)* is associated with the language L1*.

Finite Languages are Regular.
Theorem
If L is a finite language, then L can be defined
by a regular expression. In other words, all
finite languages are regular.
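Proof
Write each word of L as the concatenation of its letters and join the
words with +. Because L has finitely many words, the result is an RE.
Example: let L = { aa, ab, ba, bb }
An RE is
aa+ab+ba+bb
or, equivalently,
(a+b)(a+b)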
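The construction used in the proof can be sketched in Python: join the words of the finite language with the union operator. The function name is hypothetical, Python's `|` stands in for the slides' `+`, and the empty language is excluded because it corresponds to the null operand rather than to any union of words.

```python
import re

def finite_language_to_regex(words):
    """Join the words with | (the slides' +); re.escape keeps letters literal."""
    if not words:
        raise ValueError("the empty language needs the 'null' operand instead")
    return "|".join(re.escape(w) for w in sorted(words))

L = {"aa", "ab", "ba", "bb"}
pattern = finite_language_to_regex(L)
print(pattern)  # aa|ab|ba|bb

candidates = ["", "a", "aa", "ab", "ba", "bb", "aab"]
print([w for w in candidates if re.fullmatch(pattern, w)])
print([w for w in candidates if re.fullmatch(r"(a|b)(a|b)", w)])
```

Both lists contain the same four words, which matches the claim that aa+ab+ba+bb and (a+b)(a+b) define the same language.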

Examples.
Can you describe the following RE?
(a+b)* (aa+bb) (a+b)*
All strings of a's and b's that at some point
contain a double letter.

Examples.
Σ = { a, b }
What strings do not contain a double
letter?
(ab)*
Is it correct? NO. It misses words such as a, ba and bab.
(epsilon + b)(ab)*(epsilon + a)
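Both attempts can be compared with the defining property (no aa and no bb anywhere in the word) by brute force. In this sketch the optional letters (epsilon + b) and (epsilon + a) are written `b?` and `a?` in Python's `re` syntax; the helper `accepts` and the length bound 8 are arbitrary choices.

```python
import re
from itertools import product

WORDS = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]
no_double = {w for w in WORDS if "aa" not in w and "bb" not in w}

def accepts(pattern):
    """Words (length <= 8) that the pattern matches in full."""
    regex = re.compile(pattern)
    return {w for w in WORDS if regex.fullmatch(w)}

first_try = accepts(r"(ab)*")
fixed = accepts(r"b?(ab)*a?")   # (epsilon + b)(ab)*(epsilon + a)

print(sorted(no_double - first_try, key=len)[:4])  # some words (ab)* misses
print(fixed == no_double)                          # True
```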

Examples.
( a + b* )* = (a + b )*        YES
( aa + ab*)* = (aa + ab)*     NO
( a* b* )* = (a + b )*        YES
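The three answers can be checked on short words, and for the unequal pair a word can be found that tells the two REs apart. This sketch assumes a recent Python 3, where nested stars such as `(a|b*)*` are accepted by `re`; the helper `accepts` and the bound 8 are arbitrary.

```python
import re
from itertools import product

WORDS = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]

def accepts(pattern):
    """Words (length <= 8) that the pattern matches in full."""
    regex = re.compile(pattern)
    return {w for w in WORDS if regex.fullmatch(w)}

pairs = [
    (r"(a|b*)*",   r"(a|b)*"),
    (r"(aa|ab*)*", r"(aa|ab)*"),
    (r"(a*b*)*",   r"(a|b)*"),
]
for left, right in pairs:
    difference = accepts(left) ^ accepts(right)
    if difference:
        witness = min(difference, key=lambda w: (len(w), w))
        print(left, "vs", right, "-> differ on", repr(witness))
    else:
        print(left, "vs", right, "-> agree")
```

For the middle pair the shortest distinguishing word is a: it is ab* with zero b's, but (aa + ab)* only produces words of even length.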

Examples.
[aa + bb + ( ab + ba) (aa+bb)*(ab + ba) ]*
EVEN-EVEN: the language of all words with an even number of a's
and an even number of b's.

type1 = aa
type2 = bb
type3 = (ab+ba)(aa+bb)*(ab+ba)
E = [ type1 + type2 + type3 ] *
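The EVEN-EVEN expression can be compared with a direct count of the letters. The sketch below uses Python's `re` syntax, with a hypothetical helper `is_even_even`, and checks every word of length up to 10; an empty list of mismatches supports the claim but does not prove it.

```python
import re
from itertools import product

# [ type1 + type2 + type3 ]* with type3 = (ab+ba)(aa+bb)*(ab+ba)
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def is_even_even(word):
    """True when the word has an even number of a's and of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

mismatches = [
    word
    for n in range(11)
    for word in map("".join, product("ab", repeat=n))
    if bool(EVEN_EVEN.fullmatch(word)) != is_even_even(word)
]
print(mismatches)  # []
```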

What Regular Expressions Are Exactly - Terminology
Basically, a regular expression is a
pattern describing a certain amount
of text. The name comes from the
mathematical theory on which regular
expressions are based. But we will not dig into
that. Since most people, including
myself, are too lazy to type, you will
usually find the name abbreviated to
regex or regexp.

What Regular Expressions Are Exactly Cont.
The word regex is itself a perfectly
valid regex. It is the most basic pattern,
simply matching the literal text regex. A
"match" is the piece of text, or sequence of
bytes or characters, that the pattern was found
to correspond to by the regex processing
software.

What Regular Expressions Are Exactly Cont.
\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b

is a more complex pattern. It describes a
series of letters, digits, dots, underscores, percentage
signs and hyphens, followed by an at
sign, followed by another series of letters,
digits, dots, underscores, percentage signs and
hyphens, finally followed by a single dot
and between two and four letters. In other
words: this pattern describes an email
address.
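A short sketch of the email pattern in Python. Two assumptions are added here: the `re.IGNORECASE` flag, because the character classes list only upper-case letters, and the sample sentence, which is made up apart from the address taken from the title slide.

```python
import re

EMAIL = re.compile(r"\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

text = "Contact shussain@uop.edu.jo or nobody@example for details."
print(EMAIL.findall(text))  # ['shussain@uop.edu.jo']

# The {2,4} limit rejects longer top-level domains, so the pattern only
# approximates what a valid address looks like.
print(EMAIL.search("user@site.museum"))  # None
```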

What Regular Expressions Are Exactly Cont.
With the above regular expression pattern,
you can search through a text file to find
email addresses, or verify if a given string
looks like an email address. In this tutorial, I
will use the term "string" to indicate the text
that I am applying the regular expression to.

What Regular Expressions Are Exactly Cont.
The term "string" or "character string" is
used by programmers to indicate a
sequence of characters. In practice, you
can use regular expressions with
whatever data you can access using the
application or programming language you
are working with.

What Regular Expressions Are Exactly Cont.
A regular expression uses metacharacters
(characters that assume special meaning for
matching other characters) such as *, [ ], $
and the dot (.). For example, the RE [Hh]ello!* would
match Hello and hello and Hello! (and
hello!!!!!). The RE [Hh](ello|i)!* would match
Hello and Hi and Hi! (and so on). A backslash
(\) disables the special meaning of the
following character, so you could match the
string [Hello] with the RE \[Hello\].
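These patterns can be tried directly in Python's `re` module. The test string "Hey" is an extra, made-up case added to show a non-match.

```python
import re

greeting = re.compile(r"[Hh](ello|i)!*")
for text in ["Hello", "hello!!!!!", "Hi!", "Hey"]:
    print(text, bool(greeting.fullmatch(text)))

# The backslash makes [ and ] literal instead of opening a character class.
print(bool(re.fullmatch(r"\[Hello\]", "[Hello]")))
```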

How can I use regular expressions?
Many text editors allow regular-expression
search-and-replace. EditPlus for Windows
has this capability, as does BBEdit for the
Macintosh.
EditPlus
The EditPlus search-replace window has a
checkbox called "Regular expression". To
use regular expressions in your search,
simply check this box.

How can I use regular expressions? Cont.
BBEdit
BBEdit's search-replace window also has
such a checkbox; its label, however, is
"Use grep".
Grep. What an odd term. The word "grep" is from
the creators of the UNIX operating system, some
of the first implementers of regular expressions.
UNIX programmers delighted in reducing long
commands to meaningless acronyms; grep is
said to have meant "global regular expression
print".

Defining regular expression patterns
The way regular-expression patterns work
is by creating a special little language in
which ordinary symbols take on special
meanings. This guide will go through the
special meanings little by little, with
examples. You will get the most from this
guide if you read all the way through it. To
be sure you do, I've left a very important
piece of information (how to replace what
you find) for the end.

Defining regular expression patterns-Cont
Dot, question mark, star, plus, and
backslash
Imagine that you have a book of letters, and
you need to tag all the salutations.
Salutations fall into a pattern: the word
"Dear", a name, and a colon (or possibly a
comma, but we'll stick with a colon for
now). Obviously, the problem with finding
this via ordinary search is that the name
could be anything.

Defining regular expression patterns-Cont
Regular expressions have a way of saying
"any character": the dot, or period. To
find a three-letter word beginning and
ending with b, for example, you could
search on b.b . This would find "bib"
or "bob" or "bub", but not "bud" or
"dub" or "bulb".
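The same search can be tried in Python, where `re.search` looks for the pattern anywhere in the word.

```python
import re

pattern = re.compile(r"b.b")
for word in ["bib", "bob", "bub", "bud", "dub", "bulb"]:
    # Found only when some b is followed by one character and another b.
    print(word, bool(pattern.search(word)))
```

The first three words are found and the last three are not; in "bulb" the two b's are two characters apart, so a single dot cannot bridge them.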

Defining regular expression patterns-Cont
Note that whitespace characters such as
space or tab can also be located by the dot.
So b.b would find words with a space or
tab between them. (Quick exercise: where
would b.b match in the preceding
sentence? There are two possibilities!)
The hard return, however, is not matched
by a dot; more on hard returns later.
This still won't solve our salutation problem,
though: names are made up of a variable
number of letters, not just one.
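The behaviour of the dot with whitespace can be checked in Python; note that the first line gives away the answer to the quick exercise. The sketch relies on Python's default mode, in which the dot matches a tab or space but not a newline.

```python
import re

sentence = "So b.b would find words with a space or tab between them."
print(re.findall(r"b.b", sentence))            # ['b.b', 'b b']

print(re.search(r"b.b", "b\tb") is not None)   # True: the dot matches a tab
print(re.search(r"b.b", "b\nb") is not None)   # False: the dot skips newlines
```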

Defining regular expression patterns-Cont
Regular expressions have several ways to
say "not just one": the question mark (?),
the star or asterisk (*), and the plus (+).
The question mark means "zero or one",
the star means "zero or more", and the
plus means "one or more". These marks
are like adjectives; they modify other
characters. What's more, they're like
adjectives in some foreign languages, in
that they come immediately after the
character they modify.

Defining regular expression patterns-Cont
So the regular expression Ba? will match "B" or
"Ba" but not "Baa" or "a". The regular expression
Ba* will match "B" or "Ba" or "Baa", up to any
number of a's. The regular expression
Ba+ will match "Ba" or "Baa" and on up, but it
will not match "B" by itself, since the plus
sign demands at least one a.
Combining the dot with the plus or star
solves our salutation problem. The regular
expression Dear .+: will find any
business-letter salutation that starts with "Dear" and ends with a colon.
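A small Python sketch of the three modifiers and of the salutation pattern. The sample letter text is made up for the illustration, and `fullmatch` is used so that each test word must match as a whole.

```python
import re

for pattern in [r"Ba?", r"Ba*", r"Ba+"]:
    hits = [w for w in ["B", "Ba", "Baa", "a"] if re.fullmatch(pattern, w)]
    print(pattern, hits)
# Ba? ['B', 'Ba']
# Ba* ['B', 'Ba', 'Baa']
# Ba+ ['Ba', 'Baa']

letter = "Dear Dr. Al Faraji:\nThank you.\nDear Sir or Madam:"
print(re.findall(r"Dear .+:", letter))
# ['Dear Dr. Al Faraji:', 'Dear Sir or Madam:']
```

Since .+ takes as much as it can, a line containing a second colon would be matched up to that later colon.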

Defining regular expression patterns-Cont
But what if you actually want to look for a
dot, a star, a question mark, or a plus?
How can you find them, if they've got
special meanings?
Any special regular-expression character loses its
special meaning if there is a backslash (\) before
it. So \. will find a real period, like the one at the
end of this sentence. The backslash works on
itself, too; to find a real backslash, put \\ in your
search.
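Escaping can be tried in Python as below. The sample text is made up, and raw strings (the `r"..."` prefix) are used so that Python itself leaves the backslashes alone. `re.escape` is a standard-library helper that adds the needed backslashes to a literal string.

```python
import re

text = r"Is 1+1=2? Yes. Path: C:\temp"

print(re.findall(r"\.", text))   # ['.']
print(re.findall(r"\?", text))   # ['?']
print(re.findall(r"\+", text))   # ['+']
print(re.findall(r"\\", text))   # ['\\'], i.e. one real backslash

# Escaping a whole literal at once instead of by hand.
literal = "1+1=2?"
print(re.fullmatch(re.escape(literal), literal) is not None)  # True
```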

What we've learned so far (Metacharacters)

Character   Regular-expression meaning
.           Any character, including space or tab
?           Zero or one of the preceding character
*           Zero or more of the preceding character
+           One or more of the preceding character
\           Negates the special meaning of the following character

Metacharacters
As you've learned, the backslash negates any
special meaning that the character
following it has to a regular expression. It
has another function, too: it can turn
ordinary characters into special ones.
Consider the tab. You don't see it on the
screen the way you see ordinary letters;
you see what it does.

Metacharacters-Cont.
If you turn on the show-invisibles function,
however, you generally see an indication
that there is a character there.
Regular expressions let you access these
invisible characters (usually called
metacharacters):

Metacharacters-Cont.
Metacharacter   Meaning
\n              Newline (or paragraph mark, or however you think of it)
\t              Tab character
\s              Any whitespace character (tab, space, or newline)

Metacharacters-Cont.
For purposes of modifiers like star and
plus, these metacharacters act like single
characters. So \n+ finds one or more
newlines.
A special caution with BBEdit: because of
ancient OS wars, Macs and non-Macs
treat newlines differently. If a regular
expression containing \n isn't finding
what you think it should, try replacing \n
in your search pattern with \r.
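The metacharacters behave the same way in Python's `re` module, sketched below with made-up sample text. The last line shows one common way of sidestepping the \n versus \r problem: convert every line-ending style to \n before searching.

```python
import re

text = "one\ttwo\n\n\nthree four"

print(re.split(r"\n+", text))   # ['one\ttwo', 'three four']
print(re.split(r"\s+", text))   # ['one', 'two', 'three', 'four']

# Normalise \r (old Mac) and \r\n (Windows) line endings to \n.
mixed = "mac\rdos\r\nunix\n"
print(repr(re.sub(r"\r\n?", "\n", mixed)))   # 'mac\ndos\nunix\n'
```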

Metacharacters-Cont.
Depending on your regular-expression
engine or editing program, there may be
other metacharacters available to you.
Read the manual or help pages for details.
In addition, a few more special regular-expression
characters provide useful
functions. Remember that to look for the
actual character, you must precede it with
a backslash.

END
