
Regular Expressions

Regular Expression (RE)

Regular expression: An algebraic way to describe regular languages.

Many of today's programming languages use regular expressions to match patterns in strings.
E.g., awk, flex, lex, Java, JavaScript, Perl, Python

Used for searching text in UNIX (vi, Perl, Emacs, grep), Microsoft Word (version 6 and beyond), and WordPerfect.

A few Web search engines may also allow the use of regular expressions.
Recursive Definition
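Primitive regular expressions: $\emptyset$, $\lambda$, $a$ (for each symbol $a$ of the alphabet)

Given regular expressions $r_1$ and $r_2$, the following are also regular expressions:

$r_1 + r_2$
$r_1 \cdot r_2$
$r_1^*$
$(r_1)$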
Examples

A regular expression: $(a + b \cdot c)^* \cdot (c + \emptyset)$

Not a regular expression: $(a + b\, +)$


Languages of Regular Expressions

$L(r)$: language of regular expression $r$

Example

$L((a + b \cdot c)^*) = \{\lambda, a, bc, aa, abc, bca, \ldots\}$


Definition

For primitive regular expressions:

$L(\emptyset) = \emptyset$

$L(\lambda) = \{\lambda\}$

$L(a) = \{a\}$
Definition (continued)

For regular expressions $r_1$ and $r_2$:

$L(r_1 + r_2) = L(r_1) \cup L(r_2)$

$L(r_1 \cdot r_2) = L(r_1)\, L(r_2)$

$L(r_1^*) = (L(r_1))^*$

$L((r_1)) = L(r_1)$
Example
Regular expression: a  b   a *

L a  b   a *  L a  b  L a *
 L  a  b  L  a *
  L a   L b   L a  *
 a  b a *
 a , b , a , aa , aaa ,...
 a , aa , aaa ,..., b, ba , baa ,...
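The recursive definition of $L(r)$ and the example above can be checked mechanically on small cases. The following is a minimal Python sketch, not part of the original slides. It assumes a hypothetical tuple encoding of regular expressions (`("sym", "a")`, `("union", r1, r2)`, and so on) and a hypothetical helper `lang(r, n)` that returns only the strings of $L(r)$ of length at most `n`, since the full language may be infinite. Parentheses need no node of their own because the nesting of the tuples already expresses grouping.

```python
def lang(r, n):
    """Return the strings of L(r) that have length at most n.

    r is a nested tuple (an illustrative encoding, not from the slides):
    ("empty",), ("lambda",), ("sym", a), ("union", r1, r2),
    ("concat", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "empty":                      # L(empty set) = {}
        return set()
    if kind == "lambda":                     # L(lambda) = {lambda}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {r[1]} if n >= 1 else set()
    if kind == "union":                      # L(r1 + r2) = L(r1) U L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                     # L(r1 . r2) = L(r1) L(r2)
        return {x + y
                for x in lang(r[1], n)
                for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                       # L(r1*) = (L(r1))*
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            # Extend every newly found string by one more factor from L(r1).
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind!r}")


# (a + b) . a*
example = ("concat",
           ("union", ("sym", "a"), ("sym", "b")),
           ("star", ("sym", "a")))
print(sorted(lang(example, 3)))  # ['a', 'aa', 'aaa', 'b', 'ba', 'baa']
```

With `n = 3` the result is the length-bounded part of the set derived by hand above.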
Regular Expressions

Operator Precedence:
Highest: Kleene Closure
Then: Concatenation
Lowest: Union
Example

Regular expression $r = (a + b)^* (a + bb)$

$L(r) = \{a, bb, aa, abb, ba, bbb, \ldots\}$


Example

Regular expression $r = (aa)^* (bb)^* b$

$L(r) = \{a^{2n} b^{2m} b : n, m \ge 0\}$
Example

Regular expression $r = (0 + 1)^* \, 00 \, (0 + 1)^*$

$L(r)$ = { all strings containing substring 00 }


Example

Regular expression $r = (1 + 01)^* (0 + \lambda)$

$L(r)$ = { all strings without substring 00 }
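Claims of the form "$L(r)$ is the set of all strings with property P" can be tested by brute force on short strings. The sketch below uses Python's `re` module for the two examples about the substring 00. It is an illustration under assumptions: Python writes union as `|` rather than `+`, an empty alternative stands in for $\lambda$, and the helper names `binary_strings` and `mismatches` are hypothetical. Agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

# Python's re syntax writes union as "|" instead of "+";
# the empty alternative in (?:0|) plays the role of lambda.
CONTAINS_00 = re.compile(r"(?:0|1)*00(?:0|1)*")
WITHOUT_00 = re.compile(r"(?:1|01)*(?:0|)")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def mismatches(pattern, predicate, max_len=10):
    """Return the strings on which the regex and the predicate disagree."""
    return [s for s in binary_strings(max_len)
            if (pattern.fullmatch(s) is not None) != predicate(s)]


print(mismatches(CONTAINS_00, lambda s: "00" in s))      # []
print(mismatches(WITHOUT_00, lambda s: "00" not in s))   # []
```

`fullmatch` is used rather than `search` because membership in $L(r)$ means the whole string is described by $r$, not just some part of it.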


Regular Expressions
Equivalent Regular Expressions

Definition:

Regular expressions $r_1$ and $r_2$ are equivalent if $L(r_1) = L(r_2)$.
Example
$L$ = { all strings without substring 00 }

$r_1 = (1 + 01)^* (0 + \lambda)$

$r_2 = (1^* 0 1 1^*)^* (0 + \lambda) + 1^* (0 + \lambda)$

$L(r_1) = L(r_2) = L$, so $r_1$ and $r_2$ are equivalent regular expressions.
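Equivalence is a statement about two infinite sets, but a bounded comparison is a quick sanity check. The following sketch compares $r_1$ and $r_2$ on every binary string up to a fixed length. The function name `agree_up_to` is hypothetical, and the patterns are transcriptions of $r_1$ and $r_2$ into Python's `re` syntax (`|` for union, an empty alternative for $\lambda$). A `True` result only shows that no short counterexample exists.

```python
import re
from itertools import product


def agree_up_to(p1, p2, alphabet, max_len):
    """True if p1 and p2 accept exactly the same strings of length <= max_len."""
    a, b = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (a.fullmatch(s) is None) != (b.fullmatch(s) is None):
                return False
    return True


R1 = r"(?:1|01)*(?:0|)"                  # (1 + 01)* (0 + lambda)
R2 = r"(?:1*011*)*(?:0|)|1*(?:0|)"       # (1*011*)* (0 + lambda) + 1* (0 + lambda)

print(agree_up_to(R1, R2, "01", 10))          # True
print(agree_up_to(R1, r"(?:0|1)*", "01", 3))  # False: "00" separates them
```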
Regular Expression: The IEEE POSIX standard
Regular Expressions
Valid Email Addresses
Valid IP Addresses
Valid Dates
Floating Point Numbers
Variables
Integers
Numeric Values
Naming Regular Expressions
Specifying Tokens
RE specification of initial MiniJava lexical structure
Regular Expressions and Regular Languages

Theorem (Kleene 1956):

Languages generated by regular expressions = Regular languages

Proof:

Languages generated by regular expressions $\subseteq$ Regular languages

Languages generated by regular expressions $\supseteq$ Regular languages
Proof - Part 1

Languages generated by regular expressions $\subseteq$ Regular languages

For any regular expression $r$, the language $L(r)$ is regular.

Proof by induction on the size of $r$.


Induction Basis

Primitive regular expressions: $\emptyset$, $\lambda$, $a$

Each has a corresponding NFA, so its language is regular:

$L(M_1) = \emptyset = L(\emptyset)$
$L(M_2) = \{\lambda\} = L(\lambda)$
$L(M_3) = \{a\} = L(a)$
Inductive Hypothesis

Suppose that for regular expressions $r_1$ and $r_2$, $L(r_1)$ and $L(r_2)$ are regular languages.
Inductive Step

We will prove that the following are regular languages:

$L(r_1 + r_2)$
$L(r_1 \cdot r_2)$
$L(r_1^*)$
$L((r_1))$
By definition of regular expressions:

$L(r_1 + r_2) = L(r_1) \cup L(r_2)$

$L(r_1 \cdot r_2) = L(r_1)\, L(r_2)$

$L(r_1^*) = (L(r_1))^*$

$L((r_1)) = L(r_1)$
By inductive hypothesis we know:
$L(r_1)$ and $L(r_2)$ are regular languages

We also know:
Regular languages are closed under:
Union: $L(r_1) \cup L(r_2)$
Concatenation: $L(r_1)\, L(r_2)$
Star: $(L(r_1))^*$
Therefore, the following are regular languages:

$L(r_1 + r_2) = L(r_1) \cup L(r_2)$

$L(r_1 \cdot r_2) = L(r_1)\, L(r_2)$

$L(r_1^*) = (L(r_1))^*$

$L((r_1)) = L(r_1)$ is trivially a regular language (by the inductive hypothesis).
End of Proof-Part 1
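The induction in Part 1 can be read as a recipe: build an NFA for each primitive expression, then combine NFAs for union, concatenation and star. The sketch below is one standard way to do this (a Thompson-style construction with $\lambda$-moves), which is not necessarily the construction the slides have in mind. The tuple encoding of expressions and the names `build`, `closure` and `accepts` are assumptions made for illustration.

```python
from itertools import count

_ids = count()  # fresh state numbers, so fragments never share states


def _add(trans, p, sym, q):
    # sym is an input symbol, or None for a lambda move
    trans.setdefault((p, sym), set()).add(q)


def build(r):
    """Return (start, accept, transitions) of an NFA for expression r."""
    s, f, t = next(_ids), next(_ids), {}
    kind = r[0]
    if kind == "empty":
        pass                                  # accept state is unreachable
    elif kind == "lambda":
        _add(t, s, None, f)
    elif kind == "sym":
        _add(t, s, r[1], f)
    elif kind == "union":
        for sub in (r[1], r[2]):
            s1, f1, t1 = build(sub)
            t.update(t1)
            _add(t, s, None, s1)
            _add(t, f1, None, f)
    elif kind == "concat":
        s1, f1, t1 = build(r[1])
        s2, f2, t2 = build(r[2])
        t.update(t1)
        t.update(t2)
        _add(t, s, None, s1)
        _add(t, f1, None, s2)
        _add(t, f2, None, f)
    elif kind == "star":
        s1, f1, t1 = build(r[1])
        t.update(t1)
        _add(t, s, None, s1)                  # enter the loop body
        _add(t, s, None, f)                   # or skip it entirely
        _add(t, f1, None, s1)                 # repeat the body
        _add(t, f1, None, f)                  # or leave the loop
    else:
        raise ValueError(f"unknown node: {kind!r}")
    return s, f, t


def closure(states, t):
    """All states reachable from `states` using lambda moves only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in t.get((p, None), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(r, word):
    """Simulate the NFA built from r on the given word."""
    s, f, t = build(r)
    current = closure({s}, t)
    for ch in word:
        moved = set()
        for p in current:
            moved |= t.get((p, ch), set())
        current = closure(moved, t)
    return f in current


# (a + b) . a*
r = ("concat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
print(accepts(r, "baa"), accepts(r, "ab"))  # True False
```

Each branch of `build` mirrors one case of the proof: the three primitive cases are the induction basis, and the three combining cases correspond to closure under union, concatenation and star.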
Proof - Part 2

Languages generated by regular expressions $\supseteq$ Regular languages

For any regular language $L$ there is a regular expression $r$ with $L(r) = L$.

We will convert an NFA that accepts $L$ to a regular expression.
Since $L$ is regular, there is an NFA $M$ that accepts it:

$L(M) = L$

Take it with a single final state.

From $M$ construct the equivalent Generalized Transition Graph, in which transition labels are regular expressions.

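Example: an NFA $M$ and the corresponding generalized transition graph.

[Figure: transitions labeled $a$, $b$ and $c$ keep their labels; the transition labeled $a, b$ in $M$ becomes a single transition labeled $a + b$.]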
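Another Example:

[Figure: NFA with states $q_0$, $q_1$, $q_2$. Transitions: $q_0 \to q_1$ on $b$; $q_1 \to q_0$ on $a$; $q_1 \to q_1$ on $b$; $q_1 \to q_2$ on $a, b$; $q_2 \to q_2$ on $b$.]

In the generalized transition graph the transition labels are regular expressions: the label $a, b$ from $q_1$ to $q_2$ becomes $a + b$.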
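Reducing the states:

[Figure: state $q_1$ is removed from the graph above. The reduced graph has states $q_0$ and $q_2$, with a self-loop $bb^*a$ on $q_0$, a transition $bb^*(a + b)$ from $q_0$ to $q_2$, and a self-loop $b$ on $q_2$. Transition labels are regular expressions.]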
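Resulting Regular Expression:

[Figure: the reduced graph, with self-loop $bb^*a$ on $q_0$, transition $bb^*(a + b)$ from $q_0$ to $q_2$, and self-loop $b$ on $q_2$.]

$r = (bb^*a)^* \, bb^*(a + b) \, b^*$

$L(r) = L(M) = L$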
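The claim $L(r) = L(M)$ for this example can be spot-checked by simulating the NFA directly and comparing it with $r$ on all short strings. In the sketch below, the transition table `DELTA` is an assumption: it is read off the example figure, with $q_0$ as the initial state and $q_2$ as the final state. The name `nfa_accepts` is hypothetical, and $r$ is transcribed into Python's `re` syntax with `|` for union.

```python
import re
from itertools import product

# Transitions of M as read off the example figure (an assumption).
DELTA = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q0", "q2"},
    ("q1", "b"): {"q1", "q2"},
    ("q2", "b"): {"q2"},
}


def nfa_accepts(word, start="q0", final="q2"):
    """Track the set of states M can be in after reading each symbol."""
    current = {start}
    for ch in word:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return final in current


# r = (bb*a)* bb*(a + b) b*
R = re.compile(r"(?:bb*a)*bb*(?:a|b)b*")

disagreements = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if nfa_accepts(w) != (R.fullmatch(w) is not None)
]
print(disagreements)  # []
```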
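In General

Removing a state $q$:

[Figure, before: $q$ has a self-loop $e$; transitions $q_i \to q$ on $a$, $q \to q_i$ on $d$, $q \to q_j$ on $b$, and $q_j \to q$ on $c$.]

[Figure, after: self-loop $ae^*d$ on $q_i$, self-loop $ce^*b$ on $q_j$, transition $ae^*b$ from $q_i$ to $q_j$, and transition $ce^*d$ from $q_j$ to $q_i$.]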
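By repeating the process until two states are left, the resulting graph is:

[Figure: states $q_0$ and $q_f$, with self-loop $r_1$ on $q_0$, transition $r_2$ from $q_0$ to $q_f$, transition $r_3$ from $q_f$ to $q_0$, and self-loop $r_4$ on $q_f$.]

The resulting regular expression:
$r = r_1^* \, r_2 \, (r_4 + r_3 r_1^* r_2)^*$
$L(r) = L(M) = L$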
End of Proof-Part 2
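The state-removal rule and the final two-state formula can be turned into a short program. The following is a minimal sketch under stated assumptions, not a definitive implementation: the generalized transition graph is a dictionary from state pairs to labels written in Python's `re` syntax, a missing edge stands for $\emptyset$, the initial and final states are distinct, and there is a single final state. The helper names (`union`, `concat`, `star`, `to_regex`) are hypothetical, and the output is not simplified.

```python
import re


def union(x, y):
    """x + y, where None stands for the empty language (no edge)."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts):
    """Concatenation; the empty language absorbs everything."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(x):
    """x*; the star of the empty language is lambda, written as ""."""
    return "" if x in (None, "") else f"(?:{x})*"


def to_regex(edges, states, start, final):
    """Remove every state except start and final, then apply
    r = r1* r2 (r4 + r3 r1* r2)*."""
    g = dict(edges)
    remaining = list(states)
    for q in states:
        if q in (start, final):
            continue
        remaining.remove(q)
        loop = star(g.get((q, q)))
        for p in remaining:
            for s in remaining:
                # Reroute p -> q -> s as one edge labeled (p,q) loop* (q,s).
                via = concat(g.get((p, q)), loop, g.get((q, s)))
                new = union(g.get((p, s)), via)
                if new is not None:
                    g[(p, s)] = new
        g = {k: v for k, v in g.items() if q not in k}
    r1, r2 = g.get((start, start)), g.get((start, final))
    r3, r4 = g.get((final, start)), g.get((final, final))
    back = union(r4, concat(r3, star(r1), r2))
    return concat(star(r1), r2, star(back))


# Generalized transition graph of the three-state example (labels assumed
# from the figure): removing q1 should give (bb*a)* bb*(a+b) b*.
EDGES = {
    ("q0", "q1"): "b",
    ("q1", "q0"): "a",
    ("q1", "q1"): "b",
    ("q1", "q2"): "a|b",
    ("q2", "q2"): "b",
}
pattern = to_regex(EDGES, ["q0", "q1", "q2"], "q0", "q2")
print(re.fullmatch(pattern, "baba") is not None)  # True
print(re.fullmatch(pattern, "baa") is not None)   # False
```

The generated pattern contains many redundant groups because every operand is wrapped defensively; it describes the same language as $(bb^*a)^* bb^*(a + b) b^*$ but is not meant to be read by hand.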
Standard Representations of Regular Languages

Regular languages can be represented by:
DFAs
NFAs
Regular Expressions
When we say: We are given a Regular Language $L$

We mean: Language $L$ is in a standard representation (DFA, NFA, or Regular Expression)
