5a_Regular_Expressions

Regular Expressions
Regular Expression (RE)
Regular expression: an algebraic way to describe regular languages.
Many of today's programming languages and tools use regular expressions to match patterns in strings, e.g., awk, flex, lex, Java, JavaScript, Perl, Python.
Regular expressions are also used for searching text in UNIX tools (vi, Perl, Emacs, grep), Microsoft Word (version 6 and beyond), and WordPerfect.
A few Web search engines may allow the use of regular expressions.

Regular Expressions
Recursive Definition
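Primitive regular expressions: $\emptyset$, $\lambda$, $a$ (for each symbol $a$ of the alphabet)
Given regular expressions $r_1$ and $r_2$, the following are also regular expressions:
$r_1 + r_2$
$r_1 \cdot r_2$
$r_1^*$
$(r_1)$
Examples
A regular expression: $(a + b \cdot c)^* \cdot (c + \emptyset)$
Not a regular expression: $(a + b\ +)$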

Languages of Regular Expressions
$L(r)$: language of regular expression $r$
Example
$L((a + b \cdot c)^*) = \{\lambda, a, bc, aa, abc, bca, \ldots\}$

Definition
For primitive regular expressions:
$L(\emptyset) = \emptyset$
$L(\lambda) = \{\lambda\}$
$L(a) = \{a\}$
Definition (continued)
For regular expressions $r_1$ and $r_2$:
$L(r_1 + r_2) = L(r_1) \cup L(r_2)$
$L(r_1 \cdot r_2) = L(r_1)\,L(r_2)$
$L(r_1^*) = (L(r_1))^*$
$L((r_1)) = L(r_1)$
Example
Regular expression: $(a + b) \cdot a^*$
$L((a + b) \cdot a^*) = L((a + b))\,L(a^*)$
$= L(a + b)\,L(a^*)$
$= (L(a) \cup L(b))\,(L(a))^*$
$= (\{a\} \cup \{b\})\,(\{a\})^*$
$= \{a, b\}\,\{\lambda, a, aa, aaa, \ldots\}$
$= \{a, aa, aaa, \ldots, b, ba, baa, \ldots\}$
Regular Expressions
Operator Precedence:
Highest: Kleene Closure
Then: Concatenation
Lowest: Union
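The precedence rules can be observed in a practical regex engine. The sketch below uses Python's `re` module, where union is written `|` instead of `+` and concatenation is juxtaposition; that translation and the helper name `in_language` are assumptions for illustration, not part of the slides. It compares an unparenthesized expression with its fully parenthesized reading and with a different grouping.

```python
import re

def in_language(pattern, w):
    """True if the whole string w matches the pattern."""
    return re.fullmatch(pattern, w) is not None

unparenthesized = r"a|bc*"       # slide notation: a + b·c*
as_parsed = r"a|(b(c*))"         # star first, then concatenation, then union
other_grouping = r"((a|b)c)*"    # a different expression altogether

words = ["", "a", "b", "bcc", "ac", "acbc"]
table = {
    w: (in_language(unparenthesized, w),
        in_language(as_parsed, w),
        in_language(other_grouping, w))
    for w in words
}
for w, row in table.items():
    print(repr(w), row)
```

The first two columns agree on every word, while the third differs, which is what the precedence order star, concatenation, union predicts.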
Example
Regular expression $r = (a + b)^* (a + bb)$
$L(r) = \{a, bb, aa, abb, ba, bbb, \ldots\}$

Example
Regular expression $r = (aa)^* (bb)^* b$
$L(r) = \{a^{2n} b^{2m} b : n, m \ge 0\}$
Example
Regular expression $r = (0 + 1)^*\,00\,(0 + 1)^*$
$L(r)$ = { all strings containing substring 00 }

Example
Regular expression $r = (1 + 01)^* (0 + \lambda)$
$L(r)$ = { all strings without substring 00 }
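Both claims about the substring 00 can be sanity-checked by brute force. The sketch below is an illustration under assumptions: it translates the two expressions into Python `re` syntax (`|` for union, `(0|)` for $0 + \lambda$), and the names `contains_00`, `without_00`, `binary_strings`, and `first_mismatch` are made up here. A bounded check is evidence, not a proof.

```python
import re
from itertools import product

contains_00 = re.compile(r"(0|1)*00(0|1)*")   # (0 + 1)* 00 (0 + 1)*
without_00 = re.compile(r"(1|01)*(0|)")       # (1 + 01)* (0 + λ)

def binary_strings(max_len):
    """Yield every string over {0, 1} of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def first_mismatch(max_len=10):
    """Return a string where an expression disagrees with its description, or None."""
    for w in binary_strings(max_len):
        has_00 = "00" in w
        if (contains_00.fullmatch(w) is not None) != has_00:
            return w
        if (without_00.fullmatch(w) is not None) == has_00:
            return w
    return None

print(first_mismatch())   # → None
```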

Regular Expressions
Equivalent Regular Expressions
Definition:
Regular expressions $r_1$ and $r_2$ are equivalent if $L(r_1) = L(r_2)$
Example
$L$ = { all strings without substring 00 }
$r_1 = (1 + 01)^* (0 + \lambda)$
$r_2 = (1^* 0 1 1^*)^* (0 + \lambda) + 1^* (0 + \lambda)$
$L(r_1) = L(r_2) = L$, so $r_1$ and $r_2$ are equivalent regular expressions.
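Equivalence of two expressions can be probed by comparing them on all short strings. The following sketch assumes the expressions are rewritten in Python `re` syntax; the function name `equivalent_up_to` is hypothetical. Agreement up to a length bound does not prove equivalence, but a returned string is a genuine counterexample.

```python
import re
from itertools import product

def equivalent_up_to(p1, p2, alphabet, max_len):
    """Return a shortest string on which p1 and p2 disagree, or None if
    they agree on every string over `alphabet` of length <= max_len."""
    rx1, rx2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if (rx1.fullmatch(w) is None) != (rx2.fullmatch(w) is None):
                return w
    return None

r1 = r"(1|01)*(0|)"              # (1 + 01)* (0 + λ)
r2 = r"(1*011*)*(0|)|1*(0|)"     # (1* 0 1 1*)* (0 + λ) + 1* (0 + λ)

print(equivalent_up_to(r1, r2, "01", 10))          # → None
print(equivalent_up_to(r1, r"(1|01)*", "01", 10))  # → 0
```

The second call drops the trailing $(0 + \lambda)$ and is caught immediately, since the string 0 has no substring 00 but is not in $L((1 + 01)^*)$.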
Regular Expression: The IEEE POSIX standard
Regular Expressions
Valid Email Addresses
Valid IP Addresses
Valid Dates
Floating Point Numbers
Variables
Integers
Numeric Values
Naming Regular Expressions
Specifying Tokens
RE specification of initial MiniJava lexical structure
Regular Expressions and Regular Languages
Theorem (Kleene 1956):
Languages Generated by Regular Expressions $=$ Regular Languages
Proof:
Languages Generated by Regular Expressions $\subseteq$ Regular Languages
Languages Generated by Regular Expressions $\supseteq$ Regular Languages
Proof - Part 1

Languages Generated by Regular Expressions $\subseteq$ Regular Languages
For any regular expression $r$, the language $L(r)$ is regular.
Proof by induction on the size of $r$.

Induction Basis
Primitive Regular Expressions:  , , 
Corresponding
NFAs
L ( M 1 )    L ( )
regular
L ( M 2 )  { }  L (  )
languages
a
L ( M 3 )  {a}  L ( a )
Inductive Hypothesis
Suppose that for regular expressions $r_1$ and $r_2$, $L(r_1)$ and $L(r_2)$ are regular languages.
Inductive Step
We will prove that the following are regular languages:
$L(r_1 + r_2)$
$L(r_1 \cdot r_2)$
$L(r_1^*)$
$L((r_1))$
By definition of regular expressions:
$L(r_1 + r_2) = L(r_1) \cup L(r_2)$
$L(r_1 \cdot r_2) = L(r_1)\,L(r_2)$
$L(r_1^*) = (L(r_1))^*$
$L((r_1)) = L(r_1)$
By inductive hypothesis we know: $L(r_1)$ and $L(r_2)$ are regular languages.
We also know that regular languages are closed under:
Union: $L(r_1) \cup L(r_2)$
Concatenation: $L(r_1)\,L(r_2)$
Star: $(L(r_1))^*$
Therefore, the following are regular languages:
$L(r_1 + r_2) = L(r_1) \cup L(r_2)$
$L(r_1 \cdot r_2) = L(r_1)\,L(r_2)$
$L(r_1^*) = (L(r_1))^*$
$L((r_1)) = L(r_1)$ is trivially a regular language (by the induction hypothesis).
End of Proof-Part 1
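The induction in Part 1 is constructive: each case corresponds to building an NFA from smaller NFAs. The slides only invoke closure under union, concatenation, and star; the sketch below fills that in with a Thompson-style construction using $\lambda$-transitions, which is an assumption about how the closure properties are realized, not the slides' own construction. All names (`sym`, `union`, `concat`, `star`, `accepts`) and the tuple representation `(start, accept, edges)` are hypothetical.

```python
import itertools

_fresh = itertools.count()   # supplies unused state names

def _new():
    return next(_fresh)

# An NFA is (start, accept, edges); an edge is (source, label, target),
# where label None stands for a λ-transition.

def empty():                 # basis: L = ∅ (accept state is unreachable)
    return (_new(), _new(), [])

def lam():                   # basis: L = {λ}
    s, f = _new(), _new()
    return (s, f, [(s, None, f)])

def sym(a):                  # basis: L = {a}
    s, f = _new(), _new()
    return (s, f, [(s, a, f)])

def union(m1, m2):           # L(m1) ∪ L(m2)
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = _new(), _new()
    return (s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                             (f1, None, f), (f2, None, f)])

def concat(m1, m2):          # L(m1) L(m2)
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return (s1, f2, e1 + e2 + [(f1, None, s2)])

def star(m1):                # (L(m1))*
    s1, f1, e1 = m1
    s, f = _new(), _new()
    return (s, f, e1 + [(s, None, s1), (f1, None, f),
                        (s, None, f), (f1, None, s1)])

def accepts(m, w):
    """Simulate the NFA on w by tracking the set of reachable states."""
    start, accept, edges = m

    def closure(states):     # follow λ-transitions until nothing new appears
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for p, label, t in edges:
                if p == q and label is None and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({t for p, label, t in edges
                           if p in current and label == ch})
    return accept in current

# (a + b) · a*; each sub-automaton is built fresh, never reused
m = concat(union(sym("a"), sym("b")), star(sym("a")))
print([w for w in ["", "a", "b", "ab", "ba", "baa", "bb"] if accepts(m, w)])
# → ['a', 'b', 'ba', 'baa']
```

Since every constructor returns an NFA, any expression built from the primitives yields an NFA for its language, which is the content of the inductive step.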
Proof - Part 2

Languages Generated by Regular Expressions $\supseteq$ Regular Languages
For any regular language $L$ there is a regular expression $r$ with $L(r) = L$.
We will convert an NFA that accepts $L$ to a regular expression.
Since $L$ is regular, there is an NFA $M$ that accepts it: $L(M) = L$
Take it with a single final state.
From $M$ construct the equivalent Generalized Transition Graph, in which transition labels are regular expressions.
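Example:
[Figure: an NFA $M$ and its corresponding generalized transition graph. Transitions labeled $a$, $b$, $c$ are unchanged; the transition labeled $a, b$ in $M$ carries the single label $a + b$ in the generalized transition graph.]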
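Another Example:
[Figure: NFA with states $q_0$, $q_1$, $q_2$: $q_0 \to q_1$ labeled $b$, a loop labeled $b$ on $q_1$, $q_1 \to q_0$ labeled $a$, $q_1 \to q_2$ labeled $a, b$, and a loop labeled $b$ on $q_2$. In the generalized transition graph the label $a, b$ becomes $a + b$. Transition labels are regular expressions.]
Reducing the states:
[Figure: state $q_1$ is removed. The resulting graph has states $q_0$ and $q_2$, a loop labeled $bb^*a$ on $q_0$, a transition $q_0 \to q_2$ labeled $bb^*(a + b)$, and a loop labeled $b$ on $q_2$. Transition labels are regular expressions.]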
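Resulting Regular Expression:
[Figure: the two-state graph with a loop labeled $bb^*a$ on $q_0$, a transition $q_0 \to q_2$ labeled $bb^*(a + b)$, and a loop labeled $b$ on $q_2$.]
$r = (bb^*a)^*\, bb^*(a + b)\, b^*$
$L(r) = L(M) = L$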
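In General
Removing a state:
[Figure: state $q$ has a loop labeled $e$, with transitions $q_i \to q$ labeled $a$, $q \to q_i$ labeled $d$, $q_j \to q$ labeled $c$, and $q \to q_j$ labeled $b$. After removing $q$: a loop labeled $ae^*d$ on $q_i$, a loop labeled $ce^*b$ on $q_j$, a transition $q_i \to q_j$ labeled $ae^*b$, and a transition $q_j \to q_i$ labeled $ce^*d$.]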
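By repeating the process until two states are left, the resulting graph is:
[Figure: the initial graph is reduced to a graph with states $q_0$ and $q_f$: a loop labeled $r_1$ on $q_0$, a transition $q_0 \to q_f$ labeled $r_2$, a transition $q_f \to q_0$ labeled $r_3$, and a loop labeled $r_4$ on $q_f$.]
The resulting regular expression:
$r = r_1^* r_2 (r_4 + r_3 r_1^* r_2)^*$
$L(r) = L(M) = L$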
End of Proof-Part 2
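The state-removal procedure of Part 2 can be sketched in code. This is a minimal illustration under stated assumptions, not the slides' own implementation: the generalized transition graph is a dictionary from state pairs to labels, labels are strings in Python `re` syntax (`|` for union), a missing entry stands for $\emptyset$, the start and final states are distinct, and all function names are hypothetical. The example graph is the three-state one from the slides as its transitions can be read off the reduced graph.

```python
import re

def union(x, y):
    """x + y, where None stands for ∅."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(*parts):
    """Concatenation; ∅ anywhere gives ∅, and "" (λ) is dropped."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")

def star(x):
    """x*; both ∅* and λ* are λ, written as the empty pattern."""
    return "" if x in (None, "") else f"(?:{x})*"

def to_regex(labels, states, start, final):
    """Remove every state except start and final, then apply
    r = r1* r2 (r4 + r3 r1* r2)*. Returns None if the language is empty."""
    R = dict(labels)
    remaining = list(states)
    for k in states:
        if k in (start, final):
            continue
        remaining.remove(k)
        loop = star(R.get((k, k)))
        # every path i -> k -> j is replaced by a direct label on i -> j
        R = {(i, j): union(R.get((i, j)),
                           concat(R.get((i, k)), loop, R.get((k, j))))
             for i in remaining for j in remaining}
    r1, r2 = R.get((start, start)), R.get((start, final))
    r3, r4 = R.get((final, start)), R.get((final, final))
    return concat(star(r1), r2, star(union(r4, concat(r3, star(r1), r2))))

example = {
    ("q0", "q1"): "b",
    ("q1", "q1"): "b",
    ("q1", "q0"): "a",
    ("q1", "q2"): "a|b",
    ("q2", "q2"): "b",
}
pattern = to_regex(example, ["q0", "q1", "q2"], "q0", "q2")
print([w for w in ["b", "ba", "bb", "bab", "baab", "babb"]
       if re.fullmatch(pattern, w)])
# → ['ba', 'bb', 'bab', 'babb']
```

The generated pattern is heavily parenthesized, but it denotes the same language as $(bb^*a)^*\,bb^*(a + b)\,b^*$; wrapping every sub-expression is a design choice that avoids having to reason about operator precedence when labels are combined.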
Standard Representations of Regular Languages
Regular Languages: DFAs, NFAs, Regular Expressions
When we say "we are given a regular language $L$", we mean: language $L$ is in a standard representation (DFA, NFA, or regular expression).
