Ch3 - Regular Expression: Subhash Sagar Email

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 37

Ch3 – Regular Expression

Subhash Sagar
Email: subhash.sagar@nu.edu.pk

1
Defining Languages using Regular Expressions

2
Regular Expression
 A compact, easy-to-read language description.

 Use operators to denote the language constructors


described earlier, to build “complex” languages from simple
ones.
Regular Expression
 As discussed earlier that a* generates
Λ, a, aa, aaa, …
and a+ generates a, aa, aaa, aaaa, …, so the language L1 = {Λ, a, aa,
aaa, …} and L2 = {a, aa, aaa, aaaa, …} can simply be expressed by a*
and a+, respectively.
a* and a+ are called the regular expressions (RE) for L1 and L2
respectively.
Note: a+, aa* and a*a generate L2.

4
Regular Expression
Definition: A regular expression over an alphabet Σ is recursively
defined as follows:
1. ø denotes language ø
2. ε denotes language {ε}
3. a denotes language {a}, for all a  Σ.
4. (P + Q) denotes L(P) U L(Q), where P, Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
6. P* denotes L(P)*, where P is r.e.
Recursive definition of Regular Expression (RE)
Step 1: Every letter of Σ including Λ is a regular expression.
Step 2: If r1 and r2 are regular expressions then
1.(r1)
2.r1 r2
3.r1 + r2 and
4. r1*
are also regular expressions.
Step 3: Nothing else is a regular expression.

6
Precedence of Regular-Expression Operators
To prevent excessive parentheses, we assume left associativity, with the following
operator precedence hierarchy, from most to least binding: *, ·, +
 The star (*) operator is of highest precedence. That is, it applies only to the smallest
sequence of symbols to its left that is a well-formed regular expressions.
 After grouping all stars to their operands, we group concatenation (.) operators to
their operands.
 Finally, all unions (+) are grouped with their operands.
 Example:
– 01* + 1 = ( 0 . ((1)*) ) + 1

7
Algebraic Laws of Regular Expressions
 Commutative:
– E+F = F+E
 Associative:
– (E+F)+G = E+(F+G)
– (EF)G = E(FG)
 Identity:
– E+Φ = E
– E=E=E

 Annihilator:
– ΦE = EΦ = Φ
Algebraic Laws…
 Distributive:
– E(F+G) = EF + EG
– (F+G)E = FE+GE
 Idempotent: E + E = E
 Involving Kleene's closures:
– (E*)* = E*
– Φ* = 
– * = 
– E+ = EE*
Applications
 Web Search Engines
– filter input strings or to match parts of such strings.

 Lexical Analyzer
– Converting a statement into token using predefined Regular Expression.

10
Examples

A regular expression: a  b  c 
*
 (c   )

Not a regular expression: a  b  


Definition
For regular expressions:

L   
L    
La   a
Languages of Regular Expressions

Lr  : language of regular expression r

Example

 
L (a  b  c)   , a, bc, aa, abc, bca,...
*
Example

a  b a*
Regular expression:

La  b   a *  L  a  b  L a *

 
 La  b  L a*
 La   Lb  La 
*

 a b a


*

 a, b , a, aa, aaa,...


 a, aa, aaa,...,b, ba, baa,...
Example

r  a  b  a  bb 
*
Regular expression

Lr   a, bb, aa, abb, ba, bbb,...


Example

r  aa  bb  b
* *
Regular expression

Lr   {a b2n 2m
b : n, m  0}
Example

Regular expression r  (0  1) 00 (0  1)
* *

L(r ) = { all strings containing substring 00 }


Example

Regular expression r  (1  01) (0   )


*

L(r ) = { all strings without substring 00 }


Equivalent Regular Expressions

Definition:

Regular expressions r1 and r2 are equivalent if

L(r1)  L(r2 )
True or False?
Let R and S be two regular expressions. Then:

1. ((R*)*)* = R*?
2. (R+S)* = R* + S*?
3. (RS + R)* RS = (RR*S)*?
Example: how to use these regular expression properties
and language operators?
 L = { w | w is a binary string which does not contain two consecutive 0s or two consecutive
1s anywhere)

– E.g., w = 01010101 is in L, while w = 10010 is not in L


 Goal: Build a regular expression for L

 Four cases for w:


– Case A: w starts with 0 and |w| is even
– Case B: w starts with 1 and |w| is even
– Case C: w starts with 0 and |w| is odd
– Case D: w starts with 1 and |w| is odd
Example: (Cont.)
 Regular expression for the four cases:
– Case A: (01)*
– Case B: (10)*
– Case C: 0(10)*
– Case D: 1(01)*
 Since L is the union of all 4 cases:
– Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*

 If we introduce  then the regular expression can be simplified to:


– Reg Exp for L = ( +1)(01)*( +0)

22
Regular Expressions
and
Regular Languages
Theorem

Languages
Generated by
Regular Expressions
 Regular
Languages
Proof:

Languages
Generated by  Regular
Languages
Regular Expressions

Languages
Generated by  Regular
Languages
Regular Expressions
Proof - Part 1

Languages
Generated by  Regular
Languages
Regular Expressions

For any regular expression r


the language L(r ) is regular
Proof by induction on the size of r
Inductive Step
For Regular Expressions r1 and r2 , L(r1 ) and L(r2 ) are regular languages,

Lr1  r2 
Lr1  r2  Are regular
Lr  1
* Languages

Lr1 

COSTAS BUSCH - LSU 27


By inductive hypothesis we know:
L(r1 ) and L(r2 ) are regular languages
We also know:
Regular languages are closed under:
Union Lr1   Lr2 
Concatenation Lr1  Lr2 
Star Lr1 
*

COSTAS BUSCH - LSU 28


Therefore:

Lr1  r2   Lr1   Lr2 

Lr1  r2   Lr1  Lr2  Are regular


languages

 
L r  Lr1 
1
* *

L((r1 ))  L(r1 ) is trivially a regular language


(by induction hypothesis)
COSTAS BUSCH - LSU 29
Proof - Part 2

Languages
Generated by  Regular
Languages
Regular Expressions

For any regular language L there is


a regular expression r with L(r )  L
Defining Languages using Regular Expressions
 Method-3 (Regular Expressions)
– Consider the language L={Λ, x, xx, xxx,…} of strings, defined over Σ =
{x}.
We can write this language as the Kleene star closure of alphabet Σ
or L=Σ*={x}*
this language can also be expressed by the regular expression x*.

– Similarly the language L={x, xx, xxx,…}, defined over Σ = {x}, can be
expressed by the regular expression x+.

31
Example
 Now consider another language L, consisting of all possible strings,
defined over Σ = {a, b}. This language can also be expressed by the
regular expression
(a + b)*

 Now consider another language L, of strings having exactly double


a, defined over Σ = {a, b}, then it’s regular expression may be
b*aab*

32
Example
 Now consider another language L, of even length, defined over Σ =
{a, b}, then it’s regular expression may be
((a+b)(a+b))*

 Now consider another language L, of odd length, defined over Σ =


{a, b}, then it’s regular expression may be
(a+b)((a+b)(a+b))* or
((a+b)(a+b))*(a+b)

33
Remark
 It may be noted that a language may be expressed by more than one
regular expressions, while given a regular expression there exist a
unique language generated by that regular expression.

34
Example
 Consider the language, defined over
Σ={a,b} of words having at least one a, may be expressed by a
regular expression
(a+b)*a(a+b)*

 Consider the language, defined over


Σ={a,b} of words having at least one a and one b, may be expressed
by a regular expression
(a+b)*a(a+b)*b(a+b)*+ (a+b)*b(a+b)*a(a+b)*

35
Example
 Consider the language, defined over
Σ={a, b}, of words starting with double a and ending in double b
then its regular expression may be
aa(a+b)*bb

 Consider the language, defined over


Σ={a, b} of words starting with a and ending in b OR starting with b
and ending in a, then its regular expression may be
a(a+b)*b+b(a+b)*a

36
Standard Representations
of Regular Languages
Regular Languages

DFAs

Regular Expressions
NFAs

COSTAS BUSCH - LSU 37

You might also like