Lex 3

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 11

Automating Scanner Construction

RENFA (Thompson’s construction) 


• Build an NFA for each term
• Combine them with -moves
The Cycle of Constructions
NFA DFA (subset construction) 
• Build the simulation
minimal
RE NFA DFA
DFA Minimal DFA (today) DFA

• Hopcroft’s algorithm

DFA RE
• All pairs, all paths problem
• Union together paths from s0 to a final state
from Cooper & Torczon 1
DFA Minimization

The Big Picture


• Discover sets of equivalent states
• Represent each such set with just one state

Two states are equivalent if and only if:


• The set of paths leading to them are equivalent
    , transitions on  lead to equivalent states (DFA)
• transitions to distinct sets  states must be in distinct sets

A partition P of S
• Each s  S is in exactly one set pi  P
• The algorithm iteratively partitions the DFA’s states

from Cooper & Torczon 2


DFA Minimization

Details of the algorithm


• Group states into maximal size sets, optimistically
• Iteratively subdivide those sets, as needed
• States that remain grouped together are equivalent

Initial partition, P0 , has two sets {F} & {Q-F} (D =(Q,,,q0,F))

Splitting a set
• Assume qa, qb, & qc  s, and
 (qa,a) = qx, (qb,a) = qy, & (qa,a) = qz
• If qx, qy, & qz are not in the same set, then s must be split
• One state in the final DFA cannot have two transitions on a

from Cooper & Torczon 3


DFA Minimization

The algorithm Why does this work?


• Partition P  2Q
P  { F, {Q-F}} • Start off with 2 subsets of Q
while ( P is still changing) {F} and {Q-F}
T{}
• While loop takes PiPi+1 by
for each set s  P
splitting 1 or more sets
for each   
partition s by  • Pi+1 is at least one step closer to the
into s1, s2, …, sk partition with |Q | sets
T  T  s1, s2, …, sk • Maximum of |Q | splits
if T  P then Note that
PT
• Partitions are never combined
This is a fixed-point algorithm! • Initial partition ensures that final
states are intact

from Cooper & Torczon 4


DFA Minimization

Enough theory, does this stuff work?


 Recall our example: ( a | b)* abb

Curren t Pa rt ition S p lit on a S p lit on b


P0 {s 4 } {s 0 , s 1 , s 2 , s 3 } no ne {s 0 , s 1 , s 2 } {s 3 }

P1 {s 4 }{s 3 }{s 0 , s 1 , s 2 } no ne {s 0 , s 2 }{s 1 }

P2 {s 4 }{s 3 }{s 1 }{s 0 , s 2 } no ne Non e

final state
a a b a
a
s0 a s1 b s3 b s4 a b b
s 0 , s2 s1 s3 s4

b a a b a b
s2

from Cooper & Torczon 5


DFA Minimization

What about a ( b | c )* ? 
b q5
 q4 
q0 a q1  q2  q3 q8  q9
 q6 c q7 

First, the subset construction:


-c los ure (m o ve( s , *)) b
NFA s ta te s a b c
s2
s0 q0 q 1, q 2, q 3, no ne no ne b
q 4, q 6, q 9 a
s1 q 1, q 2, q 3, no ne q 5, q 8, q 9, q 7, q 8, q 9, s0 s1 b c
q 4, q 6, q 9 q 3, q 4, q 6 q 3, q 4, q 6 c
s2 q 5, q 8, q 9, no ne s2 s3 s3
q 3, q 4, q 6
s3 q 7, q 8, q 9, no ne s2 s3 c
q 3, q 4, q 6

Final states
from Cooper & Torczon 6
DFA Minimization

Then, apply the minimization algorithm b

S p lit on s2
b
Cu rren t Pa rt ition a b c
a
s0 s1 b c
P0 { s 1 , s 2 , s 3 } {s 0 } n o ne n o ne n o ne
c
s3
final states c
To produce the minimal DFA

b|c In lecture 6, I said that a human would design a


simpler automaton than Thompson’s
a construction did.
s0 s1
The algorithms produce that same DFA!

from Cooper & Torczon 7


Limits of Regular Languages

Advantages of Regular Expressions


• Simple & powerful notation for specifying patterns
• Automatic construction of fast recognizers
• Many kinds of syntax can be specified with REs

Example — an expression grammar


Term  [a-zA-Z] ([a-zA-z] | [0-9])*
Op  +|-||/
Expr  ( Term Op )* Term
Of course, this would generate a DFA …

If REs are so useful …


Why not use them for everything?

from Cooper & Torczon 8


Limits of Regular Languages

Not all languages are regular


RL’s  CFL’s  CSL’s

You cannot construct DFA’s to recognize these languages


• L = { p k qk } (parenthesis languages)
• L = { wcw r | w  *}
Neither of these is a regular language (nor an RE)

But, this is a little subtle. You can construct DFA’s for


• Alternating 0’s and 1’s
(  | 1)( 0 | 1) ( 0 | )
• Sets of pairs of 0’s and 1’s
( 01 | 10 ) ( 01 | 10 )*
RE’s can count bounded sets and bounded differences

from Cooper & Torczon 9


What can be so hard?

Poor language design can complicate scanning


• Reserved words are important
if then then then = else; else else = then (PL/I)

• Significant blanks (Fortran & Algol68)


do 10 i = 1,25
do 10 i = 1.25

• String constants with special characters (C, others)


newline, tab, quote, comment delimiters, …

• Finite closures
 Limited identifier length
 Adds states to count length

from Cooper & Torczon 10


What can be so hard? (Fortran 66/77)
INTEGERFUNC TIONA
P ARAMETER(A=6 ,B=2 )
IMPLICIT CHARA CTER*(A-B)(A-B)
INTEGER FORMAT(1 0), IF(1 0), DO9 E1
10 0 FOR MAT (4H) =(3)
20 0 FOR MAT (4 )=(3 )
DO9 E1=1
How does a compiler do this?
DO9 E1=1,2 • First pass finds & inserts blanks
9 IF(X)=1 • Can add extra words or tags to
IF(X)H=1 create a scanable language
IF(X)3 00 ,2 00 • Second pass is normal scanner
30 0 CONTINUE
END
C THIS IS A “COMMENT CARD”
$ FILE(1)
END Example due to Dr. F.K. Zadeck

from Cooper & Torczon 11

You might also like