Lex 3

Automating Scanner Construction
RENFA (Thompson’s construction) 

• Build an NFA for each term
• Combine them with -moves
The Cycle of Constructions
NFA DFA (subset construction) 
• Build the simulation
minimal
RE NFA DFA
DFA Minimal DFA (today) DFA
• Hopcroft’s algorithm
DFA RE
• All pairs, all paths problem
• Union together paths from s0 to a final state
from Cooper & Torczon 1
DFA Minimization
The Big Picture

• Discover sets of equivalent states
• Represent each such set with just one state
Two states are equivalent if and only if:

• The set of paths leading to them are equivalent
    , transitions on  lead to equivalent states (DFA)
• transitions to distinct sets  states must be in distinct sets
A partition P of S
• Each s  S is in exactly one set pi  P
• The algorithm iteratively partitions the DFA’s states

DFA Minimization
Details of the algorithm

• Group states into maximal size sets, optimistically
• Iteratively subdivide those sets, as needed
• States that remain grouped together are equivalent
Initial partition, P0 , has two sets {F} & {Q-F} (D =(Q,,,q0,F))
Splitting a set
• Assume qa, qb, & qc  s, and
 (qa,a) = qx, (qb,a) = qy, & (qa,a) = qz
• If qx, qy, & qz are not in the same set, then s must be split
• One state in the final DFA cannot have two transitions on a

DFA Minimization
The algorithm Why does this work?

• Partition P  2Q
P  { F, {Q-F}} • Start off with 2 subsets of Q
while ( P is still changing) {F} and {Q-F}
T{}
• While loop takes PiPi+1 by
for each set s  P
splitting 1 or more sets
for each   
partition s by  • Pi+1 is at least one step closer to the
into s1, s2, …, sk partition with |Q | sets
T  T  s1, s2, …, sk • Maximum of |Q | splits
if T  P then Note that
PT
• Partitions are never combined
This is a fixed-point algorithm! • Initial partition ensures that final
states are intact

DFA Minimization
Enough theory, does this stuff work?

 Recall our example: ( a | b)* abb
Curren t Pa rt ition S p lit on a S p lit on b

P0 {s 4 } {s 0 , s 1 , s 2 , s 3 } no ne {s 0 , s 1 , s 2 } {s 3 }
P1 {s 4 }{s 3 }{s 0 , s 1 , s 2 } no ne {s 0 , s 2 }{s 1 }
P2 {s 4 }{s 3 }{s 1 }{s 0 , s 2 } no ne Non e
final state
a a b a
a
s0 a s1 b s3 b s4 a b b
s 0 , s2 s1 s3 s4
b a a b a b
s2

DFA Minimization
What about a ( b | c )* ? 
b q5
 q4 
q0 a q1  q2  q3 q8  q9
 q6 c q7 

First, the subset construction:

-c los ure (m o ve( s , *)) b
NFA s ta te s a b c
s2
s0 q0 q 1, q 2, q 3, no ne no ne b
q 4, q 6, q 9 a
s1 q 1, q 2, q 3, no ne q 5, q 8, q 9, q 7, q 8, q 9, s0 s1 b c
q 4, q 6, q 9 q 3, q 4, q 6 q 3, q 4, q 6 c
s2 q 5, q 8, q 9, no ne s2 s3 s3
q 3, q 4, q 6
s3 q 7, q 8, q 9, no ne s2 s3 c
q 3, q 4, q 6
Final states
DFA Minimization
Then, apply the minimization algorithm b
S p lit on s2
b
Cu rren t Pa rt ition a b c
a
s0 s1 b c
P0 { s 1 , s 2 , s 3 } {s 0 } n o ne n o ne n o ne
c
s3
final states c
To produce the minimal DFA
b|c In lecture 6, I said that a human would design a

simpler automaton than Thompson’s
a construction did.
s0 s1
The algorithms produce that same DFA!

Limits of Regular Languages
Advantages of Regular Expressions

• Simple & powerful notation for specifying patterns
• Automatic construction of fast recognizers
• Many kinds of syntax can be specified with REs
Example — an expression grammar

Term  [a-zA-Z] ([a-zA-z] | [0-9])*
Op  +|-||/
Expr  ( Term Op )* Term
Of course, this would generate a DFA …
If REs are so useful …

Why not use them for everything?

Limits of Regular Languages
Not all languages are regular

RL’s  CFL’s  CSL’s
You cannot construct DFA’s to recognize these languages

• L = { p k qk } (parenthesis languages)
• L = { wcw r | w  *}
Neither of these is a regular language (nor an RE)
But, this is a little subtle. You can construct DFA’s for

• Alternating 0’s and 1’s
(  | 1)( 0 | 1) ( 0 | )
• Sets of pairs of 0’s and 1’s
( 01 | 10 ) ( 01 | 10 )*
RE’s can count bounded sets and bounded differences

What can be so hard?
Poor language design can complicate scanning

• Reserved words are important
if then then then = else; else else = then (PL/I)
• Significant blanks (Fortran & Algol68)

do 10 i = 1,25
do 10 i = 1.25
• String constants with special characters (C, others)

newline, tab, quote, comment delimiters, …
• Finite closures
 Limited identifier length
 Adds states to count length

What can be so hard? (Fortran 66/77)
INTEGERFUNC TIONA
P ARAMETER(A=6 ,B=2 )
IMPLICIT CHARA CTER*(A-B)(A-B)
INTEGER FORMAT(1 0), IF(1 0), DO9 E1
10 0 FOR MAT (4H) =(3)
20 0 FOR MAT (4 )=(3 )
DO9 E1=1
How does a compiler do this?
DO9 E1=1,2 • First pass finds & inserts blanks
9 IF(X)=1 • Can add extra words or tags to
IF(X)H=1 create a scanable language
IF(X)3 00 ,2 00 • Second pass is normal scanner
30 0 CONTINUE
END
C THIS IS A “COMMENT CARD”
$ FILE(1)
END Example due to Dr. F.K. Zadeck

Lex 3

Uploaded by

Copyright:

Available Formats

You might also like

Lex 3

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lex 3

Uploaded by

Copyright:

Available Formats

Automating Scanner Construction

RENFA (Thompson’s construction) 

The Big Picture

Two states are equivalent if and only if:

from Cooper & Torczon 2

Details of the algorithm

Initial partition, P0 , has two sets {F} & {Q-F} (D =(Q,,,q0,F))

from Cooper & Torczon 3

The algorithm Why does this work?

from Cooper & Torczon 4

Enough theory, does this stuff work?

Curren t Pa rt ition S p lit on a S p lit on b

P1 {s 4 }{s 3 }{s 0 , s 1 , s 2 } no ne {s 0 , s 2 }{s 1 }

P2 {s 4 }{s 3 }{s 1 }{s 0 , s 2 } no ne Non e

from Cooper & Torczon 5

First, the subset construction:

Then, apply the minimization algorithm b

b|c In lecture 6, I said that a human would design a

from Cooper & Torczon 7

Advantages of Regular Expressions

Example — an expression grammar

If REs are so useful …

from Cooper & Torczon 8

Not all languages are regular

You cannot construct DFA’s to recognize these languages

But, this is a little subtle. You can construct DFA’s for

from Cooper & Torczon 9

Poor language design can complicate scanning

• Significant blanks (Fortran & Algol68)

• String constants with special characters (C, others)

from Cooper & Torczon 10

from Cooper & Torczon 11

You might also like