
CPSC 388 Compiler Design and Construction
Scanner: Regular Expressions to DFA

Announcements

ACM Programming contest (Tues 8pm)
PROG 1 Feedback
Linux Install Fest: When? Saturday? Fliers, CDROMs, bring laptops (do at own risk)
LUG
Understanding Editors (Eclipse, Vi, Emacs)

Scanners

[Diagram: Source Code -> Lexical Analyzer (Scanner) -> Token Stream. The scanner is derived in stages: Regular Expression -> Nondeterministic Finite State Automata -> Deterministic Finite State Automata.]

Regular Expressions

Easy way to express a language that is accepted by an FSA
Rules:
ε is a regular expression
Any symbol in Σ is a regular expression
If r and s are any regular expressions then so is:
r|s denotes union, i.e. r or s
rs denotes r followed by s (concatenation)
(r)* denotes concatenation of r with itself zero or more times (Kleene closure)
() used for controlling order of operations
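As a quick illustration of these rules, the sketch below checks membership in the language of (a(a|b))* using Python's standard `re` module, whose pattern syntax uses the same operators. The helper name `in_language` is hypothetical. Note that Python's `re` engine is a backtracking matcher rather than a finite automaton, so this only illustrates the notation, not the scanner construction described in these slides.

```python
import re

# '|' is union, adjacency is concatenation, '*' is Kleene closure,
# and parentheses control the order of operations.
PATTERN = re.compile(r'(a(a|b))*')

def in_language(text):
    """Return True if the whole string is in the language of (a(a|b))*."""
    return PATTERN.fullmatch(text) is not None

print(in_language(''))      # zero repetitions of a(a|b)
print(in_language('aaab'))  # 'aa' followed by 'ab'
print(in_language('a'))     # odd length, cannot be split into pairs
```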

RE to NFA: Step 1

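Create a tree from the Regular Expression
Example: (a(a|b))*
[Tree: * at the root; its child is a cat node; the cat node's children are the leaf a and a | node with leaves a and b]
Leaf Nodes are either members of Σ or ε
Internal Nodes are operators: cat, |, *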
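Step 1 can be sketched as a small recursive-descent parser. This is a minimal sketch under stated assumptions, not the course's implementation: the function name `parse`, the nested-tuple tree format, and the use of the character ε for the empty string are all hypothetical choices, and symbols are assumed to be single characters.

```python
def parse(regex):
    """Parse a regular expression into a tree of nested tuples.

    Leaves are ('sym', c) or ('eps',); internal nodes are
    ('cat', l, r), ('or', l, r) and ('star', child).
    """
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def union():                      # lowest precedence: r|s
        nonlocal pos
        node = concat()
        while peek() == '|':
            pos += 1
            node = ('or', node, concat())
        return node

    def concat():                     # middle precedence: rs
        node = star()
        while peek() is not None and peek() not in '|)':
            node = ('cat', node, star())
        return node

    def star():                       # highest precedence: (r)*
        nonlocal pos
        node = atom()
        while peek() == '*':
            pos += 1
            node = ('star', node)
        return node

    def atom():                       # a symbol, ε, or a parenthesized RE
        nonlocal pos
        c = peek()
        if c is None or c in '|)*':
            raise ValueError(f'unexpected input at position {pos}')
        pos += 1
        if c == '(':
            node = union()
            if peek() != ')':
                raise ValueError("expected ')'")
            pos += 1
            return node
        return ('eps',) if c == 'ε' else ('sym', c)

    tree = union()
    if pos != len(regex):
        raise ValueError(f'unexpected input at position {pos}')
    return tree

print(parse('(a(a|b))*'))
```

For the example above, the result has a star node at the root, a cat node below it, and the leaf a plus an or node as the cat node's children.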

RE to NFA: Step 2

Do a Post-Order Traversal of Tree (children processed before parent)
At each node follow rules for conversion from an RE to an NFA

Leaf Nodes

Either ε or a member of Σ
[Figure: tree fragment (cat node with leaf a) and the NFA built for the leaf]

Internal Nodes

Need to keep track of left (l) and right (r) NFA and merge them into a single NFA

Or
Concatenation
Kleene Closure

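Or Node
[Figure: NFA merging the left (l) and right (r) NFAs]

Concatenation Node
[Figure: NFA joining the left (l) NFA to the right (r) NFA]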

Kleene Closure
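The post-order conversion can be sketched as follows. This is one common Thompson-style variant and is an assumption: the exact wiring of ε-transitions may differ in detail from the construction used in the course (for example, concatenation here links the two NFAs with an ε-transition instead of merging states). The names `build_nfa` and `accepts` and the tuple tree format are hypothetical.

```python
from collections import defaultdict
from itertools import count

def build_nfa(tree):
    """Return (start, accept, trans) for a regex tree, built in post-order."""
    trans = defaultdict(set)   # (state, symbol) -> next states; None means ε
    fresh = count()            # generator of new state numbers

    def visit(node):
        kind = node[0]
        if kind in ('sym', 'eps'):            # leaf: one labeled transition
            s, f = next(fresh), next(fresh)
            trans[(s, node[1] if kind == 'sym' else None)].add(f)
            return s, f
        if kind == 'cat':                     # left NFA, then right NFA
            ls, lf = visit(node[1])
            rs, rf = visit(node[2])
            trans[(lf, None)].add(rs)
            return ls, rf
        if kind == 'or':                      # branch into left or right
            ls, lf = visit(node[1])
            rs, rf = visit(node[2])
            s, f = next(fresh), next(fresh)
            trans[(s, None)] |= {ls, rs}
            trans[(lf, None)].add(f)
            trans[(rf, None)].add(f)
            return s, f
        if kind == 'star':                    # skip the child or loop back
            cs, cf = visit(node[1])
            s, f = next(fresh), next(fresh)
            trans[(s, None)] |= {cs, f}
            trans[(cf, None)] |= {cs, f}
            return s, f
        raise ValueError(f'unknown node kind: {kind}')

    start, accept = visit(tree)
    return start, accept, dict(trans)

def accepts(tree, text):
    """Simulate the NFA on text by tracking the set of reachable states."""
    start, accept, trans = build_nfa(tree)

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for t in trans.get((stack.pop(), None), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= trans.get((s, ch), set())
        current = closure(moved)
    return accept in current

# Tree for (a|b)*abb, written out by hand
A, B = ('sym', 'a'), ('sym', 'b')
TREE = ('cat', ('cat', ('cat', ('star', ('or', A, B)), A), B), B)
print(accepts(TREE, 'aababb'))
```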

Try It

Convert the regular expression to an NFA: (a|b)*abb
First convert RE to a tree
Then convert tree to NFA

NFA to DFA

Recall that a DFA can be represented as a transition table
[Table: rows are the states S, A, B; columns are input characters (+, Digit); surviving entries: B, B, B]
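A table-driven DFA might be stored as a two-dimensional array indexed by state and character class. The sketch below is an assumption throughout: it uses the state names S, A, B, but the table entries describe a hypothetical DFA for optionally signed integers and are not taken from the slide.

```python
# Hypothetical transition table for optionally signed integers.
# Rows are states, columns are character classes; None means no transition.
STATES = {'S': 0, 'A': 1, 'B': 2}
COLUMNS = {'sign': 0, 'digit': 1}
TABLE = [
    # sign  digit
    ['A',  'B'],   # S: start state
    [None, 'B'],   # A: a sign has been read, a digit must follow
    [None, 'B'],   # B: at least one digit has been read (accepting)
]
ACCEPTING = {'B'}

def classify(ch):
    """Map an input character to its table column, or None if unknown."""
    if ch in ('+', '-'):
        return 'sign'
    if '0' <= ch <= '9':
        return 'digit'
    return None

def accepts(text):
    """Run the DFA: one table lookup per input character."""
    state = 'S'
    for ch in text:
        column = classify(ch)
        if column is None:
            return False
        state = TABLE[STATES[state]][COLUMNS[column]]
        if state is None:
            return False
    return state in ACCEPTING

print(accepts('+42'), accepts('7'), accepts('+'))
```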

Operations on NFA

ε-closure(t): Set of NFA states reachable from NFA state t on ε-transitions alone.
ε-closure(T): Set of NFA states reachable from some NFA state t in set T on ε-transitions alone.
move(T,a): Set of NFA states to which there is a transition on input symbol a from some state t in T
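These operations can be sketched in a few lines. The NFA below is a small hypothetical example for (a|ε)b, stored as a dictionary from (state, symbol) to a set of next states, with `None` standing in for ε. The closure of a single state t is obtained by passing the one-element set {t}.

```python
EPS = None  # label used here for ε-transitions

# Hypothetical NFA for (a|ε)b: (state, symbol) -> set of next states
NFA = {
    (0, EPS): {1, 2},
    (1, 'a'): {2},
    (2, 'b'): {3},
}

def eps_closure(states, nfa=NFA):
    """ε-closure(T): states reachable from T on ε-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

def move(states, symbol, nfa=NFA):
    """move(T, a): states reached on symbol a from some state in T."""
    result = set()
    for t in states:
        result |= nfa.get((t, symbol), set())
    return result

print(eps_closure({0}))               # states 0, 1 and 2
print(move(eps_closure({0}), 'b'))    # state 3
```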

NFA to DFA Algorithm


Initially ε-closure(s) is the only state in DFA and it is unmarked
while (there is an unmarked state T in DFA) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T,a));
        if (U not in DFA)
            add U unmarked to DFA;
        transition[T,a] = U;
    }
}
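The algorithm can be sketched in Python as below, applied to an NFA for (a|b)*abb. The state numbering of the NFA is an assumption, chosen so that the resulting DFA states match the state sets shown for the corresponding DFA (S,1,2,4,7 and so on). The function names are hypothetical.

```python
EPS = None  # label used here for ε-transitions

# Assumed NFA for (a|b)*abb: (state, symbol) -> set of next states
NFA = {
    ('S', EPS): {'1', '7'}, ('1', EPS): {'2', '4'},
    ('2', 'a'): {'3'},      ('4', 'b'): {'5'},
    ('3', EPS): {'6'},      ('5', EPS): {'6'},
    ('6', EPS): {'1', '7'},
    ('7', 'a'): {'8'}, ('8', 'b'): {'9'}, ('9', 'b'): {'F'},
}
NFA_START, ALPHABET = 'S', 'ab'

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for u in NFA.get((stack.pop(), EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

def move(states, symbol):
    result = set()
    for t in states:
        result |= NFA.get((t, symbol), set())
    return result

def subset_construction():
    """Return (start, dfa_states, transition) for the DFA."""
    start = frozenset(eps_closure({NFA_START}))
    dfa_states = [start]      # every DFA state found so far
    unmarked = [start]        # states whose transitions are not yet computed
    transition = {}
    while unmarked:
        T = unmarked.pop()    # taking T off the list marks it
        for a in ALPHABET:
            # An empty U would act as a dead state; none arises for this NFA.
            U = frozenset(eps_closure(move(T, a)))
            if U not in dfa_states:
                dfa_states.append(U)
                unmarked.append(U)
            transition[(T, a)] = U
    return start, dfa_states, transition

start, dfa_states, transition = subset_construction()
for state in dfa_states:
    print(sorted(state))
```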

Try it
Take the NFA from the previous example and construct the DFA

Regular Expression: (a|b)*abb

Corresponding DFA
[Figure: DFA whose states are labeled with sets of NFA states; edges are labeled a or b]
NewS = {S,1,2,4,7}
B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
D = {1,2,4,5,6,7,9}
NewF = {1,2,4,5,6,7,F}

Start State and Accepting States

The Start State for the DFA is ε-closure(s)

The accepting states in the DFA are those states that contain an accepting state from the NFA
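Both rules reduce to simple set operations. The sketch below applies them to the DFA state sets from the (a|b)*abb example. It assumes that F is the NFA's only accepting state and that ε-closure(s) is {S,1,2,4,7}; the function names are hypothetical.

```python
# DFA states from the (a|b)*abb example, each a set of NFA states
DFA_STATES = {
    'NewS': {'S', '1', '2', '4', '7'},
    'B':    {'1', '2', '3', '4', '6', '7', '8'},
    'C':    {'1', '2', '4', '5', '6', '7'},
    'D':    {'1', '2', '4', '5', '6', '7', '9'},
    'NewF': {'1', '2', '4', '5', '6', '7', 'F'},
}
NFA_ACCEPTING = {'F'}                        # assumed accepting state of the NFA
START_CLOSURE = {'S', '1', '2', '4', '7'}    # assumed value of ε-closure(s)

def start_state(dfa_states, start_closure):
    """The DFA start state is the one equal to ε-closure(s)."""
    return next(name for name, members in dfa_states.items()
                if members == start_closure)

def accepting_states(dfa_states, nfa_accepting):
    """A DFA state accepts if it contains any accepting NFA state."""
    return {name for name, members in dfa_states.items()
            if members & nfa_accepting}

print(start_state(DFA_STATES, START_CLOSURE))
print(accepting_states(DFA_STATES, NFA_ACCEPTING))
```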

Efficiency of Algorithms

RE -> NFA
O(|r|) where |r| is the size of the RE

NFA -> DFA
O(|r|^2 · 2^|r|) worst case
(not seen in typical programming languages)

Recognition of a string by DFA
O(|x|) where |x| is length of string
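The linear recognition cost is visible in a direct simulation: each input character costs exactly one table lookup. The sketch below uses the state names from the (a|b)*abb DFA. The edges are an assumption, reconstructed by applying the subset construction to that expression, and `run_dfa` is a hypothetical helper that also reports how many lookups it made.

```python
# Assumed transitions for the (a|b)*abb DFA: (state, symbol) -> next state
DFA = {
    ('NewS', 'a'): 'B', ('NewS', 'b'): 'C',
    ('B', 'a'): 'B',    ('B', 'b'): 'D',
    ('C', 'a'): 'B',    ('C', 'b'): 'C',
    ('D', 'a'): 'B',    ('D', 'b'): 'NewF',
    ('NewF', 'a'): 'B', ('NewF', 'b'): 'C',
}
START, ACCEPTING = 'NewS', {'NewF'}

def run_dfa(text):
    """Return (accepted, lookups); lookups never exceeds len(text)."""
    state, lookups = START, 0
    for ch in text:
        if (state, ch) not in DFA:   # symbol outside the alphabet: reject
            return False, lookups
        state = DFA[(state, ch)]
        lookups += 1
    return state in ACCEPTING, lookups

print(run_dfa('ababb'))   # (True, 5)
print(run_dfa('abba'))    # (False, 4)
```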

More Practice

Convert RE to NFA: ((ε|a)b*)*

Convert NFA to DFA
[Figure: NFA to convert; surviving labels: state 2, edges a and b]

Solution to Practice
RE to NFA

Solution to Practice

NFA to DFA

[Figure: DFA with states NewS = {S,1,3}, A = {2}, B = {4}; edges labeled a, a, b]

Summary of Scanners

Lexemes
Tokens
Regular Expressions, Extended RE
Regular Definitions
Finite Automata (DFA & NFA)
Conversion from RE->NFA->DFA
JLex
