UNIT1 - Lexical Analysis1

Unit I

Lexical Analysis

06/22/2009 Department of Computer Science 1


ER&DCIInstitute of Technology
Some Important Basic Definitions

lexical: of or relating to the morphemes of a language.

morpheme: a meaningful linguistic unit that cannot be divided into smaller meaningful parts.

lexical analysis: the task concerned with breaking an input into its smallest meaningful units, called tokens.

The Role of a Lexical Analyzer

[Figure: source program --(read char)--> lexical analyzer --(pass token)--> parser; parser --(get next)--> lexical analyzer; a symbol table is drawn beneath the two components]

Lexical Analyzer

• Functions
  • Grouping input characters into tokens
  • Stripping out comments and white space
  • Correlating error messages with the source program
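
The three functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes, the `{ ... }` comment syntax, and the name `tokenize` are made up for the example and are not taken from the slides.

```python
import re

# Hypothetical token patterns for a tiny language (assumed for illustration).
TOKEN_SPEC = [
    ("COMMENT", r"\{[^}]*\}"),           # { ... } comment, stripped
    ("WS",      r"[ \t]+"),              # blanks and tabs, stripped
    ("NEWLINE", r"\n"),                  # counted so errors can cite a line
    ("NUM",     r"\d+(?:\.\d+)?"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",      r"[-+*/=<>]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Group input characters into (token, lexeme, line) triples."""
    tokens, line, pos = [], 1, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # Correlate the error message with the source program.
            raise SyntaxError(f"line {line}: unexpected character {source[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "COMMENT":
            line += lexeme.count("\n")   # comments may span lines
        elif kind != "WS":
            tokens.append((kind, lexeme, line))
        pos = m.end()
    return tokens
```

Calling `tokenize("x = 12 {note}\ny = x + 3.5")` yields eight tokens, with the comment and blanks dropped and the last five tagged as line 2.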

Why Separate?

• Reasons to separate lexical analysis from parsing:
  • Simpler design
  • Improved efficiency
  • Portability
• Tools exist to help implement lexical analyzers and parsers independently

Typical Tokens in a PL

• Symbols
  +, -, *, /, =, <, >, ->, …
• Keywords
  if, while, struct, float, int, …
• Integer and Real (floating point) literals
  123, 123.45
• Char (string) literals
• Identifiers
• Comments
• White space

Introducing Basic Terminology

• What are the major terms for lexical analysis?
  • TOKEN
    • A classification for a common set of strings
    • Examples include <Identifier>, <number>, etc.
  • PATTERN
    • The rules which characterize the set of strings for a token
  • LEXEME
    • Actual sequence of characters that matches a pattern and is classified by a token
    • Identifiers: x, count, name, etc.

Introducing Basic Terminology

Token      Sample Lexemes          Informal Description of Pattern
const      const                   const
if         if                      if
relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
id         pi, count, D2           letter followed by letters and digits
num        3.1416, 0, 6.02E23      any numeric constant
literal    "core dumped"           any characters between " and " except "

The token classifies the pattern.
Actual values (the lexemes) are critical. This information is:
1. Stored in the symbol table
2. Returned to the parser

Token Attribute

• E = C1 ** 10

Token   Attribute
ID      Index to symbol table entry E
ID      Index to symbol table entry C1
**
NUM     10
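
As a rough sketch of how these (token, attribute) pairs could be produced, the following Python function scans the statement from the slide. The function name `scan`, the list-based symbol table, and the choice to emit `=` as its own attribute-less token are assumptions made for the illustration.

```python
import re


def scan(statement):
    """Return (tokens, symbol_table) for a statement such as 'E = C1 ** 10'."""
    symbol_table = []                       # one entry per distinct identifier
    tokens = []
    # '**' is listed before the single-character operators so it wins.
    pattern = r"[A-Za-z][A-Za-z0-9]*|\d+|\*\*|[=*+\-/]"
    for lexeme in re.findall(pattern, statement):
        if lexeme[0].isalpha():
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The attribute of ID is the index of its symbol table entry.
            tokens.append(("ID", symbol_table.index(lexeme)))
        elif lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))   # attribute is the value
        else:
            tokens.append((lexeme, None))         # operators carry no attribute
    return tokens, symbol_table
```

For `scan("E = C1 ** 10")` the identifiers E and C1 land in symbol table slots 0 and 1, and the NUM token carries the value 10.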

Case Study

• When blanks are not significant (as in Fortran and Algol68)

  DO 5 I = 1.25            DO 5 I = 1, 25
  DO5I is an ID            This is a DO loop,
                           so 7 tokens will get generated

  DO 10 I = 1, 100
     STATEMENT
     STATEMENT
     STATEMENT
  10 CONTINUE

Case Study (cont.)

• When key words are not reserved words (such as in PL/1)
  example 1:
    IF THEN THEN THEN = ELSE ELSE ELSE = THEN;
  Which THEN is an identifier?
  Which THEN is the key word?

Handling Lexical Errors

• Error handling is very localized with respect to the input source
  • For example: fi( a == f(x) ) … generates no lexical errors in C, because fi is a valid identifier
• In what situations do errors occur?
  • A prefix of the remaining input doesn't match any defined token
• Possible error recovery actions:
  • Deleting or inserting input characters
  • Replacing or transposing characters
  • Or, skip over to the next separator to "ignore" the problem

Implementation of Lexical Analyzer

• Use a lexical analyzer generator to produce a lexical analyzer from a regular-expression-based specification
• Write the lexical analyzer in a conventional systems-programming language
• Write the lexical analyzer in assembly language

I/O - Key For Successful Lexical Analysis

• Character-at-a-time I/O
• Block / Buffered I/O

• Block/Buffered I/O
  • Utilize a block of memory
  • Stage data from source to buffer a block at a time
  • Maintain two blocks
    • Asynchronous I/O for 1 block
    • While lexical analysis runs on the 2nd block

[Figure: Block 1 | Block 2. When done with Block 1, issue I/O for it; the ptr still processes the token in the 2nd block.]

Code to advance forward ptr

if forward at end of first half then begin          /* checking if forward ptr is at the end of 1st half */
    reload second half ;
    forward := forward + 1
end
else if forward at end of second half then begin    /* checking if forward ptr is at the end of 2nd half */
    reload first half ;
    move forward to beginning of first half
end
else forward := forward + 1;

[Figure: buffer contents E = M * C * * 2 eof, with the lexeme_beginning and forward pointers marked]

Algorithm: Buffered I/O with Sentinels

[Figure: buffer contents E = M * eof C * * 2 eof eof; lexeme_beginning marks the start of the current token and forward scans ahead to find a pattern match]

forward := forward + 1 ;
if forward is at eof then begin
    if forward at end of first half then begin
        reload second half ;                         /* Block I/O */
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half ;                          /* Block I/O */
        move forward to beginning of first half
    end
    else /* eof within buffer signifying end of input */
        terminate lexical analysis                   /* 2nd eof: no more input! */
end
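
The sentinel scheme above can be tried out with the following Python sketch. It is a simplified model, not the slides' implementation: the class name `SentinelBuffer`, the use of `"\0"` as the eof sentinel (assumed never to occur in the source text), and the tiny default half size are all assumptions chosen to make the reloads easy to observe.

```python
import io

EOF = "\0"   # sentinel character, assumed never to occur in the source text


class SentinelBuffer:
    """Two buffer halves of n characters, each followed by an eof sentinel."""

    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        # Layout: [0..n-1] first half, [n] eof, [n+1..2n] second half, [2n+1] eof
        self.buf = [EOF] * (2 * n + 2)
        self.forward = -1
        self._reload(0)

    def _reload(self, start):
        chunk = self.stream.read(self.n)                  # block I/O
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; return the next character, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:                 # the single common test
            if self.forward == self.n:                    # end of first half
                self._reload(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:          # end of second half
                self._reload(0)
                self.forward = 0
            if self.buf[self.forward] == EOF:             # eof within buffer
                self.forward -= 1                         # stay put on repeated calls
                return None
        return self.buf[self.forward]


def read_all(text, n=4):
    """Drive the buffer to the end of input and return everything it produced."""
    buf = SentinelBuffer(io.StringIO(text), n)
    out = []
    while (c := buf.next_char()) is not None:
        out.append(c)
    return "".join(out)
```

The point of the sentinel is visible in `next_char`: the common case costs one comparison per character, and the end-of-half checks run only when that comparison hits eof.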

Specification of Patterns for Tokens: Terminology

• An alphabet Σ is a finite set of symbols (characters / letters)
• A string s is a finite sequence of symbols from Σ
  • |s| denotes the length of string s
  • ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ

Examples

• Alphabet
  • {0,1} is the binary alphabet
• String
  • s = 010
  • |s| = 3
• Language
  • {010, 011, 00, …} is a language over the alphabet {0,1}

Language Concepts

A language, L, is simply any set of strings over a fixed alphabet.

Alphabet                             Languages
{0,1}                                {0,10,100,1000,10000,…}
                                     {0,1,00,11,000,111,…}
{a,b,c}                              {abc,aabbcc,aaabbbccc,…}
{A,…,Z}                              {TEE,FORE,BALL,…}
                                     {FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…,9,+,-,…,<,>,…}      {All legal PASCAL progs}
                                     {All grammatically correct English sentences}

Special Languages:
  ∅   - EMPTY LANGUAGE
  {ε} - contains the ε string only

Terms for Parts of a String

EXAMPLES AND OTHER CONCEPTS:

Suppose: S is the string banana

Prefix     : ban, banana
Suffix     : ana, banana
Substring  : nan, ban, ana, banana
Subsequence: bnan, nn

A proper prefix, suffix, or substring cannot be all of S.
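
The banana examples can be checked mechanically. The helper names below are hypothetical, and `proper` follows the slide's wording (a proper part simply cannot be all of S).

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """Every string obtained by deleting a prefix and a suffix of s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_subsequence(t, s):
    """True if t can be formed by deleting zero or more symbols of s."""
    remaining = iter(s)
    # 'ch in remaining' consumes the iterator, so order is respected.
    return all(ch in remaining for ch in t)


def proper(parts, s):
    """Drop s itself, following the slide's definition of 'proper'."""
    return parts - {s}
```

Note that a subsequence such as bnan need not be contiguous, which is exactly what separates it from a substring.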

Formal Language Operations

OPERATION                                DEFINITION
union of L and M, written L ∪ M          L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM     LM = {st | s is in L and t is in M}
Kleene closure of L, written L*          $L^{*} = \bigcup_{i=0}^{\infty} L^{i}$
positive closure of L, written L+        $L^{+} = \bigcup_{i=1}^{\infty} L^{i}$

L* denotes "zero or more concatenations of" L
L+ denotes "one or more concatenations of" L

Formal Language Operations Examples

L = {A, B, C, D}    D = {1, 2, 3}
L ∪ D = {A, B, C, D, 1, 2, 3}
LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}
L^2 = {AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD}
L^4 = L^2 L^2 = {set of all 4 letter strings}
L* = {all possible strings over L, plus ε}
L+ = L* - {ε}
L (L ∪ D)* = ??
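
These operations map directly onto Python sets, which gives a quick way to explore the examples, including the open question. The function names are hypothetical, and because L* is infinite the sketch only builds a finite approximation (all powers up to a chosen bound).

```python
from itertools import product


def concat(L, M):
    """LM = {st | s in L and t in M}."""
    return {s + t for s, t in product(L, M)}


def power(L, i):
    """L^i, with L^0 = {ε} represented by the empty string."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def closure_up_to(L, n):
    """Finite approximation of L*: the union of L^0 .. L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out


L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

# L (L ∪ D)*, truncated to strings of length at most 3.
ids = concat(L, closure_up_to(L | D, 2))
```

Inspecting `ids` suggests an answer to the last line of the slide: every string starts with a letter and continues with letters or digits, which is the shape of an identifier.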

Language & Regular Expressions

• A Regular Expression is a set of rules / techniques for constructing sequences of strings from an alphabet.
  • Suitable for specifying the structure of tokens in programming languages
• Let Σ be an alphabet and r a regular expression. Then L(r) is the language that is characterized by the rules of r

Rules for Specifying Regular Expressions:

Regular expression over alphabet Σ
• ε is a regular expression denoting {ε}
• If a is in Σ, a is a regular expression that denotes {a}
• Let r and s be regular expressions with languages L(r) and L(s). Then
  (a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
  (b) (r)(s) is a regular expression denoting L(r) L(s)
  (c) (r)* is a regular expression denoting (L(r))*
  (d) (r) is a regular expression denoting L(r)
All are left-associative. Parentheses are dropped as allowed by precedence rules (* binds tightest, then concatenation, then |).

Example

• The identifier in Pascal can be defined as
  letter (letter | digit)*
• More examples
  • a | b denotes the set {a, b}
  • (a|b)(a|b) denotes the set {aa, ab, ba, bb}
  • a* denotes {ε, a, aa, aaa, …}
  • (a|b)* denotes all strings of a's and b's, also equal to (a*b*)*

EXAMPLES of Regular Expressions

L = {A, B, C, D}    D = {1, 2, 3}

A | B | C | D = L
(A | B | C | D) (A | B | C | D) = L^2
(A | B | C | D)* = L*
(A | B | C | D) ((A | B | C | D) | (1 | 2 | 3)) = L (L ∪ D)

Towards Token Definition

Regular Definitions: associate names with regular expressions
For example: PASCAL IDs
  letter → A | B | C | … | Z | a | b | … | z
  digit  → 0 | 1 | 2 | … | 9
  id     → letter ( letter | digit )*
Shorthand Notation:
  "+" : one or more       r* = r+ | ε   and   r+ = r r*
  "?" : zero or one       r? = r | ε
  [range] : set range of characters (replaces "|")
            [A-Z] = A | B | C | … | Z
Example Using Shorthand: PASCAL IDs
  id → [A-Za-z][A-Za-z0-9]*
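
The shorthand form carries over almost unchanged to Python's `re` module. The sketch below is only a convenience check; the names `ID` and `is_id` are assumptions, and `fullmatch` is used so that the whole lexeme, not just a prefix, must fit the pattern.

```python
import re

# id → [A-Za-z][A-Za-z0-9]*
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_id(lexeme):
    """True if the entire lexeme matches the id pattern."""
    return ID.fullmatch(lexeme) is not None


def same_language(p, q, samples):
    """Compare two patterns on sample strings (evidence, not a proof)."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples)
```

`same_language(r"a+", r"aa*", ["", "a", "aa", "b"])` illustrates the identity r+ = r r* on a few samples.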

Algebraic Properties of Regular Expressions

AXIOM                          DESCRIPTION
r | s = s | r                  | is commutative
r | (s | t) = (r | s) | t      | is associative
(r s) t = r (s t)              concatenation is associative
r (s | t) = r s | r t
(s | t) r = s r | t r          concatenation distributes over |
ε r = r
r ε = r                        ε is the identity element for concatenation
r* = (r | ε)*                  relation between * and ε
r** = r*                       * is idempotent

Non-regular set

• An RE can denote a fixed number or an unspecified number of repetitions of a given construct. Its best use is for describing identifiers, constants, etc.
• An RE cannot be used to describe balanced or nested structures, such as nested loops or nested if-then-else.

Token Recognition

How can we use the concepts developed so far to assist in recognizing tokens of a source language?

Assume the following tokens:
  if, then, else, relop, id, num

Grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term | term
  term → id | num

Given the tokens, what are the patterns? Regular definitions:
  if    → if
  then  → then
  else  → else
  relop → < | <= | > | >= | = | <>
  id    → letter ( letter | digit )*
  num   → digit+ (. digit+)? ( E (+ | -)? digit+ )?
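
The num and relop definitions can be transcribed into Python regular expressions to see which lexemes they admit. The names `NUM`, `RELOP`, `is_num` and `match_relop` are assumptions for this sketch; in `RELOP` the two-character operators are listed first so that a prefix match prefers the longest lexeme.

```python
import re

# num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
NUM = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

# relop → < | <= | > | >= | = | <>   (longer alternatives first)
RELOP = re.compile(r"<=|<>|>=|<|>|=")


def is_num(lexeme):
    """True if the entire lexeme matches the num pattern."""
    return NUM.fullmatch(lexeme) is not None


def match_relop(text, pos=0):
    """Return the relop lexeme starting at text[pos], or None."""
    m = RELOP.match(text, pos)
    return m.group() if m else None
```

Under this definition `1.` and `.5` are not valid num lexemes, because each optional part requires at least one digit on both sides of the point.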

What Else Does Lexical Analyzer Do?

Scan away blanks, newlines, tabs

Can we define tokens for these?

  blank   → b
  tab     → ^T
  newline → ^M
  delim   → blank | tab | newline
  ws      → delim+

Overall

Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE

Note: Each token has a unique token identifier to define the category of lexemes

Constructing Transition Diagrams for Tokens

• Transition Diagrams (TD) are used to represent the tokens
• As characters are read, the relevant TDs are used to attempt to match a lexeme to a pattern
• Each TD has:
  • States : Represented by Circles
  • Actions (Transitions) : Represented by Arrows between states
  • Start State : Beginning of a pattern (Arrowhead)
  • Final State(s) : End of pattern (Concentric Circles)

Transition Diagram Symbols

[Figure legend: the symbols for a state, the start state, an accepting state, and a transition (an arrow labeled with the input symbol, here a)]

Example TDs

>= :
  state 0 (start): on > go to 6
  state 6: on = go to 7; on other go to 8
  state 7 (accepting): RTN(GE)
  state 8 (accepting, *): RTN(G)

At state 8 we've accepted ">" and have read another char that must be unread (this is what the * marks).

Example: All RELOPs

  state 0 (start): on < go to 1; on = go to 5; on > go to 6
  state 1: on = go to 2; on > go to 3; on other go to 4
  state 2 (accepting): return(relop, LE)
  state 3 (accepting): return(relop, NE)
  state 4 (accepting, *): return(relop, LT)
  state 5 (accepting): return(relop, EQ)
  state 6: on = go to 7; on other go to 8
  state 7 (accepting): return(relop, GE)
  state 8 (accepting, *): return(relop, GT)
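
One way to turn the relop diagram into code is a loop over a state variable, with the state numbers taken from the diagram. This is a sketch under assumptions: the function name `relop`, the return shape (attribute, new position), and treating end of input as "other" are choices made here, not part of the slides.

```python
def relop(text, pos=0):
    """Return (attribute, new_pos) for a relop starting at text[pos], or None."""

    def peek(i):
        # End of input is treated as "other" (an assumption of this sketch).
        return text[i] if i < len(text) else ""

    state, forward = 0, pos
    while True:
        if state == 0:
            c = peek(forward)
            forward += 1
            if c == "<":
                state = 1
            elif c == "=":
                state = 5
            elif c == ">":
                state = 6
            else:
                return None                    # not a relop: try another diagram
        elif state == 1:
            c = peek(forward)
            forward += 1
            state = 2 if c == "=" else 3 if c == ">" else 4
        elif state == 6:
            c = peek(forward)
            forward += 1
            state = 7 if c == "=" else 8
        elif state == 2:
            return ("LE", forward)
        elif state == 3:
            return ("NE", forward)
        elif state == 4:
            return ("LT", forward - 1)         # * : retract one character
        elif state == 5:
            return ("EQ", forward)
        elif state == 7:
            return ("GE", forward)
        elif state == 8:
            return ("GT", forward - 1)         # * : retract one character
```

The starred states give back the lookahead character, so `relop("<a")` reports LT and leaves the position on `a`.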

Example TDs: id and delim

id :
  state 9 (start): on letter go to 10
  state 10: on letter or digit stay in 10; on other go to 11
  state 11 (accepting, *): return( get_token(), install_id() )

  install_id() either returns a ptr or "0" if reserved

delim :
  state 28 (start): on delim go to 29
  state 29: on delim stay in 29; on other go to 30
  state 30 (accepting, *)

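Example TDs: Unsigned #s

  state 12 (start): on digit go to 13
  state 13: on digit stay in 13; on . go to 14; on E go to 16
  state 14: on digit go to 15
  state 15: on digit stay in 15; on E go to 16
  state 16: on + or - go to 17; on digit go to 18
  state 17: on digit go to 18
  state 18: on digit stay in 18; on other go to 19
  state 19 (accepting, *)

  state 20 (start): on digit go to 21
  state 21: on digit stay in 21; on . go to 22
  state 22: on digit go to 23
  state 23: on digit stay in 23; on other go to 24
  state 24 (accepting, *)

  state 25 (start): on digit go to 26
  state 26: on digit stay in 26; on other go to 27
  state 27 (accepting, *)

Each accepting state performs return(num, install_num())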

QUESTION

What would the transition diagram (TD) for strings of letters that contain the five vowels in their strict lexicographical order look like?

Answer

cons   → B | C | D | F | … | Z
string → cons* A cons* E cons* I cons* O cons* U cons*

[Figure: a chain of six states from start, each with a self-loop on cons; successive states are joined by edges labeled A, E, I, O, U; from the last state, other leads to accept; an error path leaves the chain]

Note: The error path is taken if the character is other than a cons or the vowel in the lex order.
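
The regular definition in the answer can be checked with a direct transcription into a Python pattern. The consonant class below spells out B | C | D | F | … | Z as character ranges; the names `CONS` and `has_vowels_in_order`, and upper-casing the input first, are assumptions of this sketch.

```python
import re

# cons → B | C | D | F | … | Z  (every capital letter except A, E, I, O, U)
CONS = "[B-DF-HJ-NP-TV-Z]"

# string → cons* A cons* E cons* I cons* O cons* U cons*
VOWELS_IN_ORDER = re.compile(
    f"{CONS}*A{CONS}*E{CONS}*I{CONS}*O{CONS}*U{CONS}*"
)


def has_vowels_in_order(word):
    """True if word contains each vowel exactly once, in the order A E I O U."""
    return VOWELS_IN_ORDER.fullmatch(word.upper()) is not None
```

Words such as "facetious" and "abstemious" are accepted, while "education" is rejected because its vowels are out of order.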

What Else Does Lexical Analyzer Do?
All Keywords / Reserved words are matched as ids
• After the match, the symbol table or a special keyword table is
consulted
• Keyword table contains string versions of all keywords and
associated token values
if 15
then 16
begin 17
... ...
• When a match is found, the token is returned, along with its
symbolic value, i.e., “then”, 16
• If a match is not found, then it is assumed that an id has been
discovered
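
A keyword table of this kind is naturally a dictionary lookup performed after the id pattern has matched. In the sketch below the token values 15, 16 and 17 come from the slide; the function name `classify` and the list-based symbol table are assumptions.

```python
# Keyword table: string versions of the keywords and their token values.
KEYWORDS = {"if": 15, "then": 16, "begin": 17}

symbol_table = []   # hypothetical symbol table: one entry per identifier


def classify(lexeme):
    """Classify a lexeme that has already matched the id pattern."""
    if lexeme in KEYWORDS:
        # Match found: return the keyword along with its token value.
        return (lexeme, KEYWORDS[lexeme])
    # No match: assume an id has been discovered and install it.
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return ("id", symbol_table.index(lexeme))
```

Matching keywords as ids first keeps the transition diagrams small, at the cost of one table lookup per identifier-shaped lexeme.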

Implementing Transition Diagrams

[Figure: state 25 --digit--> state 26 (self-loop on digit) --other--> state 27 *]

.............
case 25: c = nextchar();                 /* nextchar() advances forward */
         if (isdigit(c)) state = 26;
         else state = fail();
         break;
case 26: c = nextchar();
         if (isdigit(c)) state = 26;
         else state = 27;
         break;
case 27: retract(1);                     /* retracts forward */
         lexical_value = install_num();  /* looks at the region lexeme_beginning ... forward */
         return ( NUM );
.............

Case numbers correspond to transition diagram states!
Implementing Transition Diagram

• Mapping transition diagrams into C code

[Figure: state 9 --letter--> state 10 (self-loop on letter or digit) --other--> state 11, return(id)]

switch (state) {
case 9:  c = nextchar();
         if (isletter(c)) state = 10; else state = failure();
         break;
case 10: ….
case 11: retract(1); insert(id); return;
}

RE and Finite Automata

• Regular Expressions => Specification
• Finite Automata => Implementation

Generative Versus Recognition

• Regular expressions give you a way to generate all strings in a language
• Automata give you a way to recognize whether a specific string is in a language
  • Philosophically very different
  • Theoretically equivalent (for regular expressions and automata)
• Standard approach
  • Use regular expressions to define the language
  • Translate them into automata for implementation

Finite Automata & Language Theory

Finite Automata : A recognizer that takes an input string and determines whether it's a valid sentence of the language

[Figure: String(x) --> FA --> Yes / No]

An FA contains:
• a set of states (S)
• a set of input symbols (Σ)
• a start state (s0)
• a set of final (accepting) states (F)
• a set of transitions (δ)

Finite Automata: 2 Types

Non-Deterministic : Can have more than one alternative action for the same input symbol.

Deterministic : Has at most one action for a given input symbol.

Both types are used to recognize regular expressions.

NFA vs. DFA

• DFA
  • No ε transitions
  • At most one transition from each state for each letter
    [Figure: a state with two outgoing edges both labeled a is NOT OK; a state with outgoing edges labeled a and b is OK]
• NFA: neither restriction

NFAs & DFAs

Non-Deterministic Finite Automata (NFAs) easily represent regular expressions, but are somewhat less precise (several moves may be possible on the same input).

Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision (faster).

We'll review both plus conversion algorithms, i.e., NFA → DFA and DFA → NFA

Non-Deterministic Finite Automata

An NFA is a mathematical model that consists of:
• S, a set of states
• Σ, the symbols of the input alphabet
• move, a transition function
  • move(state, symbol) → set of states
  • move : S × (Σ ∪ {ε}) → Pow(S)
• A state, s0 ∈ S, the start state
• F ⊆ S, a set of final or accepting states

Representing NFAs

Transition Diagrams : Number states (circles), arcs, final states, …

Transition Tables : More suitable for representation within a computer
  Advantage: faster access to the transitions of a given state on a given char

We'll see examples of both!


Example NFA: (a|b)*abb

S = {0, 1, 2, 3}
s0 = 0
F = {3}
Σ = {a, b}

Transition Diagram:
  state 0 (start): on a go to 0 or 1; on b go to 0
  state 1: on b go to 2
  state 2: on b go to 3
  state 3: accepting

Transition Table:
             input
  state      a          b
  0          {0, 1}     {0}
  1          --         {2}
  2          --         {3}

ε (null) moves are possible: an edge i --ε--> j switches state but does not use any input symbol

Acceptance of NFA

• An NFA accepts an input string s iff there is some path in the transition diagram from the start state to some final state such that the edge labels along this path spell out s

How Does An NFA Work?

[Figure: the NFA for (a|b)*abb with states 0 to 3: state 0 loops on a and b, then 0 --a--> 1 --b--> 2 --b--> 3]

• Given an input string, we trace moves
• If no more input & in final state, ACCEPT

EXAMPLE: Input: ababb

1st path                         -OR-   2nd path
move(0, a) = 1                          move(0, a) = 0
move(1, b) = 2                          move(0, b) = 0
move(2, a) = ? (undefined)              move(0, a) = 1
                                        move(1, b) = 2
                                        move(2, b) = 3
REJECT !                                ACCEPT !
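
Rather than guessing one path at a time, a program can follow all paths at once by tracking the set of states the NFA could be in. The sketch below does this for the example NFA; the dictionary encoding of move and the name `nfa_accepts` are assumptions, and ε-moves are left out because this NFA has none.

```python
# move(state, symbol) for the NFA of (a|b)*abb; missing entries mean "no move".
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINAL = 0, {3}


def nfa_accepts(s):
    """True if some path from START spelling out s ends in a final state."""
    current = {START}
    for ch in s:
        # Follow every possible move from every state we might be in.
        current = {t for state in current for t in MOVE.get((state, ch), set())}
    return bool(current & FINAL)
```

For the input ababb the tracked sets are {0}, {0,1}, {0,2}, {0,1}, {0,2}, {0,3}, so the string is accepted even though the 1st path above gets stuck.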

Question

Draw a TD for an NFA accepting aa* | bb*

Transition Diagram

aa* | bb*

  state 0 (start): on ε go to 1; on ε go to 3
  state 1: on a go to 2
  state 2 (accepting): on a stay in 2
  state 3: on b go to 4
  state 4 (accepting): on b stay in 4

Deterministic Finite Automata

• A DFA is an NFA with the following restrictions:
  • ε moves are not allowed
  • For every state s ∈ S, there is one and only one path from s for every input symbol a ∈ Σ.

Since transition tables don't have any alternative options, DFAs are easily simulated via an algorithm.

Deterministic Finite Automata

Input: An input string x terminated by an end-of-file character eof. A DFA D with start state s0 and set of accepting states F

Output: the answer "yes" if D accepts x; "no" otherwise

s ← s0
c ← nextchar;
while c ≠ eof do
    s ← move(s, c);
    c ← nextchar;
end;
if s is in F then return "yes"
else return "no"
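
The simulation loop translates almost line for line into Python. As an assumed example it uses the transition table of the DFA for (a|b)*abb (states 0 to 3, accepting state 3); the end of the Python string plays the role of eof, and the names `MOVE` and `dfa_accepts` are hypothetical.

```python
# Transition table of a DFA for (a|b)*abb: MOVE[state][symbol] -> state
MOVE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}


def dfa_accepts(x, move=MOVE, s0=0, F=frozenset({3})):
    """Return "yes" if the DFA accepts x, "no" otherwise."""
    s = s0
    for c in x:              # the loop ends where the pseudocode sees eof
        s = move[s][c]       # exactly one move per (state, symbol) pair
    return "yes" if s in F else "no"
```

Each character costs one table lookup, which is why DFAs are the preferred form for the final scanner. A symbol outside {a, b} raises `KeyError` in this sketch.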

Example - DFA

Transition Table:
             input
  state      a     b
  0          1     0
  1          1     2
  2          1     3
  3          1     0

[Figure: the DFA drawn as a transition diagram over states 0 to 3, with the edges given by the table]

Recall the original NFA:

[Figure: the NFA for (a|b)*abb: state 0 loops on a and b, then 0 --a--> 1 --b--> 2 --b--> 3]

Conversion of NFA to DFA

• Why?
  • A DFA is difficult to construct directly from REs
  • An NFA is difficult to represent in a computer program and inefficient to compute

Conversion: NFA → DFA Algorithm

• The algorithm constructs a transition table for the DFA from the NFA
• Each state in the DFA corresponds to a SET of states of the NFA
• Why does this occur?
  • ε moves
  • non-determinism
  Both require us to characterize multiple situations that occur for accepting the same string.
  (Recall: the same input can have multiple paths in an NFA)
• Key Issue: Reconciling AMBIGUITY!

From an NFA to a DFA

A set of NFA states corresponds to a DFA state

• Find the initial state of the DFA
• Find all the states in the DFA
• Construct the transition table
• Find the final states of the DFA

Construction of DFA from NFA

Algorithm: subset construction
Input: NFA (N)
Output: DFA (D)

Method

Initial state of D: the set of states consisting of s0, the initial state of N, together with all states of N that can be reached from s0 by means of ε transitions only

Accepting state of D: any set of states that contains at least 1 accepting state of N

Subset Construction Algorithm

while there is an unmarked state x = {s0, s1, s2, …, sn} of D do
begin
    mark x
    for each input symbol a do
    begin
        let T be the set of states to which there is a transition on a from some state si in x
        y = ε-closure of T
        if y has not yet been added to the set of states of D then
            make y an unmarked state of D
        add a transition from x to y labeled a if not already present
    end
end

Computing the ε-closure

push all states in T onto stack;
initialize ε-closure(T) to T;
while stack is not empty do begin
    pop t, the top element, off the stack;
    for each state u with edge from t to u labeled ε do
        if u is not in ε-closure(T) do begin
            add u to ε-closure(T) ;
            push u onto stack
        end
end
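
The stack-based computation above reads naturally in Python. The function name `epsilon_closure` and the dictionary encoding of ε-edges are assumptions; the sample `EPS` table encodes an NFA for aa* | bb* whose start state 0 has ε-edges to states 1 and 3, which is likewise assumed for the illustration.

```python
def epsilon_closure(T, eps):
    """States reachable from the set T using ε-edges only (T included)."""
    closure = set(T)          # initialize ε-closure(T) to T
    stack = list(T)           # push all states in T onto the stack
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):      # each edge t --ε--> u
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


# Assumed ε-edges for an NFA of aa* | bb*: start state 0 branches to 1 and 3.
EPS = {0: {1, 3}}
```

Each state is pushed at most once, so the loop terminates even when the ε-edges form a cycle.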
NFA to DFA conversion - example 1

[Figure: the NFA for (a|b)*abb: state 0 (start) loops on a and b, then 0 --a--> 1 --b--> 2 --b--> 3 (accepting)]

δ(0, a) = {0,1}
δ(0, b) = {0}
δ({0,1}, a) = {0,1}
δ({0,1}, b) = {0,2}
δ({0,2}, a) = {0,1}
δ({0,2}, b) = {0,3}

New states:
A = {0}
B = {0,1}
C = {0,2}
D = {0,3}

         a    b
  A      B    A
  B      B    C
  C      B    D
  D      B    A
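
The table above can be reproduced by a short program that follows the subset construction. This is a sketch under assumptions: the function name, its return shape, and the dictionary encoding of the NFA are invented for the example, and a stack is used to hold the unmarked states.

```python
def subset_construction(nfa, start, finals, alphabet, eps=None):
    """Build a DFA whose states are frozensets of NFA states."""
    eps = eps or {}

    def closure(T):
        # ε-closure of T; with no ε-edges this simply returns T.
        result, stack = set(T), list(T)
        while stack:
            t = stack.pop()
            for u in eps.get(t, ()):
                if u not in result:
                    result.add(u)
                    stack.append(u)
        return frozenset(result)

    d_start = closure({start})
    states, unmarked, table = [d_start], [d_start], {}
    while unmarked:
        x = unmarked.pop()                    # mark x
        for a in alphabet:
            T = {t for s in x for t in nfa.get((s, a), ())}
            y = closure(T)
            if not y:
                continue                      # no move on a from x
            if y not in states:
                states.append(y)              # y becomes an unmarked state of D
                unmarked.append(y)
            table[(x, a)] = y
    accepting = [s for s in states if s & finals]
    return d_start, states, table, accepting


NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
d_start, states, table, accepting = subset_construction(NFA, 0, {3}, "ab")
names = dict(zip(states, "ABCD"))             # A = {0}, B = {0,1}, ...
```

Renaming the discovered sets A to D in order of discovery gives back the slide's table, with D = {0,3} as the only accepting state.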

NFA to DFA conversion (cont.)

[Figure: the resulting DFA: A (start) --a--> B, A --b--> A; B --a--> B, B --b--> C; C --a--> B, C --b--> D; D --a--> B, D --b--> A]

         a    b
  A      B    A
  B      B    C
  C      B    D
  D      B    A