
IMPLEMENTING LEXICAL ANALYZER USING FINITE AUTOMATA
• We are given the following regular definition:
    if     -> if
    then   -> then
    else   -> else
    relop  -> < | <= | = | <> | > | >=
    id     -> letter (letter | digit)*
    num    -> digit+ (. digit+)? (E (+ | -)? digit+)?
    letter -> [a-z] | [A-Z]
    digit  -> [0-9]
• Recognize the keywords if, then, and else, and the lexemes that match relop, id, and num.
• delim -> blank | tab | newline
  ws    -> delim+
  If a match for ws is found, the lexical analyzer does not return a token to the parser. Instead, it proceeds to find the token following the white space and returns that token to the parser.
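A minimal C sketch of the ws rule, assuming the input is held in a NUL-terminated string; the helper names skip_ws and is_delim are hypothetical, not part of the slides. The analyzer advances past a maximal run of delimiters and starts the next token there, producing no token for the white space itself.

```c
#include <assert.h>
#include <stddef.h>

/* delim -> blank | tab | newline */
static int is_delim(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* ws -> delim+ : skip a maximal run of delimiters starting at pos and
   return the index of the first non-delimiter character, which is where
   the next token (the one actually handed to the parser) begins. */
size_t skip_ws(const char *input, size_t pos) {
    assert(input != NULL);
    while (input[pos] != '\0' && is_delim(input[pos]))
        pos++;
    return pos;
}
```

For example, skip_ws(" \t\nif", 0) returns 3, the index of the i that begins the keyword if.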
TRANSITION DIAGRAMS
• A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
• A TD keeps track of information about the characters that are seen as the forward pointer scans the input.
• Positions in a TD are drawn as circles called states.
• States are connected by arrows called edges.
• Edges leaving state s have labels indicating the input characters that can appear next after the TD has reached state s.
• Start state: the state where control resides when we begin to recognize a token.
• If no edge leaving the current state matches the next input character, there is no valid transition and the diagram fails.
• Accepting state: a state in which a token has been found.
• A * on an accepting state indicates that retraction must take place: the forward pointer moves back one character, because the last character read is not part of the lexeme.
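[Figure: transition diagram for id. start -> state 0; state 0 --letter--> state 1; state 1 --letter/digit--> state 1; state 1 --delimiter--> state 2*]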
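A diagram like this can also be stored as data rather than code. The sketch below is a table-driven C version of the id diagram, assuming a NUL-terminated input string. The names match_id, classify, and next_state are illustrative, and treating any character that is neither a letter nor a digit (including the end of the string) as the delimiter edge is an assumption. Each row of next_state lists the edges leaving one state, -1 means there is no valid transition, and the starred state 2 retracts the forward pointer by one character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum char_class { LETTER, DIGIT, OTHER };

static enum char_class classify(char c) {
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;                 /* plays the role of the "delimiter" edge */
}

/* Edges of the id diagram: next_state[state][class]; -1 means no edge. */
static const int next_state[3][3] = {
    /*            LETTER DIGIT OTHER */
    /* state 0 */ {  1,    -1,   -1 },
    /* state 1 */ {  1,     1,    2 },
    /* state 2 */ { -1,    -1,   -1 },
};
static const int accepting[3] = { 0, 0, 1 };
static const int retract_on_accept[3] = { 0, 0, 1 };   /* the '*' mark */

/* Returns the length of the identifier at the start of s, or 0 on failure. */
size_t match_id(const char *s) {
    assert(s != NULL);
    int state = 0;                /* start state */
    size_t forward = 0;           /* index of the next character to read */
    while (!accepting[state]) {
        int next = next_state[state][classify(s[forward++])];
        if (next < 0) return 0;   /* no valid transition: failure */
        state = next;
    }
    if (retract_on_accept[state]) forward--;   /* retract(1) */
    return forward;
}
```

For input count1 = 5, match_id returns 6: the blank is read to reach state 2 and is then given back by the retraction.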
• There may be several transition diagrams, which are tried one after another.
• If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of this diagram and activate the next transition diagram.
• If failure occurs in all transition diagrams, a lexical error has been detected and error-recovery routines are invoked.
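• e.g. DO 5 I=1.25
       DO 5 I=1,25
  In Fortran, where blanks are not significant, the first line is the assignment DO5I=1.25 while the second begins a DO loop. The analyzer cannot tell whether DO is a keyword until it reaches the . or the , so it may have to follow one diagram, fail, retract, and try another.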
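The fallback between diagrams can be sketched in C as follows. This is a simplified illustration under stated assumptions: the input is a NUL-terminated string, forward is a shared index rather than a buffer pointer, and the three diagrams (id, a real-number diagram digit+ . digit+, and an integer diagram digit+) are hypothetical stand-ins for the full diagrams. As in fail(), the real-number diagram is tried before the integer one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static size_t forward;   /* shared forward pointer (an index into the input) */

/* id: letter (letter | digit)* */
static int id_diagram(const char *s) {
    if (!isalpha((unsigned char)s[forward])) return 0;
    while (isalnum((unsigned char)s[forward])) forward++;
    return 1;
}

/* real: digit+ . digit+  (can fail after consuming characters) */
static int real_diagram(const char *s) {
    if (!isdigit((unsigned char)s[forward])) return 0;
    while (isdigit((unsigned char)s[forward])) forward++;
    if (s[forward] != '.') return 0;
    forward++;
    if (!isdigit((unsigned char)s[forward])) return 0;
    while (isdigit((unsigned char)s[forward])) forward++;
    return 1;
}

/* integer: digit+ */
static int int_diagram(const char *s) {
    if (!isdigit((unsigned char)s[forward])) return 0;
    while (isdigit((unsigned char)s[forward])) forward++;
    return 1;
}

typedef int (*diagram_fn)(const char *s);
static const diagram_fn diagrams[] = { id_diagram, real_diagram, int_diagram };
#define N_DIAGRAMS (sizeof diagrams / sizeof diagrams[0])

/* Tries the diagrams in order, each starting from lexeme_beginning.
   Returns the index of the first one that succeeds (the lexeme is
   s[lexeme_beginning .. forward-1]) or -1 if all fail (lexical error). */
int next_lexeme(const char *s, size_t lexeme_beginning) {
    assert(s != NULL);
    for (size_t i = 0; i < N_DIAGRAMS; i++) {
        forward = lexeme_beginning;   /* retract to this diagram's start */
        if (diagrams[i](s)) return (int)i;
    }
    forward = lexeme_beginning;
    return -1;
}
```

For input 1.x, the real-number diagram consumes 1. before failing on x; forward is reset to lexeme_beginning and the integer diagram then matches 1, so next_lexeme returns 2 with forward equal to 1.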
RECOGNITION OF RESERVED WORDS
• Initialize the symbol table, in which information about identifiers is stored.
• Enter the reserved words into the symbol table before any characters in the input are seen.
• Record in the symbol table the token to be returned when each keyword is identified.
• The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and its attribute value.
• When a lexeme is identified, the symbol table is checked:
  - if the lexeme is found as a keyword, install_id() returns 0;
  - if it is an identifier, install_id() returns a pointer to its symbol-table entry, creating the entry first if the lexeme is not yet in the table.
• gettoken() returns the corresponding token: the keyword's own token for a reserved word, and id otherwise.
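A small C sketch of this scheme follows. It assumes a linear-search array as the symbol table and passes the lexeme as an argument, whereas the slides' install_id() and gettoken() take no arguments and read the lexeme from the input buffer. The token codes, table size, and the helper names init_symtable, lookup, and insert are hypothetical. A null pointer plays the role of the 0 that install_id() returns for a keyword.

```c
#include <assert.h>
#include <string.h>

enum token_name { ID = 256, IF, THEN, ELSE };   /* assumed token codes */

struct entry {
    char lexeme[32];
    enum token_name token;   /* IF/THEN/ELSE for keywords, ID otherwise */
};

#define TABLE_SIZE 100
static struct entry symtable[TABLE_SIZE];
static int n_entries = 0;

static struct entry *lookup(const char *lexeme) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(symtable[i].lexeme, lexeme) == 0) return &symtable[i];
    return NULL;
}

static struct entry *insert(const char *lexeme, enum token_name tok) {
    assert(n_entries < TABLE_SIZE && strlen(lexeme) < sizeof symtable[0].lexeme);
    struct entry *e = &symtable[n_entries++];
    strcpy(e->lexeme, lexeme);
    e->token = tok;
    return e;
}

/* Enter the reserved words before any input is read. */
void init_symtable(void) {
    n_entries = 0;
    insert("if", IF);
    insert("then", THEN);
    insert("else", ELSE);
}

/* NULL for a keyword; otherwise the identifier's entry, created on first use. */
struct entry *install_id(const char *lexeme) {
    struct entry *e = lookup(lexeme);
    if (e != NULL) return e->token == ID ? e : NULL;
    return insert(lexeme, ID);
}

/* The keyword's token for a reserved word, ID otherwise. */
enum token_name gettoken(const char *lexeme) {
    struct entry *e = lookup(lexeme);
    return e != NULL ? e->token : ID;
}
```

Because if, then, and else are entered by init_symtable() before any input is read, install_id("if") finds a keyword entry and returns NULL, while install_id("count") creates an id entry on first use and returns the same pointer on later calls.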
RECOGNITION OF NUMBERS
• When the accepting state is reached:
  - call a procedure install_num() that enters the lexeme into a table of numbers and returns a pointer to the created entry;
  - return the token NUM.
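The accepting-state logic for num can be sketched in C as below. This is an assumed, simplified rendering: match_num follows num -> digit+ (. digit+)? (E (+ | -)? digit+)? directly instead of going through the separate number diagrams, and this install_num takes the lexeme explicitly and stores it as text in a fixed-size array. Optional parts are consumed only when they are complete, which has the same effect as retracting to the last accepting position.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix of s matching
   digit+ (. digit+)? (E (+ | -)? digit+)?, or 0 if there is none. */
size_t match_num(const char *s) {
    assert(s != NULL);
    size_t i = 0, accept;
    if (!isdigit((unsigned char)s[i])) return 0;
    while (isdigit((unsigned char)s[i])) i++;
    accept = i;                                  /* digit+ */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i])) i++;
        accept = i;                              /* digit+ . digit+ */
    }
    size_t j = accept;
    if (s[j] == 'E') {
        j++;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j])) j++;
            accept = j;                          /* ... E (+|-)? digit+ */
        }
    }
    return accept;   /* characters past accept are "retracted" */
}

#define MAX_NUMS 100
static char num_table[MAX_NUMS][32];
static int n_nums = 0;

/* Hypothetical install_num: copy the lexeme s[0..len-1] into the table of
   numbers and return a pointer to the newly created entry. */
const char *install_num(const char *s, size_t len) {
    assert(n_nums < MAX_NUMS && len < sizeof num_table[0]);
    memcpy(num_table[n_nums], s, len);
    num_table[n_nums][len] = '\0';
    return num_table[n_nums++];
}
```

For example, match_num("12E") returns 2: E is not followed by digits, so the analyzer would retract over it and return NUM for 12.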
IMPLEMENTING LEXICAL ANALYZER
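Token nexttoken(void)
{
    /* state, start, forward, lexeme_beginning, token, c and entry are globals;
       nextchar(), retract(), fail(), isletter(), install_id(), gettoken()
       and install_num() are helper routines defined elsewhere */
    state = start = 0;              /* every call begins in the relop diagram */
    lexeme_beginning = forward;     /* the new lexeme starts at the forward pointer */
    while (1) {
        switch (state) {
        case 0: c = nextchar();
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++; /* skip white space: advance start of lexeme */
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();    /* not a relop: try the next diagram */
            break;
        case 1: c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: token.attribute = LE;
            token.name = relop;
            return token;
        case 3: token.attribute = NE;
            token.name = relop;
            return token;
        case 4: retract(1);         /* starred: the character after '<' is not part of the lexeme */
            token.attribute = LT;
            token.name = relop;
            return token;
        case 5: token.attribute = EQ;
            token.name = relop;
            return token;
        case 6: c = nextchar();
            if (c == '=') state = 7;
            else state = 8;
            break;
        case 7: token.attribute = GE;
            token.name = relop;
            return token;
        case 8: retract(1);         /* starred: the character after '>' is not part of the lexeme */
            token.attribute = GT;
            token.name = relop;
            return token;
        case 9: c = nextchar();     /* start of the id diagram */
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11: retract(1);
            entry = install_id();   /* 0 for a keyword, else a symbol-table pointer */
            token.name = gettoken();/* the keyword's token, or id */
            token.attribute = entry;
            return token;
        /* cases 12-24 here for numbers */
        case 25: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27: retract(1);
            token.name = NUM;
            token.attribute = install_num();  /* pointer into the table of numbers */
            return token;
        }
    }
}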
CODE FOR NEXT STATE
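int state = 0, start = 0;
int lexical_value;

int fail(void)
{
    forward = lexeme_beginning;   /* retract to where the failed diagram started */
    switch (start) {
    case 0:  start = 9;  break;   /* relop failed: try id */
    case 9:  start = 12; break;   /* id failed: try the number diagrams */
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover();  break;   /* every diagram failed: lexical error */
    default: break;               /* compiler error */
    }
    return start;
}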
