Lexical Analysis
CS F 363/ IS F342
D.C.KIRAN
dck@pilani.bits-pilani.ac.in
BITS Pilani
Pilani Campus
Lexical Analysis
Scanning Perspective
TK_if → if
TK_then → then
TK_else → else
TK_relop → < | <= | > | >= | = | <>
TK_id → letter ( letter | digit )*
TK_num → digit+ (. digit+)? (E (+ | -)? digit+)?
TK_blank → ' ' (space)
TK_tab → \t (horizontal tab, Ctrl-I)
TK_newline → \n
TK_delim → blank | tab | newline
TK_ws → delim+
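A pattern such as TK_num can be recognized by hand-coding one loop per factor of the regular expression. The sketch below is a minimal C version, assuming the fraction and the exponent are each optional (at most once) and that is_num is a hypothetical helper that accepts only when the whole string matches.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true iff the whole string s matches
   digit+ (. digit+)? (E (+|-)? digit+)? */
bool is_num(const char *s)
{
    const char *p = s;

    /* digit+ : at least one digit is required */
    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p))
        p++;

    /* (. digit+)? : optional fraction, a digit must follow the dot */
    if (*p == '.') {
        p++;
        if (!isdigit((unsigned char)*p))
            return false;
        while (isdigit((unsigned char)*p))
            p++;
    }

    /* (E (+|-)? digit+)? : optional exponent with an optional sign */
    if (*p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!isdigit((unsigned char)*p))
            return false;
        while (isdigit((unsigned char)*p))
            p++;
    }

    /* accept only if every character was consumed */
    return *p == '\0';
}
```

For example, "6.02E+23" and "1E-5" are accepted, while "1.", ".5" and "1.2.3" are rejected.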
Lexical Analyzer Responsibilities
Figure: an FA simulator scans the input stream "if d2 =…" starting at lexeme_beginning; after reading "i", "f" and a delimiter it returns the token (IF, if).
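Identifier vs Keywords
Does "elsex=0;" mean the keyword else followed by "x = 0", or the assignment "elsex = 0"?
Under the longest-match rule the scanner reads elsex as a single identifier, so the statement is "elsex = 0".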
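The elsex question is commonly resolved by combining the longest-match rule with a reserved-word table: read the longest letter (letter | digit)* prefix, then check whether that exact string is a keyword. The sketch below assumes hypothetical token codes (TK_ID, TK_IF, TK_THEN, TK_ELSE), a small linear keyword table and a hypothetical function scan_word; it is one common approach, not necessarily the one used in this course.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch */
enum { TK_ID, TK_IF, TK_THEN, TK_ELSE };

/* Reserved words: scanned as identifiers first, then looked up here */
static const struct { const char *lex; int token; } keywords[] = {
    { "if", TK_IF }, { "then", TK_THEN }, { "else", TK_ELSE },
};

/* Scans the longest letter (letter|digit)* prefix of src (assumed to start
   with a letter), stores its length in *len and returns its token code. */
int scan_word(const char *src, size_t *len)
{
    size_t n = 1;                          /* src[0] is a letter */
    while (isalnum((unsigned char)src[n]))
        n++;                               /* longest match */
    *len = n;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].lex) == n &&
            strncmp(src, keywords[i].lex, n) == 0)
            return keywords[i].token;      /* exact keyword match */
    return TK_ID;                          /* otherwise an identifier */
}
```

With this scheme "elsex=0;" yields TK_ID with length 5, while "else x = 0" yields TK_ELSE with length 4.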
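Transition diagram for relop (states marked * retract the lookahead with Pushback(LA)):
state 0 (start): '<' → 1; '=' → 5; '>' → 6
state 1: '=' → 2, return(relop, LE); '>' → 3, return(relop, NE); other → 4*, Pushback(LA), return(relop, LT)
state 5: return(relop, EQ)
state 6: '=' → 7, return(relop, GE); other → 8*, Pushback(LA), return(relop, GT)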
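The relop transition diagram can be coded directly as a switch on the current character. In this sketch the names scan_relop and enum relop are hypothetical; the starred states 4 and 8 simply do not advance past the lookahead character, which has the same effect as Pushback(LA).

```c
#include <stddef.h>

/* Hypothetical attribute codes for the relop token */
enum relop { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

/* Simulates the relop diagram on src starting at src[*pos]. On success,
   *pos is advanced past the lexeme; on failure it is left unchanged. */
enum relop scan_relop(const char *src, size_t *pos)
{
    size_t i = *pos;
    enum relop r;

    switch (src[i++]) {                            /* state 0 */
    case '<':                                      /* state 1 */
        if (src[i] == '=')      { i++; r = LE; }   /* state 2 */
        else if (src[i] == '>') { i++; r = NE; }   /* state 3 */
        else                    r = LT;            /* state 4*: lookahead not consumed */
        break;
    case '=':
        r = EQ;                                    /* state 5 */
        break;
    case '>':                                      /* state 6 */
        if (src[i] == '=') { i++; r = GE; }        /* state 7 */
        else               r = GT;                 /* state 8*: lookahead not consumed */
        break;
    default:
        return NOT_RELOP;
    }
    *pos = i;
    return r;
}
```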
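Combined NFA: ε-transitions from a new start state lead into the automata for the individual tokens (…, A_n).
For faster scanning, convert this NFA to a DFA and minimize the number of states.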
Step 1:
"<" ">" → Return (TK_relOp, NE)
"<" "=" → ϵ → Return (TK_relOp, LE)
Step 2:
"<" ">" → ϵ → Return (TK_relOp, NE)
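Once the NFA has been converted, the resulting DFA is often driven by a transition table rather than nested if statements. This minimal sketch assumes a hypothetical four-class alphabet ('<', '=', '>', other), a hand-built table delta for the relop DFA and a hypothetical function dfa_relop; it keeps taking transitions until none exists, so the longest relop prefix wins.

```c
#include <stddef.h>

/* Hypothetical character classes and DFA states; DEAD means "no transition" */
enum { C_LT, C_EQ, C_GT, C_OTHER, NCLASS };
enum { S0, S_LT, S_LE, S_NE, S_EQ, S_GT, S_GE, NSTATE, DEAD = -1 };

static int cls(char c)
{
    return c == '<' ? C_LT : c == '=' ? C_EQ : c == '>' ? C_GT : C_OTHER;
}

/* delta[state][class] */
static const int delta[NSTATE][NCLASS] = {
    /*          <      =      >      other */
    [S0]   = { S_LT,  S_EQ,  S_GT,  DEAD },
    [S_LT] = { DEAD,  S_LE,  S_NE,  DEAD },
    [S_LE] = { DEAD,  DEAD,  DEAD,  DEAD },
    [S_NE] = { DEAD,  DEAD,  DEAD,  DEAD },
    [S_EQ] = { DEAD,  DEAD,  DEAD,  DEAD },
    [S_GT] = { DEAD,  S_GE,  DEAD,  DEAD },
    [S_GE] = { DEAD,  DEAD,  DEAD,  DEAD },
};

/* Returns the length of the longest relop prefix of src (0 if none) and
   stores the state it ends in (DEAD if none) in *final. */
size_t dfa_relop(const char *src, int *final)
{
    int s = S0;
    size_t i = 0;
    *final = DEAD;
    while (src[i] != '\0') {
        int t = delta[s][cls(src[i])];
        if (t == DEAD)
            break;          /* no transition: the lookahead is not consumed */
        s = t;
        i++;
        *final = s;         /* every state after S0 accepts in this DFA */
    }
    return i;
}
```

Because every non-start state here is accepting, the last state reached is also the last accepting one; a general scanner would instead remember the most recent accepting state and retract to it.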
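#include <ctype.h>
#include <stdio.h>
/* Global variables; the classes are macros so they can be used as case labels */
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99      /* kept distinct from EOF, which is negative */
int charClass, lexLen;
char lexeme[100], nextChar;
/* addChar: append nextChar to lexeme, keeping it NUL-terminated */
void addChar() {
    if (lexLen < 99) { lexeme[lexLen++] = nextChar; lexeme[lexLen] = '\0'; }
    else printf("error - lexeme too long\n");
}
/* getChar: read the next character into nextChar and classify it */
void getChar() {
    int c = getchar();
    if (c == EOF) { charClass = EOF; nextChar = '\0'; return; }
    nextChar = (char)c;
    if (isalpha(c)) charClass = LETTER;
    else if (isdigit(c)) charClass = DIGIT;
    else charClass = UNKNOWN;
}
/* getNonblank: skip white space */
void getNonblank() { while (charClass != EOF && isspace((unsigned char)nextChar)) getChar(); }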
    if(keyTable[key].flag==1)                  /* this slot holds a keyword */
    {
        if(strcmp(keyTable[key].lex,str)==0)   /* strcmp returns 0 on a match */
        {
            /* Copying to symbol table */
            strcpy(tok.lexeme,str);
            strcpy(tok.token,keyTable[key].token);
            tok.sym=keyTable[key].sym;
            tok.lineno=lineno;
            state=100; return 1;               /* keyword found */
        }
    }
    return 0; /* Not Keyword */
}
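Simple Lexical Analyzer
/* lex: return the next token. nextChar always holds one character of
   lookahead, so the caller calls getChar() once before the first lex(). */
int lex() {
    lexLen = 0;                /* start a new lexeme */
    lexeme[0] = '\0';
    getNonblank();             /* skip white space before the lexeme */
    switch (charClass) {
    case LETTER:
        addChar();
        getChar();
        while (charClass == LETTER || charClass == DIGIT)
        {
            addChar();
            getChar();
        }
        return lookup(lexeme); /* keyword or identifier */
    …
    …
    case DIGIT:
        addChar();
        getChar();
        while (charClass == DIGIT) {
            addChar();
            getChar();
        }
        return INT_LIT;
    } /* End of switch */
} /* End of function lex */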
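For experimentation, the fragments above can be combined into one self-contained scanner. This sketch reads from a string instead of a file and assumes hypothetical token codes (IDENT, INT_LIT, KW_IF, OTHER, END), a one-keyword lookup and a hypothetical lexInit function that primes the lookahead; it follows the style of the slide code but is not the course's reference implementation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Character classes and token codes (hypothetical values) */
#define LETTER   0
#define DIGIT    1
#define UNKNOWN  99
#define END      (-1)
#define INT_LIT  10
#define IDENT    11
#define KW_IF    12   /* hypothetical keyword token */
#define OTHER    13   /* hypothetical: any other single character */

static const char *src;            /* input string standing in for a file */
static int charClass, lexLen;
static char lexeme[100], nextChar;

/* Append nextChar to lexeme, keeping it NUL-terminated */
static void addChar(void)
{
    if (lexLen < 99) {
        lexeme[lexLen++] = nextChar;
        lexeme[lexLen] = '\0';
    } else {
        fprintf(stderr, "error - lexeme too long\n");
    }
}

/* Read the next character into nextChar and classify it */
static void getChar(void)
{
    nextChar = *src;
    if (nextChar == '\0') { charClass = END; return; }
    src++;
    if (isalpha((unsigned char)nextChar))      charClass = LETTER;
    else if (isdigit((unsigned char)nextChar)) charClass = DIGIT;
    else                                       charClass = UNKNOWN;
}

static void getNonblank(void)
{
    while (charClass != END && isspace((unsigned char)nextChar))
        getChar();
}

/* Hypothetical reserved-word check applied after scanning an identifier */
static int lookup(const char *s)
{
    return strcmp(s, "if") == 0 ? KW_IF : IDENT;
}

/* Prime the one-character lookahead before the first call to lex() */
void lexInit(const char *input)
{
    src = input;
    getChar();
}

/* Return the next token; its text is left in lexeme */
int lex(void)
{
    lexLen = 0;
    lexeme[0] = '\0';
    getNonblank();
    switch (charClass) {
    case LETTER:
        addChar(); getChar();
        while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); }
        return lookup(lexeme);
    case DIGIT:
        addChar(); getChar();
        while (charClass == DIGIT) { addChar(); getChar(); }
        return INT_LIT;
    case UNKNOWN:
        addChar(); getChar();
        return OTHER;
    default:
        return END;
    }
}
```

Calling lexInit("if x1 = 42") and then lex() repeatedly yields KW_IF, IDENT ("x1"), OTHER ("="), INT_LIT ("42") and finally END.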
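Character-at-a-time I/O vs. Block/Buffered I/O: a tradeoff
Block/Buffered I/O
• Use a block of memory as a buffer
• Stage data from the source into the buffer one block at a time
• Maintain two blocks. Why? A lexeme, plus the lookahead needed to end it, can straddle a block boundary; with two blocks, one can be refilled while the characters already read stay available in the other.
Figure: the source program "main { int aa =55,bag=10; … }" is staged into a buffer with indices 0 … n-1; successive blocks hold "main { in", "{ int aa = 5" and "55, bag = 1" as the forward pointer scans them.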
Buffered I/O
Figure: Buffer 1 (indices 0 … n-1) and Buffer 2 (indices n … 2n-1) hold "………main { int aa=55, ba" from the source program "main { int aa=55,bag=10; … }", with the forward pointer scanning them.
Each block ends with an eof sentinel, so in the common case only one test per character is needed:
forward := forward + 1;
if char at forward = eof then begin
    if forward at end of first block then begin
        reload second block;
        forward := forward + 1
    end
    else if forward at end of second block then begin
        reload first block;
        move forward to beginning of first block
    end
    else /* eof within buffer signifying end of input */
        terminate lexical analysis
end
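A minimal C sketch of the buffer pair with eof sentinels follows. It assumes a tiny half size N = 4 so that reloads are easy to observe, uses '\0' as the sentinel (so the source must not contain NUL bytes), and reads from a string standing in for the source file; buf_init and next_char are hypothetical names. It does not track lexeme_beginning, so like the classic scheme it cannot protect a lexeme longer than one half.

```c
#include <stddef.h>

#define N 4              /* hypothetical half size; real scanners use a disk block */
#define SENTINEL '\0'    /* eof marker; assumes no NUL bytes in the source */

/* [0..N-1] first half, [N] sentinel, [N+1..2N] second half, [2N+1] sentinel */
static char buf[2 * N + 2];
static const char *input;  /* unread part of the source */
static size_t forward;     /* index of the next character to return */

/* Copy up to N characters into the half starting at base and put the
   sentinel right after them: at the end of the half, or earlier at end of input */
static void reload(size_t base)
{
    size_t n = 0;
    while (n < N && input[n] != '\0') {
        buf[base + n] = input[n];
        n++;
    }
    input += n;
    buf[base + n] = SENTINEL;
}

void buf_init(const char *src)
{
    input = src;
    reload(0);
    forward = 0;
}

/* Return the next character and advance forward, or -1 at end of input */
int next_char(void)
{
    for (;;) {
        char c = buf[forward];
        if (c != SENTINEL) {                /* common case: one test */
            forward++;
            return (unsigned char)c;
        }
        if (forward == N) {                 /* sentinel ending the first half */
            reload(N + 1);
            forward = N + 1;
        } else if (forward == 2 * N + 1) {  /* sentinel ending the second half */
            reload(0);
            forward = 0;
        } else {
            return -1;                      /* sentinel inside a half: end of input */
        }
    }
}
```

Reading "int aa=55,bag=10;" with buf_init and repeated next_char calls returns every character in order, refilling the halves several times along the way, and then -1.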
• Significant blanks
– In Fortran, blanks are not significant
    do 10 i = 1,25 → a DO loop
    do 10 i = 1.25 → an assignment to the variable named do10i
– How does a compiler handle this?
• First pass finds and inserts blanks
• Second pass is a normal scanner
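The DO-loop ambiguity can be illustrated with a small classifier. This sketch swaps in a simpler technique than inserting blanks: it deletes all blanks and then looks ahead for a comma after the '=' outside parentheses, which only a DO loop header has. The name is_do_loop and the 256-character statement limit are assumptions for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the statement is a DO loop header,
   false if it is something else, e.g. the assignment do10i = 1.25 */
bool is_do_loop(const char *stmt)
{
    char t[256];                    /* assumed maximum statement length */
    size_t n = 0;

    /* Pass 1: drop blanks (not significant in Fortran) and fold case */
    for (const char *p = stmt; *p != '\0' && n < sizeof t - 1; p++)
        if (*p != ' ')
            t[n++] = (char)tolower((unsigned char)*p);
    t[n] = '\0';

    if (strncmp(t, "do", 2) != 0)
        return false;
    const char *eq = strchr(t, '=');
    if (eq == NULL)
        return false;

    /* Pass 2: a DO loop needs a comma after '=' outside parentheses */
    int depth = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return true;
    }
    return false;
}
```

With this rule "do 10 i = 1,25" is a DO loop, while "do 10 i = 1.25" and "do10i = f(1,2)" are assignments, because neither has a top-level comma after the '='.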
Grammar:
1) <Program> → <functions> comma <functionbody> comma
2) <functions> → <function> <functions> / ε
3) <function> → <funsignature> <functionbody>
4) <funsignature> → <type> id ( <params> )
5) <type> → int / float / string
6) <params> → <type> id , <params> / ε
7) <functionbody> → { <declarations> <statements> *<E>; }
8) <declarations> → <type> id ; <declarations> / ε
9) <statements> → id := <E>; / id := id <more>; <statements> / ε
10) <E> → <E> + <E> / <E> * <E> / id / <integerliteral> / <floatliteral> / <stringliteral>
11) <more> → ( <args> )
12) <args> → id comma <args> / ε

Sample program:
string fun1(int x, string y,){
    int a;
    string b;
    b = b + y;
    *y;
}
float fun2(int I, float j,){
    int k;
    k = I *j+k;
    *k;
}
,
{
    float z; string s= "bits";
    int p;
    Z=10.5; p=5;
    z = fun1(p,s,);
    p = fun2(p,z,);
    *z;
}
,