Class 07 B


www.th-deg.de

Lexical analysis II
1. From NFA to DFA
2. How lex works
3. From RE to DFA

TECHNISCHE HOCHSCHULE DEGGENDORF



How lex works

lex.l --> [lex compiler] --> lex.yy.c
lex.yy.c --> [C compiler] --> a.out
input stream --> [a.out] --> token sequence

Source: Aho, Lam, Sethi, Ullman; Compilers: Principles, Techniques, and Tools


Structure of lex program


declarations
%%
translation rules
%%
auxiliary functions

translation rules:
Pattern { Action }

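The lexer reads input until it has found the longest input prefix that
matches one of the RE patterns (if several REs match this prefix, the
first one listed is used) and then executes the associated action.
This is repeated until an action returns control to the parser; the
value returned is the token name. The global variable yylval stores
additional information about the token.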
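
The matching rule above (longest prefix, ties go to the first rule) can be sketched in plain C. This is only an illustration, not what lex generates: lex compiles all patterns into one automaton, whereas this sketch tries each pattern separately. The token codes, the three matcher functions and next_token are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch; 0 means "no rule matched". */
enum { TOK_NONE = 0, TOK_IF = 1, TOK_ID = 2, TOK_NUMBER = 3 };

/* Each matcher returns the length of the longest prefix of s
   that its pattern accepts, or 0 if the pattern does not match. */
static size_t match_if(const char *s) {            /* pattern: if */
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s) {            /* [a-zA-Z][a-zA-Z0-9]* */
    size_t n = 0;
    if (isalpha((unsigned char)s[0])) {
        n = 1;
        while (isalnum((unsigned char)s[n])) n++;
    }
    return n;
}

static size_t match_number(const char *s) {        /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

typedef size_t (*matcher_fn)(const char *);

/* Rules in the order in which they would appear in the rules section. */
static const struct { matcher_fn match; int token; } rules[] = {
    { match_if,     TOK_IF },
    { match_id,     TOK_ID },
    { match_number, TOK_NUMBER },
};

/* Returns the token of the winning rule and stores the lexeme length in *len. */
int next_token(const char *input, size_t *len) {
    assert(input != NULL && len != NULL);
    int best_token = TOK_NONE;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(input);
        if (n > best_len) {   /* strictly longer only: the earlier rule wins ties */
            best_len = n;
            best_token = rules[i].token;
        }
    }
    *len = best_len;
    return best_token;
}
```

With these rules, the input "if" is matched by both the keyword rule and the identifier rule with length 2, so the earlier keyword rule wins. The input "iffy" is an identifier, because the identifier rule matches a longer prefix.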

Example lex program


%{ // copied verbatim into lex.yy.c
#include <stdio.h>
double yylval; // attribute value of the most recent token
typedef enum token_enum {NONE, NUMBER} token_t;
// At EOF the scanner calls yywrap(): if it returns 0 (false), scanning
// continues with yyin set to the next input file; otherwise the scanner
// terminates and yylex() returns 0.
int yywrap(void) { return 1; }
%}
NUMBER [0-9]+\.?|[0-9]*\.[0-9]+
%%
{NUMBER} { sscanf(yytext, "%lf", &yylval); return NUMBER; }
\n|. /* { return NONE; } */
<<EOF>> { yyterminate(); }
%%
int main(void) { // "auxiliary" function(s)
    token_t token = yylex();
    while (token > 0) {
        printf("read '%s' of length %d -- which is of type %d (value %lf)\n",
               yytext, yyleng, token, yylval);
        token = yylex();
    }
    return 0;
}

Lex
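• Lookahead operator
– usually, the lexer only reads 1 character beyond the lexeme
– sometimes, a pattern may only be recognized AFTER further
characters following the sequence to be matched have been read
• Lookahead operator "/" separates the pattern to be matched from the
additional pattern; the additional pattern must match as well, but it
is not part of the lexeme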
• Example:
In FORTRAN, keywords are not reserved
=>
IF(i,j)=3
is an array assignment – not a condition!
Lex rule:
IF / \( .* \) {letter}
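
What the lookahead rule decides can be sketched by hand in C. This is a simplified illustration, not lex's generated code. It assumes that the input contains no blanks and that {letter} means an alphabetic character; match_if_keyword is a hypothetical name. Only the two characters of IF are reported as the lexeme. The lookahead part is inspected but left in the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hand-coded check for the rule  IF / \( .* \) {letter}
   Returns the lexeme length (2, for "IF") if the keyword rule applies,
   or 0 if it does not. */
size_t match_if_keyword(const char *s) {
    assert(s != NULL);
    /* Pattern before "/": the text IF; the lookahead must start with "(". */
    if (strncmp(s, "IF", 2) != 0 || s[2] != '(') {
        return 0;
    }
    /* Lookahead: ".*" does not cross a newline, so scan the current line
       for any ")" that is directly followed by a letter. */
    for (size_t i = 3; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] == ')' && isalpha((unsigned char)s[i + 1])) {
            return 2;   /* only "IF" is consumed, not the lookahead */
        }
    }
    return 0;
}
```

For IF(I.EQ.J)GOTO 10 the function returns 2, since a letter follows the closing parenthesis. For IF(i,j)=3 it returns 0, so the scanner can fall through to an identifier rule and treat IF as an array name.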
