Lexical Analysis: Scanners and Parsers
Spring 2010
Lexical Analysis
Implementing Scanners & LEX: A Lexical Analyzer Tool
[Figure: Source code → Scanner → tokens → Parser → IR; Errors]
Copyright 2010, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies of these materials for their personal use.
Pedro Diniz
pedro@isi.edu
The characters that form a word are its lexeme.
Its part of speech (or syntactic category) is called its token type.
The scanner discards white space and (often) comments.
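The lexeme/token-type distinction can be sketched in C. This is a minimal illustration, not the course's scanner: the names `TokenType`, `Token`, and `next_token`, and the choice of token categories, are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for this sketch (not taken from the slides). */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_OTHER } TokenType;

typedef struct {
    TokenType type;      /* syntactic category of the word */
    const char *lexeme;  /* first character of the word in the input */
    size_t length;       /* number of characters in the lexeme */
} Token;

/* Return the token that starts at *cursor and advance *cursor past it. */
Token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    /* The scanner discards white space. */
    while (isspace((unsigned char)*p)) p++;

    Token tok = { TOK_EOF, p, 0 };
    if (*p == '\0') {
        /* End of input: report a zero-length TOK_EOF token. */
    } else if (isalpha((unsigned char)*p)) {
        tok.type = TOK_IDENT;
        while (isalnum((unsigned char)*p)) p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.type = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        tok.type = TOK_OTHER;   /* any other single character */
        p++;
    }
    tok.length = (size_t)(p - tok.lexeme);
    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on the same cursor walks through the input one word at a time; the lexeme is the slice `lexeme[0..length)` and the token type is the `type` field.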
Scanner Construction
Straightforward Implementation
Input: REs
Construct an NFA for each RE
Combine the NFAs using ε-transitions (alternation in Thompson's construction)
Create a DFA using the subset construction
Minimize the resulting DFA
Create executable code for the DFA
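The NFA-to-DFA step can be sketched in C. This is a minimal illustration of the subset construction under stated assumptions: the NFA is a small hand-written one for r[0..9]+ (state 2 has ε-transitions back to state 1 and on to the accepting state 3), state sets are bitmasks, and all names (`StateSet`, `Dfa`, `subset_construction`, and so on) are hypothetical. Minimization is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define NFA_STATES     4
#define NUM_SYMBOLS    2     /* symbol 0 = 'r', symbol 1 = a digit 0..9 */
#define MAX_DFA_STATES 16
#define NFA_START      0
#define NFA_ACCEPT     3

typedef unsigned int StateSet;  /* bit i set => NFA state i is in the set */

/* Assumed NFA for r[0..9]+ : 0 -r-> 1 -digit-> 2, 2 -eps-> 1, 2 -eps-> 3 */
static const StateSet nfa_move[NFA_STATES][NUM_SYMBOLS] = {
    /* state 0 */ { 1u << 1, 0 },
    /* state 1 */ { 0, 1u << 2 },
    /* state 2 */ { 0, 0 },
    /* state 3 */ { 0, 0 },
};
static const StateSet nfa_eps[NFA_STATES] = {
    0, 0, (1u << 1) | (1u << 3), 0
};

/* Add every state reachable through epsilon-transitions alone. */
static StateSet eps_closure(StateSet set) {
    StateSet prev;
    do {
        prev = set;
        for (int s = 0; s < NFA_STATES; s++)
            if (set & (1u << s)) set |= nfa_eps[s];
    } while (set != prev);
    return set;
}

/* States reachable from `set` by consuming one `symbol`. */
static StateSet move_on(StateSet set, int symbol) {
    StateSet out = 0;
    for (int s = 0; s < NFA_STATES; s++)
        if (set & (1u << s)) out |= nfa_move[s][symbol];
    return out;
}

typedef struct {
    int num_states;
    StateSet sets[MAX_DFA_STATES];          /* NFA subset behind each DFA state */
    int next[MAX_DFA_STATES][NUM_SYMBOLS];  /* -1 means no transition (error) */
    int accepting[MAX_DFA_STATES];          /* 1 if the subset holds NFA_ACCEPT */
} Dfa;

void subset_construction(Dfa *dfa) {
    assert(dfa != NULL);
    dfa->num_states = 1;
    dfa->sets[0] = eps_closure(1u << NFA_START);
    /* Worklist: DFA states with index >= d have not been expanded yet. */
    for (int d = 0; d < dfa->num_states; d++) {
        dfa->accepting[d] = (int)((dfa->sets[d] >> NFA_ACCEPT) & 1u);
        for (int a = 0; a < NUM_SYMBOLS; a++) {
            StateSet target = eps_closure(move_on(dfa->sets[d], a));
            dfa->next[d][a] = -1;
            if (target == 0) continue;
            int t = 0;   /* look for an existing DFA state with this subset */
            while (t < dfa->num_states && dfa->sets[t] != target) t++;
            if (t == dfa->num_states) {
                assert(dfa->num_states < MAX_DFA_STATES);
                dfa->sets[dfa->num_states++] = target;
            }
            dfa->next[d][a] = t;
        }
    }
}
```

For this particular NFA the construction produces three DFA states, with the last one accepting and looping on digits. The bitmask representation limits the sketch to NFAs with at most as many states as `unsigned int` has bits; a production tool would use a more general set type.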
Table-driven Scanners
[Figure: DFA with states S0, S1, S2; S0 goes to S1 on r, S1 goes to S2 on [0..9], and S2 loops on [0..9]]
char ← nextChar()
state ← s0
while (char ≠ eof)
    state ← δ(state, char)
    char ← nextChar()
end while
if state ∈ SF
    then report acceptance
    else report failure
         s0   s1   s2   se
r        s1   se   se   se
0..9     se   s2   s2   se
Other    se   se   se   se
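The table-driven skeleton and its transition table can be sketched in C as follows. This is an illustration under assumptions, not the course's code: it scans a NUL-terminated string instead of calling nextChar(), and the names `delta`, `char_class`, and `accepts_register` are hypothetical. The table keeps the slide's orientation (one row per character class, one column per state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { S0, S1, S2, SE, NUM_STATES };          /* SE is the error state */
enum { C_R, C_DIGIT, C_OTHER, NUM_CLASSES };  /* character classes */

/* delta[class][state]: rows are character classes, columns are states. */
static const int delta[NUM_CLASSES][NUM_STATES] = {
    /*            s0  s1  s2  se */
    /* r     */ { S1, SE, SE, SE },
    /* 0..9  */ { SE, S2, S2, SE },
    /* other */ { SE, SE, SE, SE },
};

/* Map a character to its row in the table. */
static int char_class(char c) {
    if (c == 'r') return C_R;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA over the whole string; accept only if it ends in S2. */
bool accepts_register(const char *input) {
    assert(input != NULL);
    int state = S0;
    for (const char *p = input; *p != '\0'; p++) {
        state = delta[char_class(*p)][state];
    }
    return state == S2;   /* S2 is the only accepting state */
}
```

The scanning loop never changes when the language changes; only the table and the character classifier do, which is what makes this form easy to generate automatically.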
Direct-Coded Scanners
goto s0
s0: char ← nextChar()
    if (char = 'r')
        then goto s1
        else goto se
s1: char ← nextChar()
    if ('0' ≤ char ≤ '9')
        then goto s2
        else goto se
Scanners in Practice
Use automated tools to construct a lexical analyzer
Given a set of tokens defined using regular expressions, the tools generate a character-stream tokenizer by constructing a DFA
Direct-coded scanner, continued (states s2 and se):
s2: char ← nextChar()
    if ('0' ≤ char ≤ '9')
        then goto s2
    elseif (char = eof)
        then report acceptance
        else goto se
se: report failure
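The direct-coded scanner for r[0..9]+ can be written almost literally in C, with one label per DFA state. This is a sketch under assumptions: it reads from a NUL-terminated string, treats the terminating NUL as eof, and the function name `accepts_register_direct` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each label is one DFA state; control flow replaces the table lookup. */
bool accepts_register_direct(const char *input) {
    assert(input != NULL);
    const char *p = input;
    char c;

    goto s0;
s0:
    c = *p++;
    if (c == 'r') goto s1;
    goto se;
s1:
    c = *p++;
    if (c >= '0' && c <= '9') goto s2;
    goto se;
s2:
    c = *p++;
    if (c >= '0' && c <= '9') goto s2;
    if (c == '\0') return true;    /* eof in s2: report acceptance */
    goto se;
se:
    return false;                  /* report failure */
}
```

Compared with the table-driven form, there is no table and no classifier call per character; the cost is that the code must be regenerated whenever the set of regular expressions changes.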
Output:
The command lex file.l generates a C function named yylex in the file lex.yy.c. Compile lex.yy.c with the -ll switch to link the lex library; flex uses the library -lfl instead.
[Figure: input buffer, yylex (FA simulator, matching patterns, semantic actions), yyout, output file]
%%
{ws}      { /* no action and no return */ }
if        { return(IF); }
then      { return(THEN); }
else      { return(ELSE); }
{id}      { install_id(); return(ID); }
{number}  { install_num(); return(NUM); }
"<"       { return(LT); }
"<="      { return(LE); }
"="       { return(EQ); }
"<>"      { return(NE); }
">"       { return(GT); }
">="      { return(GE); }
%%
int yywrap() { return 1; }

int main() {
    int n, tok;
    n = 0;
    while (1) {
        tok = yylex();
        if (tok == 0) break;   /* yylex returns 0 at end of input */
        printf(" token matched is %d\n", tok);
        n++;
    }
    printf(" number of tokens matched is %d\n", n);
    return 0;
}
REJECT: when a semantic action invokes REJECT, the scanner puts the matched text back and redoes the match from the same starting position, this time ignoring the rule that just matched, so the next-best rule is selected.
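What is the point? REJECT lets several overlapping patterns each act on the same input text, for example counting every occurrence of both "she" and "he".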
more test1.txt
<!-- sdsdswdsdsdsdssds -->
<HREF="string4">
<!-<HREF="string5"> sdsdswdsdsdsdssds -->
< HREF="string3">
<HREF="string2">
<HREF = "string1">

./a.out < test1.txt
begin comment end comment
begin action ("string4") end action
begin comment end comment
begin action ("string3") end action
begin action ("string2") end action
begin action ("string1") end action
Summary
Scanner Construction
Table-driven
Direct-coded