Professional Documents
Culture Documents
2 Lexicalanalysis
2 Lexicalanalysis
2 Lexicalanalysis
2. Lexical Analysis
Huang Shell Ying
Lexical Analysis back
Lexical Syntax
A stream of Analyzer A stream of Analyzer
characters tokens
lexical errors
For example, if (i == j) z = 0;
\t i f ( i = = j ) \t z = 0 ; …
back
Examples:
or by a transition table:
a b c others
1, start 2 0 0 0
2 0 3 0 0
State 0: error 3 0 0 4 0
4, final 2 0 4 0
20 Lexical Analysis CZ3007
Deterministic Finite Automata (DFA)
A DFA for identifiers: [a-zA-Z][a-zA-Z0-9]*
By a transition diagram
[a-zA-Z]
1 2
[a-zA-Z0-9]
By a transition table
a… z A… Z 0… 9 others
1, start 2… 2 2… 2 0… 0 0
2, final 2… 2 2… 2 2… 2 0
[0-9] [0-9]
By a transition table
[0-9] . others
1, start 2 0 0
2, final 2 3 0
3 4 0 0
4, final 4 0 0
22 Lexical Analysis CZ3007
Coding the Deterministic Finite Automata
A DFA can be coded in a table-driven form:
currentChar = read();
state = startState;
while true do
nextState = T[state, currentChar];
if (nextState == error) then break;
state = nextState;
currentChar = read()
if (state in acceptingStates)
then /* return or process the valid token */
else /* signal a lexical error */
back
23 Lexical Analysis CZ3007
Example
[a-zA-Z0-9]
[a-zA-Z]
1 2
Input: “max2”. What happens for the input “m”, for “2m”?
[0-9]
[0-9] . 5
1 3 4
[0-9]
[0-9]
Input: “0.5”. What happens for the input “6”, for “6.”?
λ [0-9]
1 ‘+’, ‘-’
11 2 3
[0-9]
An NFA for (a|b)*abb
a
a b b
1 2 3 4
b b 2
4
a
1
b
a
λ
3
1
a [ab]
a
3 4 The DFA D
An NFA N b
Start state: {1,2}
Start state: 1
27 Lexical Analysis CZ3007
Conversion of an NFA to a DFA
Put each state S of D on a work list when S is created.
For each state S = {n1, n2, …} on the work list and each
character c ∈ ∑, we compute the successor states of n1,
n2, … under c in N and obtain a set {m1, m2, …}. Then
we include the λ-successors of m1, m2, … in N.
The resulting set of NFA states is included as a state T in
D, and a transition from S to T, labeled with c is added to
D.
We continue to add states and transitions to D until all
possible successors to existing states are added.
An accepting (final) state of D is any set that contains an
accepting (final) state of N.
28 Lexical Analysis CZ3007
Conversion of an NFA to a DFA
a 5
b λ 2
1
a [ab]
a
3 4
b
b b
1,2 a 3,4,5 a
b A B b
4,5 D
a a
5 [ab] C [ab]
Regular expression
NFA
DFA
a λ
λ FA for A λ
λ FA for B λ
An NFA for AB
FA for A FA for B
An NFA for A*
λ
λ FA for A λ
λ
36 Lexical Analysis CZ3007
Translating Regular Expressions into NFAs
Example: [a-zA-Z] [a-zA-Z0-9]*
[a-zA-Z]
[a-zA-Z] :
λ
[a-zA-Z0-9]* : λ [a-zA-Z0-9] λ
Together:
[a-zA-Z] λ
λ [a-zA-Z0-9] λ
1 2 3 4 5
Slide 32
λ
After subset construction:
[a-zA-Z] [a-zA-Z0-9]
1 2,3,5 3,4,5 [a-zA-Z0-9]