Lexi Cal

Lexical Analysis
Find the FIRST and FOLLOW

• S -> ABCD | ε
• A -> a | ε
• B -> bA
• C -> a | ε
• D -> d
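One way to check an answer to this exercise is to compute the sets mechanically. The sketch below is a minimal fixed-point computation, assuming S is the start symbol and using "$" as the end-of-input marker; the names GRAMMAR, first_of, compute_first and compute_follow are illustrative and not part of the slides.

```python
EPS, END = "ε", "$"

# Grammar from the slide; an empty tuple stands for the ε-production.
GRAMMAR = {
    "S": [("A", "B", "C", "D"), ()],
    "A": [("a",), ()],
    "B": [("b", "A")],
    "C": [("a",), ()],
    "D": [("d",)],
}

def first_of(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence (or an empty sequence) can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first, start="S"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is only defined for nonterminals
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

Under these assumptions the script gives FIRST(S) = {a, b, ε} and FOLLOW(A) = {a, b, d}.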
Introduction
• The task of analyzing the syntax is divided into two parts.
– Lexical analysis – deals with small-scale language constructs, such as names and numeric literals.
– Syntax analysis – deals with large-scale constructs, such as expressions, statements, and program units.
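• Reasons for separating the two:
– Simplicity: lexical analysis is less complex than syntax analysis, so each part is simpler when kept separate.
– Efficiency: the lexical analyzer can be optimized on its own.
– Portability: the lexical analyzer is partly machine dependent (it reads and buffers input files), while the syntax analyzer can be machine independent.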
Lexical Analyzer
• A pattern matcher
• Finds a substring of a given string of characters that matches a given character pattern.
Lexical Analyzer
• Token
• Pattern
• Lexeme
printf("Total = %d\n", score);
Id – printf & score (Lexemes)
Literal – "Total = %d\n" (Lexeme)
Tokens
• Keyword
• Operator
• Identifiers
• Constant - Numbers & Literals
• Punctuation symbols – left and right parentheses, comma, and semicolon.
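The relationship between token, pattern and lexeme can be illustrated with a small regular-expression tokenizer. This is only a sketch: the token names (LITERAL, NUMBER, ID, PUNCT) and the patterns are assumptions chosen for the printf example, not a complete C lexer.

```python
import re

# Each token class is paired with the pattern that describes its lexemes.
TOKEN_SPEC = [
    ("LITERAL", r'"(?:\\.|[^"\\])*"'),  # string literal, allowing escapes like \n
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("PUNCT",   r"[(),;]"),
    ("SKIP",    r"\s+"),                # whitespace produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no token pattern matches at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize('printf("Total = %d\\n", score);')
for token, lexeme in tokens:
    print(token, lexeme)
```

Here printf and score are two different lexemes of the same token ID, while the quoted string is a single lexeme of the token LITERAL.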
Lexical Analyzer Responsibilities
• Lexical analyzer [Scanner]
– Scan input
– Remove white spaces
– Remove comments
– Manufacture tokens
– Generate lexical errors
– Pass token to parser
Two processes
• Scanner – deletes comments and compacts consecutive whitespace characters into one.
• Lexical analysis – produces tokens from the output of the scanner.
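The two processes can be sketched as two small functions. The example assumes C-style /* ... */ comments and a very rough token pattern; both are illustrative assumptions, and comment markers inside string literals are not handled.

```python
import re

def scan(source):
    """Phase 1: delete comments and compact whitespace runs into one blank."""
    # A comment is replaced by a blank so that the tokens around it stay separate.
    without_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(r"\s+", " ", without_comments).strip()

def lex(cleaned):
    """Phase 2: produce lexemes from the scanner's output."""
    # Identifiers, integer literals, or any other single non-blank character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", cleaned)

source = "total   =  /* running sum */ total +\n\t 42 ;"
cleaned = scan(source)
tokens = lex(cleaned)
print(cleaned)
print(tokens)
```

Keeping the phases apart means the second function never has to think about comments or layout.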
Attributes for Tokens
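E = M * C ** 2
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
The remaining lexemes (*, C, ** and 2) are turned into tokens in the same way.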
Lexical Errors
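• It is very hard for the lexical analyzer (LA) to tell that there is an error in the code without the aid of the other components.
• Example: fi(a==f(x))
• The LA cannot know whether fi is a misspelling of the keyword if or an undefined identifier. Since fi is a valid identifier lexeme, the LA passes it on as an identifier and leaves the error to a later phase.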
Lexical Errors
• However, in some situations the LA is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
• The simplest recovery strategy is "panic mode" recovery: delete successive characters from the remaining input until the LA can find a well-formed token.
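Panic-mode recovery can be sketched in a few lines. The token pattern below (identifiers, integers and a handful of operators) and the function name tokenize_with_panic_mode are assumptions made for the illustration.

```python
import re

# Hypothetical token patterns: identifiers, integers, a few operators and punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/=();]")

def tokenize_with_panic_mode(text):
    """Return (lexemes, errors); each error is (position, deleted characters)."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
            continue
        # Panic mode: delete characters until a token pattern matches again.
        start = pos
        while (pos < len(text) and not text[pos].isspace()
               and not TOKEN_RE.match(text, pos)):
            pos += 1
        errors.append((start, text[start:pos]))
    return tokens, errors

tokens, errors = tokenize_with_panic_mode("x = 3 @# y;")
print(tokens)
print(errors)
```

The characters @# match no pattern, so they are deleted and reported, and scanning resumes at y.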
Other Possible recovery actions
• Delete one character from the remaining input
• Insert a missing character into the remaining input.
• Replace a character by another character.
• Transpose two adjacent characters.

Tricky problems in Token recognition
DO index variable = start, end, step
    statements
END DO
Or, equivalently:
do 100 n=2,10,1
100 nfac=nfac*n
Tricky problems in Token recognition
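• Assignment: DO 5 I = 1.25
• DO loop: DO 5 I = 1,25
• Blanks are not significant in fixed-form Fortran, so the two statements look the same until the period or the comma is reached.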
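The lookahead needed here can be sketched as follows. The function classify and its return values are hypothetical, and the pattern is a simplification that ignores commas nested inside parentheses.

```python
import re

# DO <label> <variable> = <start> , <rest>, written without blanks.
DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")

def classify(statement):
    # Fixed-form Fortran ignores blanks, so remove them before matching.
    compact = statement.replace(" ", "").upper()
    m = DO_LOOP.fullmatch(compact)
    if m:
        label, var, start, rest = m.groups()
        return ("do_loop", label, var, start, rest)
    # No comma after the first expression: the whole left side is one variable name.
    target, _, value = compact.partition("=")
    return ("assignment", target, value)

print(classify("DO 5 I = 1.25"))
print(classify("DO 5 I = 1,25"))
```

With a period the statement assigns 1.25 to a variable named DO5I; with a comma it opens a loop over I with label 5.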
Input Buffering
• Two-buffer scheme
Input Buffering
• Input buffering examines ways of speeding up the reading of the source program.
– With a single buffer, the lexeme currently being processed can be overwritten when the buffer is reloaded.
– A two-buffer scheme handles large lookahead safely.
Buffer Pairs
• Two buffers of the same size, say 4096 characters, are alternately reloaded.
• Two pointers to the input are maintained:
– Pointer lexeme_Begin marks the beginning of the current lexeme.
– Pointer forward scans ahead until a pattern match is found.
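A buffer pair can be simulated with two string halves that are reloaded in turn. The class below is a simplified sketch: the names BufferPair, next_char, begin_lexeme and split_words are illustrative, sentinels are omitted, and a lexeme is assumed to be no longer than one half.

```python
import io

class BufferPair:
    def __init__(self, stream, size=4096):
        self.stream = stream
        self.size = size
        self.halves = [stream.read(size), ""]  # two halves of equal capacity
        self.current = 0                        # half that `forward` is in
        self.forward = 0                        # offset of `forward` in that half
        self.lexeme_begin = (0, 0)              # (half, offset) of the lexeme start

    def next_char(self):
        """Advance `forward`; return the character read, or "" at end of input."""
        if self.forward == len(self.halves[self.current]):
            if len(self.halves[self.current]) < self.size:
                return ""                       # short read earlier: input exhausted
            other = 1 - self.current
            # Reload only the other half; the half holding lexeme_begin stays intact.
            self.halves[other] = self.stream.read(self.size)
            self.current, self.forward = other, 0
            if not self.halves[other]:
                return ""
        ch = self.halves[self.current][self.forward]
        self.forward += 1
        return ch

    def begin_lexeme(self):
        self.lexeme_begin = (self.current, self.forward)

    def lexeme(self):
        half, offset = self.lexeme_begin
        if half == self.current:
            return self.halves[half][offset:self.forward]
        # The lexeme started in the previous half and continues in the current one.
        return self.halves[half][offset:] + self.halves[self.current][:self.forward]

def split_words(text, size=8):
    """Toy scanner: return the blank-separated words of text."""
    buf = BufferPair(io.StringIO(text), size)
    found = []
    while True:
        buf.begin_lexeme()
        ch = buf.next_char()
        if ch == "":
            return found
        if ch.isspace():
            continue
        while ch != "" and not ch.isspace():
            ch = buf.next_char()
        if ch != "":
            buf.forward -= 1  # retract: the delimiter is not part of the lexeme
        found.append(buf.lexeme())

print(split_words("position = initial + rate", size=8))
```

With a half size of 8 the word initial straddles two halves, yet it is recovered whole because only the half that forward is about to enter gets reloaded.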
Regular Expression
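• A notation for describing all the languages that can be built from the operators union, concatenation, and closure applied to the symbols of some alphabet.
• Example: letter(letter|digit)*
• Regular expressions are built recursively out of smaller regular expressions, using a set of defining rules.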
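The expression letter(letter|digit)* can be tried out directly with Python's re module. The character classes chosen for letter and digit (ASCII only) are an assumption for this sketch.

```python
import re

LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# letter(letter|digit)* built out of the two smaller expressions.
IDENTIFIER = re.compile(f"{LETTER}(?:{LETTER}|{DIGIT})*")

def is_identifier(s):
    """True if the whole string s matches letter(letter|digit)*."""
    return IDENTIFIER.fullmatch(s) is not None

for candidate in ["score", "x25", "2x", ""]:
    print(repr(candidate), is_identifier(candidate))
```

The larger pattern is assembled from the smaller ones, which mirrors how regular expressions are defined recursively.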
Specification of Patterns for Tokens:
Definitions
• An alphabet Σ is a finite set of symbols (characters)
• A string s is a finite sequence of symbols from Σ
– |s| denotes the length of string s
– ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ
Specification of Patterns for Tokens: String Operations
• The concatenation of two strings x and y is denoted by xy
• The exponentiation of a string s is defined by
s^0 = ε (the empty string: a string of length zero)
s^i = s^(i-1) s for i > 0
Note that sε = εs = s
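The recursive definition of exponentiation translates directly into code. In this sketch Python's empty string "" plays the role of ε, and the function name power is illustrative.

```python
def power(s, i):
    """s^0 = ε and s^i = s^(i-1) s for i > 0."""
    if i == 0:
        return ""               # ε, the empty string
    return power(s, i - 1) + s  # concatenation is written with +

print(power("ab", 3))
print(len(power("ab", 0)))
print("ab" + "" == "" + "ab" == "ab")
```

The last line checks that concatenating with the empty string on either side leaves the string unchanged.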
Recognition of Tokens
Transition Diagrams
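• Patterns are converted into stylized flowcharts called transition diagrams.
• Two pointers, lexeme_Begin and forward, track the input while a diagram is traversed.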
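Automaton
Transition diagram for relop (a state marked * retracts the forward pointer by one character):
• State 0 (start): < goes to state 1, = goes to state 5, > goes to state 6
• State 1: = goes to state 2, > goes to state 3, other goes to state 4
• State 2: return(relop, LE)
• State 3: return(relop, NE)
• State 4 *: return(relop, LT)
• State 5: return(relop, EQ)
• State 6: = goes to state 7, other goes to state 8
• State 7: return(relop, GE)
• State 8 *: return(relop, GT)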
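The relop transition diagram can be hand-coded as nested tests on the next one or two characters. This is a sketch: the function name relop and the choice to return the next input position (instead of moving a shared forward pointer) are assumptions.

```python
def relop(text, start=0):
    """Return ((token, attribute), next position), or None if no relop starts here."""
    def peek(i):
        return text[i] if i < len(text) else ""

    ch, nxt = peek(start), peek(start + 1)
    if ch == "<":                             # state 1
        if nxt == "=":
            return ("relop", "LE"), start + 2  # state 2
        if nxt == ">":
            return ("relop", "NE"), start + 2  # state 3
        return ("relop", "LT"), start + 1      # state 4*: the other character is not consumed
    if ch == "=":
        return ("relop", "EQ"), start + 1      # state 5
    if ch == ">":                             # state 6
        if nxt == "=":
            return ("relop", "GE"), start + 2  # state 7
        return ("relop", "GT"), start + 1      # state 8*: the other character is not consumed
    return None

for sample in ["<=", "<>", "<x", "=", ">=", ">5", "a"]:
    print(sample, relop(sample))
```

The starred states correspond to the branches that return start + 1 even though two characters were examined.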
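Two More...
id:
• State 9 (start): letter goes to state 10
• State 10: letter or digit stays in state 10, other goes to state 11
• State 11 *: accepting state (retract one character)
delim:
• State 28 (start): delim goes to state 29
• State 29: delim stays in state 29, other goes to state 30
• State 30 *: accepting state (retract one character)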
RE to Automata
Minimizing
• The DFA for a(b|c)*
Example #2: Applying Minimization
Example # 4
• Minimize the following DFA:
[Figure: transition diagrams of a DFA over the alphabet {a, b} with states A, B, C, D, E]
From Regular Expression to DFA Directly
• The "important states" of an NFA are those with a non-ε out-transition, that is, if move({s}, a) ≠ ∅ for some a then s is an important state
• The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a))
From Regular Expression to DFA Directly
(Algorithm)
• Augment the regular expression r with a
special end symbol # to make accepting states
important: the new expression is r#
• Construct a syntax tree for r#
• Traverse the tree to construct functions
nullable, firstpos, lastpos, and followpos
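From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
Each leaf carries a position number; children are indented under their parent:
concatenation
    concatenation
        concatenation
            concatenation
                closure (*)
                    alternation (|)
                        a (position 1)
                        b (position 2)
                a (position 3)
            b (position 4)
        b (position 5)
    # (position 6)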
Annotating the Tree
• nullable(n): the subtree at node n generates a language that includes the empty string
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
• followpos(i): the set of positions that can follow position i in the tree
Annotating the Tree
Node n: nullable(n); firstpos(n); lastpos(n)
• Leaf ε: true; ∅; ∅
• Leaf i: false; {i}; {i}
• Or-node c1 | c2: nullable(c1) or nullable(c2); firstpos(c1) ∪ firstpos(c2); lastpos(c1) ∪ lastpos(c2)
• Cat-node c1 c2: nullable(c1) and nullable(c2); if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1); if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• Star-node c1*: true; firstpos(c1); lastpos(c1)
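The rules for nullable, firstpos and lastpos can be sketched as one recursive function. The tuple encoding of the syntax tree ("leaf", "or", "cat", "star", "eps") and the name annotate are assumptions made for this illustration.

```python
def annotate(node):
    """Return (nullable, firstpos, lastpos) for a syntax-tree node."""
    kind = node[0]
    if kind == "eps":                        # leaf labelled ε
        return True, frozenset(), frozenset()
    if kind == "leaf":                       # ("leaf", symbol, position)
        position = node[2]
        return False, frozenset({position}), frozenset({position})
    if kind == "star":                       # ("star", c1)
        _, first, last = annotate(node[1])
        return True, first, last
    n1, f1, l1 = annotate(node[1])           # ("or", c1, c2) or ("cat", c1, c2)
    n2, f2, l2 = annotate(node[2])
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    raise ValueError(f"unknown node kind: {kind}")

# Syntax tree of (a|b)*abb# with positions 1..6 on the leaves.
star = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
tree = ("cat",
        ("cat",
         ("cat",
          ("cat", star, ("leaf", "a", 3)),
          ("leaf", "b", 4)),
         ("leaf", "b", 5)),
        ("leaf", "#", 6))

nullable, firstpos, lastpos = annotate(tree)
print(nullable, sorted(firstpos), sorted(lastpos))
```

For the root of (a|b)*abb# this yields nullable = false, firstpos = {1, 2, 3} and lastpos = {6}.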
Syntax Tree of (a|b)*abb#
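Each node is listed as firstpos, node, lastpos; only the star node is nullable.
• {1, 2, 3} concatenation (root) {6}
• {6} # (position 6) {6}
• {1, 2, 3} concatenation {5}
• {5} b (position 5) {5}
• {1, 2, 3} concatenation {4}
• {4} b (position 4) {4}
• {1, 2, 3} concatenation {3}
• {3} a (position 3) {3}
• {1, 2} * {1, 2}
• {1, 2} | {1, 2}
• {1} a (position 1) {1}
• {2} b (position 2) {2}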
followpos
for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node then
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do
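The followpos rules can be sketched for the running example (a|b)*abb#. To keep the block short, the firstpos and lastpos sets of the cat-nodes and the star-node are entered by hand from the annotated syntax tree; the list NODES and the function compute_followpos are illustrative names.

```python
# For a cat-node: (kind, lastpos(c1), firstpos(c2)).
# For a star-node: (kind, lastpos(n), firstpos(n)).
NODES = [
    ("star", {1, 2}, {1, 2}),  # (a|b)*
    ("cat",  {1, 2}, {3}),     # (a|b)* followed by a
    ("cat",  {3},    {4}),     # ... followed by b
    ("cat",  {4},    {5}),     # ... followed by b
    ("cat",  {5},    {6}),     # ... followed by #
]

def compute_followpos(nodes, positions):
    followpos = {p: set() for p in positions}
    for _kind, last, first in nodes:
        # Both rules have the same shape: every position in `last`
        # can be followed by every position in `first`.
        for i in last:
            followpos[i] |= first
    return followpos

followpos = compute_followpos(NODES, range(1, 7))
for position, follows in followpos.items():
    print(position, sorted(follows))
```

Position 6 (the end marker #) ends up with an empty followpos set, since nothing can follow it.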
Algorithm
s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and s0 is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T, a] := U
    end do
end do
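Example
Node followpos
1 {1, 2, 3}
2 {1, 2, 3}
3 {4}
4 {5}
5 {6}
6 -
Resulting DFA (start state {1, 2, 3}, accepting state {1, 2, 3, 6}):
• {1, 2, 3}: a goes to {1, 2, 3, 4}, b goes to {1, 2, 3}
• {1, 2, 3, 4}: a goes to {1, 2, 3, 4}, b goes to {1, 2, 3, 5}
• {1, 2, 3, 5}: a goes to {1, 2, 3, 4}, b goes to {1, 2, 3, 6}
• {1, 2, 3, 6}: a goes to {1, 2, 3, 4}, b goes to {1, 2, 3}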
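The construction can be replayed in code from the followpos table of the example. This is a sketch under the assumption that position 6 is the end marker #, so a state is accepting when it contains 6; the names build_dfa and accepts are illustrative.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START = frozenset({1, 2, 3})  # firstpos(root)
ALPHABET = "ab"

def build_dfa():
    dstates = [START]
    unmarked = [START]
    dtran = {}
    while unmarked:
        t = unmarked.pop()  # mark T
        for a in ALPHABET:
            # Union of followpos(p) over the positions p in T that carry symbol a.
            u = frozenset().union(*(FOLLOWPOS[p] for p in t if SYMBOL_AT[p] == a))
            if u and u not in dstates:
                dstates.append(u)
                unmarked.append(u)
            dtran[t, a] = u
    return dstates, dtran

DSTATES, DTRAN = build_dfa()

def accepts(text):
    state = START
    for ch in text:
        state = DTRAN.get((state, ch))
        if not state:       # no transition, or the empty set
            return False
    return 6 in state       # accepting states contain the position of #

for sample in ["abb", "babb", "ab", "abba"]:
    print(sample, accepts(sample))
```

The four states found this way are {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 5} and {1, 2, 3, 6}, and the DFA accepts exactly the strings over {a, b} that end in abb.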
Thank You
