
LEXICAL ANALYSIS

Dr. Murali Krishna Enduri


Department of CSE

1
Structure of Compiler

2
3
4
LEXICAL ANALYSIS
• Basic Concepts & Regular Expressions
• What does a Lexical Analyzer do?
• How does it Work?
• Formalizing Token Definition & Recognition
• Reviewing Finite Automata Concepts
• Non-Deterministic and Deterministic FA
• Conversion Process
• Regular Expressions to NFA
• NFA to DFA
• Relating NFAs/DFAs/Conversion to Lexical Analysis
5
Structure of Compiler

6
NEED AND ROLE OF LEXICAL ANALYZER
1. Lexical analysis is the first phase of a compiler. It reads the input characters of the
source program from left to right, one character at a time.
2. It groups the characters into lexemes and produces a token for each lexeme. Each token is a
logically cohesive unit such as an identifier, keyword, operator, or punctuation mark.
3. It enters lexemes (such as identifiers) into the symbol table and also reads from the symbol table.

7
Why Separate Lexical and Syntactic Analysis?
There are several reasons for separating the analysis phase of compiling into lexical analysis
and parsing:
» It leads to a simpler parser design, since unnecessary input (such as white space and
comments) is eliminated by the scanner.
» It improves the efficiency of compilation. Lexical analysis is often the most time-consuming
phase because it reads every input character, so specialized buffering techniques are used to
speed it up.
» It enhances the portability of the compiler, since specialized symbols and characters
(language- and machine-specific) are isolated in this phase.

8
Lexical Analyzer in Perspective

LEXICAL ANALYZER
• Scan Input
• Remove WS, NL, …
• Identify Tokens
• Create Symbol Table
• Insert Tokens into ST
• Generate Errors
• Send Tokens to Parser

PARSER
• Perform Syntax Analysis
• Actions Dictated by Token Order
• Update Symbol Table Entries
• Create Abstract Rep. of Source
• Generate Errors
• And More… (We’ll see later)

9
NEED AND ROLE OF LEXICAL ANALYZER

• Remove comments and white space (aka scanning)
• Macro expansion
• Read input characters from the source program
• Group them into lexemes
• Produce as output a sequence of tokens
• Interact with the symbol table
• Correlate error messages generated by the compiler with the source program

10
Basic Terminology

What are the Major Terms for Lexical Analysis?

Token: a pair consisting of
– Token name: abstract symbol representing a kind of lexical unit [affects parsing decisions]
– Optional attribute value [influences translation after parsing]
Token names: keywords, operators, identifiers, constants, literal strings, punctuation symbols (such as commas and semicolons)

Pattern: a description of the form that the lexemes of a token may take.
– For keywords, the pattern is just the sequence of characters that forms the keyword.

Lexeme: the actual sequence of characters in the source program that matches a pattern and is
classified as a token.
11
E.g. Relation: {<, <=, >, >=, ==, <>}
12
Basic Terminology

13
Basic Terminology

Token classes
• One token per keyword
• Tokens for the operators
• One token representing all identifiers
• Tokens representing constants (e.g. numbers)
• Tokens for punctuation symbols

14
Example

15
Example of TOKENS

16
Example of TOKENS

17
18
19
Example of NON TOKENS

20
Tasks Lexical Analyzer

Separation of the input source code into tokens.

» Stripping out the unnecessary white spaces from


the source code.

» Removing the comments from the source text.

» Keeping track of line numbers while scanning the new line characters. These
line numbers are used by the error handler to print the error messages.

» Preprocessing of macros

21
Identify the tokens and lexemes:

1. x=x*(acc+123)
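One way to check an answer to this exercise is a small regex-driven scanner. The sketch below is only illustrative: the token names (ID, NUM, ASSIGN, MUL, PLUS, LPAREN, RPAREN) and the patterns are assumptions chosen for this statement, not names fixed by the slides.

```python
import re

# Illustrative token names and patterns (assumptions, not taken from the slides).
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("MUL",    r"\*"),
    ("PLUS",   r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS",     r"\s+"),
]
# One alternation of named groups; the name of the group that matched is the token name.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":          # white space is consumed but not returned
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize("x=x*(acc+123)")
for name, lexeme in tokens:
    print(f"{name:7} {lexeme}")
```

Under these assumptions the statement yields nine tokens; the lexemes `x`, `x` and `acc` all share the single token name ID, while `123` is the only NUM.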

22
• Alphabet: any finite set of symbols
• String over alphabet: finite sequence of symbols drawn from that alphabet
• Language: countable set of strings over some fixed alphabet

Regular Language:

23
Formal Definition of a Finite Automaton

24
25
Language operations

26
Concatenation

27
28
29
30
Nondeterministic Finite Automata:

31
Nondeterministic Finite Automata:
o Generalize FAs by adding nondeterminism, allowing several alternative computations
on the same input string.
o Ordinary deterministic FAs follow one path on each input.

32
Nondeterministic Finite Automata:

By continuing to experiment in this way, you will see that N1 accepts all
strings that contain either 101 or 11 as a substring.
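The experiment can also be run mechanically. The diagram of N1 is not reproduced in this text, so the transition relation below is an assumption: a four-state machine (q1 to q4, with one ε-move from q2 to q3) of the kind commonly drawn for this language. The simulation tracks the set of states the NFA could be in after each input symbol.

```python
EPS = ""  # label used for ε-moves

# Assumed transition relation for N1 (the slide's diagram is not in the text).
DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"},
    ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}
START, ACCEPT = "q1", {"q4"}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(word):
    """Follow every alternative computation at once; accept if any ends in q4."""
    current = eps_closure({START})
    for ch in word:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)

for w in ["101", "11", "0100", "010110", "100"]:
    print(w, accepts(w))
```

With this relation, `accepts` returns True exactly for the inputs that contain 101 or 11 as a substring, for example "010110" but not "100".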

33
Nondeterministic Finite Automata:

34
Formal Definition of Nondeterministic Finite Automata:

35
Nondeterministic Finite Automata:

36
Nondeterministic Finite Automata:
NFA Example

37
Nondeterministic Finite Automata:
NFA Example

38
DFA VS NFA

39
40
41
REGULAR EXPRESSIONS

42
REGULAR EXPRESSIONS
• Regular expressions
• describe regular languages

• Example:
(a | b·c)*
• describes the language
{a, bc}* = {ε, a, bc, aa, abc, bca, ...}


43
REGULAR EXPRESSIONS
Recursive Definition

Given regular expressions r1 and r2, the following are regular expressions:
r1 | r2
r1 · r2
r1*
(r1)
44
REGULAR EXPRESSIONS

A regular expression:
(a | b·c)*·(c | ∅)

Not a regular expression:
(a | b |)

45
REGULAR EXPRESSIONS

L(r): language of regular expression r

Example
L((a | b·c)*) = {ε, a, bc, aa, abc, bca, ...}
L(∅) = ∅
L(ε) = {ε}
L(a) = {a}
46
REGULAR EXPRESSIONS

• For regular expressions r1 and r2:

L(r1 | r2) = L(r1) ∪ L(r2)

L(r1 · r2) = L(r1) L(r2)

L(r1*) = (L(r1))*

L((r1)) = L(r1)
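The rules for union, concatenation, star and parentheses translate almost directly into a recursive function. The sketch below is an illustration under stated assumptions: regular expressions are written as nested tuples (a representation invented here, with node names "empty", "eps", "sym", "union", "cat", "star"), and because L(r) can be infinite, only strings up to a chosen length are produced. Parentheses need no case of their own, since grouping is already expressed by the nesting.

```python
def lang(r, max_len):
    """Strings of L(r) with length <= max_len, computed rule by rule."""
    kind = r[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "eps":                        # L(ε) = {ε}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(r1 | r2) = L(r1) ∪ L(r2)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":                        # L(r1 · r2) = L(r1) L(r2)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                       # L(r1*) = (L(r1))*
        base = lang(r[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:                      # append one more base string each round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

# (a | b) · a*
r = ("cat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
words = lang(r, 3)
print(sorted(words, key=lambda w: (len(w), w)))
```

For the expression (a | b)·a* and a length bound of 3, the result is the six strings a, b, aa, ba, aaa and baa.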
47
REGULAR EXPRESSIONS
• Regular expression: (a | b)·a*
L((a | b)·a*) = L((a | b)) L(a*)
= L(a | b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {ε, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
48
REGULAR EXPRESSIONS

• Regular expression r = (a | b)*·(a | bb)
L(r) = {a, bb, aa, abb, ba, bbb, ...}

• Regular expression r = (0 | 1)* 00 (0 | 1)*
L(r) = { all strings containing substring 00 }

49
REGULAR EXPRESSIONS

50
REGULAR EXPRESSIONS

51
52
53
Mini QUIZ-9:
https://bit.ly/3hMYX30

54
55
REGULAR EXPRESSIONS to NFA

56
REGULAR EXPRESSIONS to NFA

57
REGULAR EXPRESSIONS to NFA

58
59
Regular Expressions
We need a formal way to specify patterns: regular expressions
• Alphabet: any finite set of symbols
• String over alphabet: finite sequence of symbols drawn from that alphabet
• Language: countable set of strings over some fixed alphabet

• Scanners are special pattern-matching processors.
• Regular expressions (REs) are used to represent patterns of strings of characters.
• A regular expression r is defined by the set of strings that match it.
• This set is called the language generated by the regular expression and is written L(r).
• The set of symbols used in the language is called its alphabet and is written Σ.

60
Star operation
Kleene closure

61
Different operations on languages
Operations on Languages

62
Formal Definition of Regular Expressions
Rules for specifying Regular Expressions

63
Regular Expressions Precedence
Regular Expressions

64
Regular Expressions Examples
Regular Expressions example

65
Regular Expressions Examples
Regular Expressions example

Which language is generated by:


• (a|b)(a|b)
• a*
• (a|b)*
• a|a*b
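A quick way to explore these four expressions is to list the short strings each one matches. The sketch below is an aid for checking answers, not a definition: it assumes the alphabet {a, b} and relies on Python's `re` syntax, which happens to use the same `|` and `*` operators with the same precedence as these expressions. The helper name `language` and the length bound of 3 are choices made for this example.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """Strings over `alphabet` of length <= max_len that the whole pattern matches."""
    compiled = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if compiled.fullmatch(w):    # the entire string must match
                words.append(w)
    return words

samples = {p: language(p) for p in ["(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]}
for pattern, words in samples.items():
    shown = [w if w else "ε" for w in words]
    print(f"{pattern:12} {shown}")
```

For instance, `(a|b)(a|b)` gives exactly aa, ab, ba and bb, while `a|a*b` gives a, b, ab and aab within the bound, which shows that `*` binds tighter than concatenation and concatenation tighter than `|`.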

66
Regular Expressions example
Regular Expressions example

67
Regular Expressions example
Regular Expressions example

68
Regular Expressions example

69
Tokens Specification
Regular Definition

70
Overall

71
Token Recognition

72
What Else Does the Lexical Analyzer Do?

Scan away blanks, new lines, tabs

Can we Define Tokens For These?
blank → blank
tab → tab
newline → newline
delim → blank | tab | newline
ws → delim+
In these cases no token is returned to the parser
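The ws definition can be sketched as a scanner step that consumes white space and then carries on to the next lexeme instead of returning anything for it. In the sketch below, `WS` mirrors delim+; the `WORD` pattern (any run of non-delimiter characters) is a hypothetical stand-in for the real token patterns so that the example has something to return.

```python
import re

# ws -> delim+   where   delim -> blank | tab | newline
WS = re.compile(r"[ \t\n]+")
# Hypothetical stand-in for the real token patterns: any run of non-delimiters.
WORD = re.compile(r"[^ \t\n]+")

def next_token(text, pos):
    """Return (lexeme, new_pos), or (None, new_pos) at end of input."""
    m = WS.match(text, pos)
    if m:                         # ws matched: consume it, return no token for it
        pos = m.end()
    if pos >= len(text):
        return None, pos
    m = WORD.match(text, pos)
    return m.group(), m.end()

def scan(text):
    """Collect every lexeme that next_token hands back."""
    out, pos = [], 0
    while True:
        lexeme, pos = next_token(text, pos)
        if lexeme is None:
            return out
        out.append(lexeme)

print(scan("a \t\n b\n"))
```

Here `scan("a \t\n b\n")` returns only the lexemes `a` and `b`; the delimiters are matched by the ws pattern but never reach the caller.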

73
Implementation: Transition Diagrams
• Intermediate step in constructing a lexical analyzer
• Convert patterns into flowcharts called transition diagrams
– Nodes or circles: called states
– Edges: directed from one state to another, labeled by symbols

74
Implementation: Transition Diagrams

Example TDs : id and delim
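The diagrams themselves are not reproduced in this text. As a rough sketch of how an id diagram turns into code, assume the usual pattern letter (letter | digit)* with three states: a start state 0, a looping state 1, and an accepting state 2 reached on any other character, which is not consumed. The state numbers and the function name `match_id` are assumptions made for this example.

```python
def match_id(text, pos=0):
    """Follow the assumed id diagram from `pos`; return the lexeme or None."""
    state, i = 0, pos
    while True:
        ch = text[i] if i < len(text) else ""    # "" stands for end of input
        if state == 0:
            if ch.isalpha():                     # edge labelled "letter"
                state, i = 1, i + 1
            else:
                return None                      # no edge leaves the start state
        elif state == 1:
            if ch.isalnum():                     # loop on "letter or digit"
                i += 1
            else:
                state = 2                        # edge labelled "other"
        else:
            # State 2 accepts; the "other" character is left unread,
            # so the lexeme ends just before it.
            return text[pos:i]

print(match_id("acc+123"))   # lexeme found at position 0
print(match_id("9lives"))    # starts with a digit, so no identifier here
```

Each `if state == ...` branch corresponds to one node of the diagram and each condition on `ch` to one outgoing edge, which is why transition diagrams are a convenient intermediate step toward a hand-written scanner. Note that Python's `isalpha` and `isalnum` also accept non-ASCII letters and digits.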

75
Implementation: Transition Diagrams

Example TDs : id and delim

76
Implementation: Transition Diagrams

Example TDs: unsigned numbers

77
Implementation: Transition Diagrams

Example TDs :

78
Implementation: Transition Diagrams
What would the transition diagram (TD) for strings containing each vowel, in their strict lexicographical order,
look like?

79
Basics of DFA, RE, NFA

Relation between RE, NFA and DFA


1. There is an algorithm for converting any RE into an NFA.
2. There is an algorithm for converting any NFA into a DFA.
3. There is an algorithm for minimizing a DFA.
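Step 2 is commonly done with the subset construction, in which each DFA state is a set of NFA states. The sketch below shows that technique on a small example NFA chosen for illustration (it accepts strings over {a, b} ending in abb); the NFA, the function names and the use of "" as the ε label are all assumptions of this example.

```python
from collections import deque

def eps_closure(states, delta):
    """NFA states reachable from `states` by ε-moves (labelled "")."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, ""), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, delta, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    d_start = eps_closure({start}, delta)
    d_delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= set(delta.get((s, a), ()))
            T = eps_closure(moved, delta)
            d_delta[(S, a)] = T
            if T not in seen:            # a new DFA state was discovered
                seen.add(T)
                queue.append(T)
    d_accepting = {S for S in seen if S & set(accepting)}
    return d_start, d_accepting, d_delta

def dfa_accepts(dfa, word):
    state, accepting, delta = dfa
    for ch in word:
        state = delta[(state, ch)]       # exactly one move per symbol
    return state in accepting

# Example NFA (an assumption for illustration): strings ending in "abb".
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
dfa = subset_construction(0, {3}, NFA_DELTA, "ab")
print(len({S for (S, _) in dfa[2]}), "DFA states")
print(dfa_accepts(dfa, "aabb"), dfa_accepts(dfa, "abab"))
```

For this NFA the construction reaches four DFA states, {0}, {0,1}, {0,2} and {0,3}, and the resulting DFA needs only one table lookup per input character.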

80

81
Some basic Terminologies

82
Handling Lexical Errors

1. It is hard for a lexical analyzer to tell, without the aid of other components, that there is a
source-code error.
-- If the string fi is encountered for the first time in a C program, the lexical analyzer cannot
tell whether fi is a misspelling of the keyword if or an undeclared identifier.
-- A later phase, probably the parser, will be able to handle the error in this case.

In what Situations do Errors Occur?

The lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix
of the remaining input.

Panic mode Recovery

• Delete successive characters from the remaining input until the analyzer can find a well-formed
token.
• May confuse the parser and cause syntax errors
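Panic mode recovery can be sketched as a scanner loop that, when no pattern matches, deletes input characters one at a time until some pattern matches again. The token set below (identifiers, integers and a few operators) and the function name are hypothetical choices for this example; the sketch also stops deleting at white space so that each error report covers a single run of bad characters.

```python
import re

# Hypothetical token set for this sketch: identifiers, integers, a few operators.
TOKEN = re.compile(r"(?P<ID>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<OP>[-+*/=()])")

def scan_with_panic_mode(text):
    """Return (tokens, errors); errors are (position, deleted_text) pairs."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m:
            tokens.append((m.lastgroup, m.group()))
            pos = m.end()
            continue
        # Panic mode: no pattern matches a prefix of the remaining input,
        # so delete characters until a token can be recognized again.
        start = pos
        while (pos < len(text) and not text[pos].isspace()
               and not TOKEN.match(text, pos)):
            pos += 1
        errors.append((start, text[start:pos]))
    return tokens, errors

tokens, errors = scan_with_panic_mode("x = 4 $# y")
print(tokens)
print(errors)
```

In the example input the characters `$#` match no pattern, so they are deleted and reported, and scanning resumes at `y`. The parser then sees `x = 4 y`, which illustrates how the recovery can turn a lexical error into a syntax error.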
83
Handling Lexical Errors
Panic mode recovery: delete successive characters from the remaining input until a well-formed
token is found.
• May confuse the parser and cause syntax errors

Other possible error recovery actions:

• Insert a missing character
• Delete a character
• Replace a character by another
• Transpose two adjacent characters
84
Handling Lexical Errors
Minimum distance error correction

• A strategy the lexical analyzer can follow to correct errors in lexemes.
• The distance is the minimum number of corrections needed to convert an invalid lexeme into a
valid lexeme.
• It is not generally used in practice because it is too costly to implement.
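To make the idea concrete, the sketch below counts the minimum number of single-character insertions, deletions, replacements and adjacent transpositions between two strings (the "optimal string alignment" variant of edit distance, used here as one reasonable reading of minimum distance) and picks the nearest entry from a keyword list. The keyword list and function names are assumptions for this example.

```python
def edit_distance(s, t):
    """Fewest insertions, deletions, replacements and adjacent
    transpositions turning s into t (optimal string alignment)."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete a character
                          d[i][j - 1] + 1,          # insert a character
                          d[i - 1][j - 1] + cost)   # replace (or keep)
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[-1][-1]

KEYWORDS = ["if", "else", "while", "for", "return"]  # hypothetical keyword list

def closest_keyword(lexeme):
    """Keyword at minimum distance from `lexeme` (first one wins on ties)."""
    return min(KEYWORDS, key=lambda k: edit_distance(lexeme, k))

print(edit_distance("fi", "if"), closest_keyword("fi"))
print(edit_distance("whlie", "while"), closest_keyword("whlie"))
```

The table has one cell per pair of prefixes, so each comparison costs time proportional to the product of the two lengths, and a scanner would have to repeat it against every candidate lexeme. That gives a sense of why the approach is considered costly.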

85
86
Example Programs

88
