Professional Documents
Culture Documents
Lexer.: Chapter Two: Lexical Analysis
Lexer.: Chapter Two: Lexical Analysis
2. Lexical Analysis
The primary function of a scanner is to transform a
character stream into a token stream.
A scanner is sometimes called a lexical analyzer, or
lexer.
Lexical analysis is the first phase of a compiler
The lexical analyzer breaks these syntaxes into a series
of tokens, by removing any whitespace or comments
in the source code.
If the lexical analyzer finds a token invalid, it generates
an error.
It reads character streams from the source code,
checks for legal tokens, and passes the data to the
syntax analyzer when it demands.
Cont…
Tokens
Lexical errors are the errors thrown by your lexer when unable to continue.
This means that there’s no way to recognize a lexeme as a valid token for your
lexer
Error-recovery actions are:
Delete one character from the remaining input.
Insert a missing character in to the remaining input.
Replace a character by another character.
2.2. Specifications of Tokens
To identify the tokens it introduce regular expression, a notation that can be
used to describe essentially all the tokens of programming language .
Secondly token recognizers, which are designed using transition diagrams
and finite automata.
Cont….
2.2.1. Alphabets, Strings and Languages
Alphabets
Any finite set of symbols ∑ = {0,1} is a set of binary alphabets, ∑ = {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of Hexadecimal
alphabets, ∑ = {a-z, A-Z} is a set of English language alphabets.
Strings
Any finite sequence of alphabets is called a string.
Length of the string is the total number of occurrence of alphabets, e.g., the length of the string tutorials point is 14 and
is denoted by |tutorials point| = 14.
A string having no alphabets, i.e. a string of zero length is known as an empty string and is denoted by ε (epsilon).
Example: 01101, 111, 0001, 111 … are strings from the binary alphabet ∑ = { 0, 1 }.
String Cont….
Power of an alphabet:
– If ∑ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using an exponential
notation. We define ∑k to be the set of strings of length k, each of whose symbols is in ∑.
– ∑0= {ϵ}, regardless of what alphabet ∑ is.
– If ∑ = {0,1}, then ∑1 = {0,1}, ∑2 = {00, 01, 10, 11}, ∑3={000, 001, 010, 011, 100, 101, 110,111}
The set of all strings over an alphabet ∑ is conventionally denoted ∑*.
Example: {0,1}* = {ϵ, 0, 1, 00, 01, 10, 11, 000, …}
∑* = ∑0 U∑1 U∑2 U∑3U …
The set of non empty strings from alphabet ∑ is denoted ∑+ (excluding the empty string from the set of strings)
∑+ = ∑1 U ∑2 U ∑3 U …
∑* = ∑+ U {ε}.
String Cont….
Operation on strings
– concatenation (product):
• Let x and y be strings. Then xy denotes the concatenation of x and y, that is, the string formed by making a copy of x
and followed it by a copy of y.
• Example: Let x= 01101 and y= 110, then xy= 01101110 (yx=11001101)
• Note: For any string w, εw = wε = w (i.e. ϵ is the identity for concatenation)
A language is considered as a finite set of strings over some finite set of alphabets.
Finite languages can be described by means of regular expressions.
2.3.2. Regular Expressions
.
Reading assignment
.
Error Handling in Compiler Design
4. Global correction:
Its aim is to make as few changes as possible while converting an
incorrect input string to a valid string.
Classification of Compile-time error
Classification of Errors:
Classification of Compile-time error
. Lexical phase errors: These errors are detected during the lexical analysis
1
•Unmatched string
•Missing operator
•Misspelled keywords
•Unbalanced parenthesis
Classification of Compile-time error
3. Semantic errors
These errors are detected during semantic analysis phase.
Typical semantic errors are:
Incompatible type of operands
Undeclared variables
Not matching of actual arguments with formal one
A Typical Lexical Analyzer Generator
Lex/Flex: Lex is a program designed to generate scanners, also
known as tokenizers, which recognize lexical patterns in text.
Lex is an acronym that stands for "lexical analyzer generator."
The End