Unit 1
Compiler Design
© 2014 SIRTS Dr. R R JANGHEL, CS VII, Compiler, Unit 1
Translator – a program that takes as input a program written in one programming language (the Source Language) and produces as output a program in another language (the Object or Target Language).
Types of translator – Preprocessor, Compiler, Interpreter and Assembler.

Preprocessor: (#)
A preprocessor converts a high-level language into another (or the same) simplified high-level language; it does not generate target code.
It converts structured HLL to conventional HLL.
It is also responsible for the expansion of macros.
It combines source modules in different files (a skeletal source program) into a single source program.

Compiler:
A compiler is a translator that converts a high-level programming language into a low-level programming language such as assembly language.
Compilers are machine dependent.

Assembler – an assembler is a translator that converts an assembly language program into a relocatable machine language program.

Interpreter
During interpretation, the HLL program remains in source form or in a simplified intermediate-code form, and the actions implied by the program are executed by the interpreter.
Actually, an interpreter is not a translator but an executer, like a CPU.
Advantage – no overheads of program translation; smaller than a compiler.
Disadvantage – analysis of the program during interpretation, which is inefficient in loops; slower than a compiler.
An interpreter is suitable for debugging purposes. Any programming language that provides a debugging facility implies that it has an interpreter. Java has both a compiler and an interpreter.
Linker & Loader

Linker – it allows us to make a single program by linking the machine code of the user program with the machine code of library files. Library files contain relocatable machine code of routines provided by the system, available to any program that needs them.

Loader – it takes relocatable machine code, alters the relocatable addresses, and places the altered instructions and data in memory at the proper locations. This code is called absolute machine code.

Sometimes the linker and loader are collectively termed the loader.

Phases of a Compiler

A compiler operates in phases, each of which transforms the source program from one representation to another. The phases of a typical compiler are:
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate-code Generator
5. Code Optimizer
6. Code Generator

Phases are organized into a Front End and a Back End.
The front end includes the phases that depend on the source language and are independent of the target machine: the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate-code Generator and some portion of the Code Optimizer. The symbol-table creation and error handling of these phases are also included in the front end.
The back end includes the phases that depend on the target machine: some portion of the Code Optimizer and the Code Generator, along with the necessary symbol-table and error-handling operations.
Several phases of a compiler are generally grouped into a single unit called a Pass. Activities of phases within a pass are interleaved. There is one input file and one output file for each pass. A compiler can be structured as single-pass or multi-pass.
Lexical Analyzer

The lexical analyzer takes as input a stream of characters and gives as output a stream of tokens, which the parser uses for syntax analysis. The parser sends a "get next token" command to the scanner, which then sends the token.
Apart from tokenizing the input stream, some secondary tasks of the lexical analyzer are:
• Removing comments, white space, tab and newline characters.
• Correlating error messages with the source program.

[figure: Source Program → Lexical Analyzer ⇄ Parser (token / "get next token"), both consulting the Symbol Table – Interaction of Lexical Analyzer with Parser]

• Whitespace: a sequence of space, tab, newline, carriage-return, form-feed characters etc.
• Lexeme: a sequence of non-whitespace characters delimited by whitespace or special characters (e.g. operators like +, -, *). The character sequence forming a token is called the lexeme for the token.
A typical entry of a symbol table includes – lexeme ptr, lexeme, token, and token attributes such as type, dimension, value etc.
Examples of lexemes:
• reserved words, keywords, identifiers etc.
• each comment is usually a single lexeme
• preprocessor directives
Lexical Analyzer

The Lexical Analyzer (or Scanner) takes as input a stream of characters from the source program and groups them logically into tokens.
• Token: a sequence of characters to be treated as a single unit.
• Examples of tokens:
– Reserved words (e.g. begin, end, struct, if etc.)
– Keywords (integer, true etc.)
– Operators (+, &&, ++ etc.)
– Identifiers (variable names, procedure names, parameter names)
– Literal constants (numeric, string, character constants etc.)
– Punctuation marks (:, , etc.)
• Identification of tokens is usually done by a Deterministic Finite-state Automaton (DFA).
• The set of tokens of a language is represented by a large regular expression.
• This regular expression is fed to a lexical-analyzer generator such as Lex, Flex or ML-Lex.
• A giant DFA is created by the lexical-analyzer generator.
Lexical Analyzer – Tokens, Lexemes and Patterns

Token – a name given to a logical group of characters; the name reflects the category of the group.
Lexeme – a string of characters.
Pattern – a rule describing the set of lexemes that can represent a token. Patterns are specified by regular expressions and implemented by programming their DFAs.

Token | Lexeme | Pattern
relation | <, <=, >, >=, <> | < or <= or > or >= or <>
id | avg, count, pi, k1 | Letter followed by letters or digits
num | 378, 3.02 | Any numeric constant (integer or real)
literal | "God is great" | Any characters between " and " except "
if | if | if
const | const | const

Attributes for Tokens

A token influences the parsing decisions, and the attributes of a token influence the translation of tokens.
Typical attributes of a token are – type, value, dimension, length, line-number.
Practically a token has only one attribute – a "pointer to the symbol-table entry".
A token is generally written as <token, ptr to ST>.
Ex. E = M * V ** 2 will be written as
<id1, 123><assign_op><id2, 125><mult_op><id3, 130><exp_op><num, 2>

Symbol Table:
Pointer | Lexeme | Token | Attributes
123 | E | id1 | Type = real, Value = 3e10
125 | M | id2 | Type = real, Value = 20
130 | V | id3 | Type = real, Value = 3e8
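The token/lexeme/pattern table above maps directly onto a table-driven scanner. The following sketch (not part of the slides; the token names come from the table, the regular-expression patterns are illustrative) pairs each token name with its pattern and emits <token, lexeme> pairs:

```python
import re

# Each token name paired with a regular expression for its pattern,
# mirroring the token/lexeme/pattern table. Keywords are listed before
# "id" so that "if" is not scanned as an identifier.
TOKEN_SPEC = [
    ("num",      r"\d+(?:\.\d+)?"),        # integer or real constant
    ("if",       r"\bif\b"),
    ("const",    r"\bconst\b"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"), # letter followed by letters/digits
    ("relation", r"<=|>=|<>|<|>"),
    ("literal",  r"\"[^\"]*\""),           # characters between quotes
    ("skip",     r"[ \t\n]+"),             # whitespace: discarded, not a token
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    """Return the stream of <token, lexeme> pairs for `source`."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

A real scanner generator such as Lex builds one DFA for this whole alternation; the `re` module plays that role in the sketch.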
Lexical Analyzer – Lexical Errors

In a statement like fi (x== g(y) )… a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an identifier.
If the lexical analyzer is able to match a pattern for a lexeme, it generates the token; otherwise it flags an error.

Syntax Analyzer

The stream of tokens from the lexical analyzer is passed to the next phase, the Syntax Analyzer or Parser.
This phase takes the list of tokens produced by lexical analysis and arranges them in a tree structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
The Syntax Analyzer groups tokens into syntactic structures, like expressions, according to the grammar of the language.
Syntactic structures are represented by a Parse Tree, whose leaves represent tokens and whose interior nodes represent strings of tokens (expressions).
The parse tree is further decomposed into a Syntax Tree – an internal representation of the syntactic structures.
Ex. For the grammar S → id := E, E → E + E | E * E | id | num
and the token stream id1 := id2 + id3 * num, the parse tree and syntax tree will be:

[parse tree: S ⇒ id1 := E, with E ⇒ E + E ⇒ id2 + (E * E) ⇒ id2 + id3 * num]
[syntax tree: := (id1, + (id2, * (id3, num)))]
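The example above can be sketched in code. The grammar as written is ambiguous, so this sketch (an illustration, not the slides' algorithm) assumes the usual precedence, * binding tighter than +, and builds the syntax tree as nested tuples:

```python
# Minimal recursive-descent sketch for the grammar
#   S -> id := E        E -> E + E | E * E | id | num
# assuming * binds tighter than +. Trees are nested tuples.

def parse(tokens):
    """tokens: a list of token names like ["id1", ":=", "id2", ...]."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        pos += 1
        return tok

    def factor():                # id | num
        return eat()

    def term():                  # factor ( * factor )*
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def expr():                  # term ( + term )*
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    target = eat()               # the id on the left of :=
    eat(":=")
    return (":=", target, expr())
```

For the token stream of the slide, the result is exactly the syntax tree drawn above, with * nested below +.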
Semantic Analyzer

The most important task of the Semantic Analyzer is Type-Checking – it checks that each operator has operands of the types permitted by the source-language specification.
If the source language permits type coercions, the semantic analyzer converts operand types to suitable ones.
Other tasks of the semantic analyzer are: disambiguating overloaded operators, control-flow checking, name checks etc.
Ex. If an integer and a real are applied to * (multiplication), the semantic analyzer converts the integer to real using some internal operator inttoreal.
Thus the modified syntax tree for id1 := id2 + id3 * num will be:
[syntax tree: := (id1, + (id2, * (id3, inttoreal(num))))]

Intermediate-Code Generator

The program is translated to a simple machine-independent intermediate language.
The intermediate representation of the source program is generated by traversing the syntax tree obtained from the semantic analyzer. It generates an intermediate representation of the source program for an abstract machine.
The intermediate code should have 2 properties – easy to produce and easy to translate into the target program.
The intermediate code can be of various types, but the most common is three-address code, which is close to assembly language.
Three-address-code instructions contain at most 3 operands, typically of the form "result = op1 operator op2".
Three-address code for id1 := id2 + id3 * inttoreal (60) will be:
temp1 = inttoreal(num)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
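The three-address code above can be produced by a post-order walk of the syntax tree. A minimal sketch, assuming trees are the nested tuples used earlier and inttoreal is a unary node:

```python
# Sketch: generate three-address code by a post-order walk of a syntax
# tree given as nested tuples, e.g.
#   (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "num"))))
# Each interior node gets a fresh temporary.

def gen_tac(tree):
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):          # leaf: an id or num
            return node
        op, *kids = node
        if op == ":=":                     # assignment: target = rhs
            target, rhs = kids
            code.append(f"{target} = {walk(rhs)}")
            return target
        if op == "inttoreal":              # unary coercion operator
            t = new_temp()
            code.append(f"{t} = inttoreal({walk(kids[0])})")
            return t
        left, right = (walk(k) for k in kids)
        t = new_temp()
        code.append(f"{t} = {left} {op} {right}")
        return t

    walk(tree)
    return code
```

Applied to the modified syntax tree from the semantic analyzer, this reproduces the four instructions listed above.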
Code Optimizer

The code optimizer optimizes the code produced by the intermediate-code generator in terms of time and space. It attempts to improve the intermediate code so that faster-running machine code will result, producing better, semantically equivalent code.
Extensive optimization slows down compilation but speeds up the execution phase.
Optimized code of the previous example:
temp1 = id3 * 60.0
id1 = id2 + temp1

Code Generator

The code generator generates assembly code or relocatable machine code from the optimized intermediate code. The code generated depends on the machine and the number of registers available.
Assembly / machine code of the above optimized code will be:
MOVF id3, R2 …

Compilation Phase-to-Phase

position := initial + rate * 60
→ (Lexical Analyzer) → id1 := id2 + id3 * num
→ (Syntax Analyzer) → [syntax tree: := (id1, + (id2, * (id3, num)))]
→ (Semantic Analyzer) → [syntax tree: := (id1, + (id2, * (id3, inttoreal(num))))]
→ (Intermediate Code Generator) →
temp1 = inttoreal(num)
temp2 = id3 * temp1
temp3 = id2 + temp2
→ (Code Optimizer) → …
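The optimization shown above, folding inttoreal(num) into the constant 60.0, can be sketched as one pass over the three-address code. This sketch does constant folding and substitution only; a real optimizer would also apply copy propagation to reach the slides' two-instruction form. The constants dictionary is an assumption of the sketch:

```python
# Sketch: fold inttoreal(c) for a known integer constant c, then
# substitute the folded value wherever that temporary is used.

def fold_inttoreal(code, constants):
    """code: list of "x = ..." strings; constants: e.g. {"num": 60}."""
    value, out = {}, []
    for line in code:
        lhs, rhs = [s.strip() for s in line.split("=", 1)]
        # "inttoreal(" is 10 characters, so rhs[10:-1] is the operand.
        if rhs.startswith("inttoreal(") and rhs[10:-1] in constants:
            value[lhs] = str(float(constants[rhs[10:-1]]))
            continue                       # the instruction disappears
        # Substitute known constant values into the operands.
        out.append(lhs + " = " + " ".join(value.get(p, p) for p in rhs.split()))
    return out
```

On the four-instruction example this removes the inttoreal instruction and rewrites the multiplication to use 60.0 directly.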
Single-Pass Vs Multi-Pass

• Several phases of a compiler are generally grouped into a single unit called a Pass. Activities of phases within a pass are interleaved. There is one input file and one output file for each pass. A compiler can be structured as single-pass or multi-pass.
• The number of passes depends on the machine and the language for which the compiler is designed.
• Certain languages allow the declaration of a variable to occur after the use of that variable. Such languages require at least 2 passes.
• A multi-pass compiler requires less memory than a single-pass compiler, because the space occupied by one pass can be reused by the next pass.
• A multi-pass compiler is slower than a single-pass compiler, as it reads and writes an intermediate file during each pass.

[figures: Single-Pass Compiler – HLL program file → single pass in memory → assembly program. Multi-Pass Compiler – Pass-1 → Internal Representation-1 → Pass-2 → Internal Representation-2 → Pass-3 → …]
Bootstrapping

• Bootstrapping is the process of writing a compiler for a computer language using the language itself.
• A compiler can be characterized by three languages:
– the source language that it compiles (S),
– the implementation language (I) that it is written in,
– the target language (T) that it generates code for.
• These three languages can be quite different.
• A T-diagram is used to show a compiler with its 3 languages: S and T across the top, I below (S T / I).
• Compilers are of two kinds: native and cross.
– Native compilers are written in the same language as the target language. For example, LMM is a compiler for language L, written in a language that runs on machine M, generating output code that runs on machine M.
– Cross compilers are written in a different language than the target language. For example, LMN is a compiler for language L, running on machine M, that generates code for machine N.
• Suppose we want to write a cross-compiler LMN. For this we use an LSN compiler written in language S. We compile LSN through its native compiler SMM on machine M to get LMN.
Bootstrapping…

Bootstrapping a compiler for a computer language L using the language L itself on machine M:
• Suppose we want a compiler for language L that runs on machine M and generates code for machine M.
• First we write a small compiler SMM, where S is a subset of L and M is the assembly language of machine M. This compiler translates the subset S of language L into machine language M.
• Second, we write a compiler for the complete language L, written in the simple language S, generating assembly code M for machine M, i.e. LSM.
• We compile LSM through SMM and get LMM, which is a native compiler for language L on machine M.
[T-diagram: Bootstrapping a Compiler – LSM compiled through SMM yields LMM]

Bootstrapping a compiler to a second machine:
Let us assume that we have 2 machines M and N, whose assembly languages are M and N respectively. We want to bootstrap a compiler LLN to obtain a native compiler for L on N, i.e. LNN.
• First we obtain LMM – a native compiler for L on machine M (as explained above).
• We compile LLN through LMM and get LMN, which is a cross compiler.
• Next we again compile LLN through LMN and get LNN.
[T-diagram: Bootstrapping a compiler to a second machine – LLN through LMM yields LMN; LLN through LMN yields LNN]
Input Buffering

• Two pointers are used to read the buffer – lexeme_beginning and forward.
• The string of characters between the two pointers is the lexeme read so far.
• Initially both pointers point to the first character of the next lexeme to be found.

[figure: a buffer split into two halves of N characters each, holding E = M * V * * 2 eof, with the lexeme_beginning and forward pointers]

• The forward pointer scans ahead until a match for a pattern is found.
• If a pattern is found, the lexeme_beginning pointer moves to the beginning of the next pattern to be found, skipping all white space.
• If the lexeme_beginning pointer is in the left half and the forward pointer moves across the halfway mark, the right half is filled with N new input characters.
• If the lexeme_beginning pointer is in the right half and the forward pointer moves across the right end of the buffer, the left half is filled with N new input characters and the forward pointer wraps to the beginning of the buffer.

Limitation:
• If the lexeme_beginning pointer is in the left half and the forward pointer moves across the right end of the buffer, the left half cannot be filled with N new input characters. In this case the token cannot be recognized.
• If the lexeme_beginning pointer is in the right half and the forward pointer moves across the halfway mark by wrapping around to the left half, the right half cannot be filled with N new input characters. In this case the token cannot be recognized.
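The pointer movements described above can be sketched as follows. This is an illustration only, with a tiny N and no lexeme_beginning bookkeeping, so it shows just the refill-and-wrap logic of the forward pointer, using a NUL character as the eof sentinel:

```python
import io

# Sketch of the two-halves input buffer: each half holds N characters;
# when the forward pointer crosses into the other half, that half is
# refilled with the next N characters of input.

N = 4                     # half-buffer size (tiny, for illustration)
EOF = "\0"                # sentinel marking end of input

class TwoBufferReader:
    def __init__(self, stream):
        self.stream = stream             # any object with .read(n)
        self.buf = [EOF] * (2 * N)
        self.forward = 0
        self._fill(0)                    # preload the left half

    def _fill(self, start):
        data = self.stream.read(N)
        for i in range(N):
            self.buf[start + i] = data[i] if i < len(data) else EOF

    def next_char(self):
        ch = self.buf[self.forward]
        self.forward += 1
        if self.forward == N:            # crossed the halfway mark
            self._fill(N)                # refill the right half
        elif self.forward == 2 * N:      # crossed the right end: wrap
            self._fill(0)                # refill the left half
            self.forward = 0
        return ch
```

Because this sketch refills unconditionally, it does not exhibit the limitation described above; checking lexeme_beginning before refilling is what a real scanner adds.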
SPECIFICATION OF TOKENS

Operations on languages:
The following operations can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure
4. Positive closure
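These four operations can be demonstrated on small finite languages represented as sets of strings. Since the two closures are infinite in general, the sketch truncates them at a maximum string length (an assumption of this illustration):

```python
# Sketch: the four operations on languages, on finite sets of strings.

def union(l1, l2):
    return l1 | l2

def concat(l1, l2):
    return {x + y for x in l1 for y in l2}

def kleene(l, max_len):
    """L* truncated to strings of length <= max_len."""
    result = {""}                 # epsilon is always in L*
    frontier = {""}
    while True:
        frontier = {w for w in concat(frontier, l) if len(w) <= max_len}
        if not (frontier - result):
            break
        result |= frontier
    return result

def positive(l, max_len):
    """L+ = L concatenated with L*, truncated to length <= max_len."""
    return {w for w in concat(l, kleene(l, max_len)) if len(w) <= max_len}
```

Note that the empty string belongs to L* regardless of L, but belongs to L+ only if it belongs to L.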
NFA

• A DFA is easily simulated via an algorithm.
• Every NFA can be converted to an equivalent DFA by subset construction; the resulting DFA can then be minimized (minimal in its number of states).
• There are programs that take a regular expression and produce a program, based on a minimal DFA, to recognize the strings defined by the RE:
regular expression → scanner generator → minimized DFA → DFA simulation
• You can find out more in 451 (automata theory) and/or 431 (Compiler design).
• Examples of automata in everyday machines: automatic machine tools, automatic packing machines, and automatic photo printing machines.

[figure: an automaton as a box with inputs I1 … Ip, outputs O1 … Op, and internal states q1 … qn]
Finite Automata (FA)

Analytically, a finite automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where
1. Q is a finite nonempty set of states.
2. ∑ is a nonempty set of inputs called the input alphabet.
3. δ is a function which maps Q × ∑ into Q, usually called the direct transition function. This is the function that describes the change of state during a transition; the mapping is usually represented by a transition table or a transition diagram.
4. q0 ∈ Q is the initial state.
5. F ⊆ Q is the set of final states. There may be more than one final state.

DFA (Deterministic Finite Automata)

• An FA is also called a Finite State Machine (FSM):
– an abstract model of a computing entity;
– it decides whether to accept or reject a string;
– every regular expression can be represented as an FA and vice versa.
• Two types of FAs:
– Non-deterministic (NFA): has more than one alternative action for the same input symbol.
– Deterministic (DFA): has at most one action for a given input symbol.
• Example: how do we write a program to recognize the Java keyword "int"?
q0 —i→ q1 —n→ q2 —t→ q3
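The 5-tuple definition and the "int" example can be written out directly; in this sketch a missing δ entry acts as a dead state, so any unexpected character rejects:

```python
# The keyword-"int" recognizer as a 5-tuple:
# Q = {q0, q1, q2, q3}, Sigma = {i, n, t}, start q0, final {q3}.

Q = {"q0", "q1", "q2", "q3"}
SIGMA = {"i", "n", "t"}
DELTA = {("q0", "i"): "q1", ("q1", "n"): "q2", ("q2", "t"): "q3"}
START, FINAL = "q0", {"q3"}

def accepts(word):
    state = START
    for ch in word:
        state = DELTA.get((state, ch))   # missing entry = dead state
        if state is None:
            return False
    return state in FINAL
```

The transition table representation mentioned above is exactly the DELTA dictionary here.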
Deterministic Finite Automaton (DFA)

• A Deterministic Finite Automaton (DFA) is a special form of an NFA in which:
– no state has an ε-transition;
– for each symbol a and state s, there is at most one edge labeled a leaving s;
i.e. the transition function maps a state-symbol pair to a single state (not a set of states).

[transition graph of the DFA: start state 0; 0 —a→ 1, 0 —b→ 0, 1 —a→ 1, 1 —b→ 2, 2 —a→ 1, 2 —b→ 0]

The language recognized by this DFA is (a|b)*ab.

State/∑ | a | b
→0 | 1 | 0
1 | 1 | 2
2 | 1 | 0

Nondeterministic Finite Automaton

[transition graph of the NFA: start state 0; 0 —a→ {0, 1}, 0 —b→ 0, 1 —b→ 2]

The language recognized by this NFA is (a|b)*ab.
0 is the start state s0; F = {2} is the set of final states; ∑ = {a, b}; S = {0, 1, 2}.

Transition function:
State/∑ | a | b
→0 | 0, 1 | 0
1 | – | 2
2 | – | –
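The NFA above can be simulated by tracking the set of states the automaton could currently be in — the same idea that underlies the subset construction:

```python
# Sketch: simulating the NFA for (a|b)*ab from the transition table
# above; missing (state, symbol) entries mean the empty set.

NFA_DELTA = {
    ("0", "a"): {"0", "1"},   # state 0 on a -> {0, 1}
    ("0", "b"): {"0"},
    ("1", "b"): {"2"},
}
NFA_START, NFA_FINAL = "0", {"2"}

def nfa_accepts(word):
    current = {NFA_START}
    for ch in word:
        # Union of moves from every state we might be in.
        current = set().union(*(NFA_DELTA.get((s, ch), set()) for s in current))
    return bool(current & NFA_FINAL)
```

A string is accepted when at least one of the possible current states is final, which matches the usual definition of NFA acceptance.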
[slides 43–44: a transition diagram with input/output edge labels (1/0, 0/0; processing the input string 1011) and the transition table of an NFA:]
State/∑ | a | b
→q0 | q0, q1 | q2
q1 | q0 | q1
q2 | – | q0, q1
Solution

For the NFA M = ({q0, q1, q2}, {a, b}, δ, q0, {q2}), the equivalent deterministic automaton is
M1 = (2^Q, {a, b}, δ, [q0], F),
F = {[q2], [q0, q2], [q1, q2], [q0, q1, q2]}

State/∑ | a | b
→[q0] | [q0, q1] | [q2]
[q2] | ø | [q0, q1]
[q0, q1] | [q0, q1] | [q1, q2]
[q1, q2] | [q0] | [q0, q1]

Q.) Construct a deterministic automaton equivalent to M = ({q0, q1, q2, q3}, {0, 1}, δ, q0, {q3}), where δ is defined by its state table:

State/∑ | 0 | 1
→q0 | q0, q1 | q0
q1 | q2 | q1
q2 | q3 | q3
q3 | – | q2
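The construction used in the solution can be sketched generically and applied to the exercise NFA, with δ taken from the state table above:

```python
# Sketch: subset construction. delta maps (state, symbol) to a set of
# states; the resulting DFA table is keyed by frozensets of NFA states.

def subset_construction(delta, start, alphabet):
    start_set = frozenset([start])
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for sym in alphabet:
            # Union of the NFA moves of every state in the subset.
            nxt = frozenset().union(*(delta.get((q, sym), set()) for q in current))
            table[current][sym] = nxt
            if nxt not in table:
                todo.append(nxt)
    return table

# The exercise NFA M = ({q0..q3}, {0, 1}, delta, q0, {q3}):
DELTA = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "0"): {"q2"},       ("q1", "1"): {"q1"},
    ("q2", "0"): {"q3"},       ("q2", "1"): {"q3"},
    ("q3", "1"): {"q2"},
}
DFA = subset_construction(DELTA, "q0", ["0", "1"])
```

A DFA state is final exactly when the subset it represents contains q3.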
Thompson's Construction

NFA for r1 | r2:
[figure: a new initial state i with ε-moves to the initial states of N(r1) and N(r2); the final states of N(r1) and N(r2) have ε-moves to a new final state f]
Thompson's Construction (cont.)

• For the regular expression r1 r2:
[figure: i → N(r1) → N(r2) → f; the NFAs are chained, and the final state of N(r2) becomes the final state of N(r1 r2)]
• For the regular expression r*:
[figure: new states i and f; ε-moves from i into N(r) and directly to f, and from the final state of N(r) back into N(r) and out to f]

Thompson's Construction (Example – (a|b)*a)

[figures building the NFA step by step: a; b; (a|b); (a|b)*; (a|b)*a]
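Thompson's construction can be sketched compactly if the regular expression is supplied in postfix form with an explicit concatenation operator "." — an input convention assumed by this sketch, not notation used on the slides. In that form (a|b)*a becomes "ab|*a.":

```python
# Sketch of Thompson's construction: build an NFA with epsilon-moves
# from a postfix regular expression, then match by epsilon-closure.

def thompson(postfix):
    """Return (start, final, eps, trans): eps maps a state to its set of
    epsilon-successors, trans maps (state, char) to a state."""
    eps, trans, counter, stack = {}, {}, [0], []

    def new_state():
        counter[0] += 1
        eps.setdefault(counter[0], set())
        return counter[0]

    for ch in postfix:
        if ch == ".":                       # concatenation r1 r2
            s2, f2 = stack.pop()
            s1, f1 = stack.pop()
            eps[f1].add(s2)
            stack.append((s1, f2))
        elif ch == "|":                     # union r1 | r2
            s2, f2 = stack.pop()
            s1, f1 = stack.pop()
            s, f = new_state(), new_state()
            eps[s] |= {s1, s2}
            eps[f1].add(f); eps[f2].add(f)
            stack.append((s, f))
        elif ch == "*":                     # Kleene star
            s1, f1 = stack.pop()
            s, f = new_state(), new_state()
            eps[s] |= {s1, f}               # enter N(r), or skip it
            eps[f1] |= {s1, f}              # loop back, or leave
            stack.append((s, f))
        else:                               # a single input character
            s, f = new_state(), new_state()
            trans[(s, ch)] = f
            stack.append((s, f))
    return (*stack.pop(), eps, trans)

def closure(states, eps):
    todo, seen = list(states), set(states)
    while todo:
        for n in eps[todo.pop()]:
            if n not in seen:
                seen.add(n); todo.append(n)
    return seen

def matches(postfix, word):
    start, final, eps, trans = thompson(postfix)
    current = closure({start}, eps)
    for ch in word:
        moved = {trans[(s, ch)] for s in current if (s, ch) in trans}
        current = closure(moved, eps)
    return final in current
```

Each operator case mirrors one of the figures above: union and star introduce fresh start/final states joined by ε-moves, and concatenation splices the two machines with a single ε-move.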
[figures: transition diagrams for a (start 0 —a→ 1), a+, and (a|b)*]
TRANSITION SYSTEM CONTAINING Λ-MOVES

[slides 53–56: figures only]
Conversion from Regular Expression (Transition Diagram) to Finite Automata

Q.) Construct the finite automaton equivalent to the regular expression (0 + 1)*(00 + 11)(0 + 1)*

Answer:
Step 1 (construction of the transition graph): First of all we construct the transition graph with Λ-moves.
Step 2: We eliminate the concatenations in the given r.e. by introducing new vertices q1 and q2.
Step 3: We eliminate the * operations in Figure 2 by introducing two new vertices q5 and q6 and the Λ-moves shown in Figure 3.
Step 4: We eliminate the concatenations and + in Fig. 3 and get Fig. 4.
Step 5: We eliminate the Λ-moves in Fig. 4 and get Fig. 5, which gives the NDFA equivalent to the given r.e.
Step 6 (construction of the DFA): We construct the transition table for the NDFA:

State/∑ | 0 | 1
→q0 | q0, q3 | q0, q4
q3 | qf | –
q4 | – | qf
qf | qf | qf

Step 7: Transition table for the DFA:

State/∑ | 0 | 1
→[q0] | [q0, q3] | [q0, q4]
[q0, q3] | [q0, q3, qf] | [q0, q4]
[q0, q4] | [q0, q3] | [q0, q4, qf]
[q0, q4, qf] | [q0, q3, qf] | [q0, q4, qf]
[q0, q3, qf] | [q0, q3, qf] | [q0, q4, qf]
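The DFA of Step 7 can be checked by direct simulation: the regular expression (0 + 1)*(00 + 11)(0 + 1)* describes exactly the binary strings containing "00" or "11". The state names below abbreviate the subsets in the table (e.g. q0q3qf for [q0, q3, qf]):

```python
# Sketch: simulating the Step-7 DFA. A state is final exactly when its
# subset contains qf.

DFA_00_11 = {
    "q0":     {"0": "q0q3",   "1": "q0q4"},
    "q0q3":   {"0": "q0q3qf", "1": "q0q4"},
    "q0q4":   {"0": "q0q3",   "1": "q0q4qf"},
    "q0q4qf": {"0": "q0q3qf", "1": "q0q4qf"},
    "q0q3qf": {"0": "q0q3qf", "1": "q0q4qf"},
}
FINALS = {"q0q3qf", "q0q4qf"}

def dfa_accepts(word):
    state = "q0"
    for ch in word:
        state = DFA_00_11[state][ch]
    return state in FINALS
```

Intuitively, q3 remembers "the previous symbol was 0", q4 remembers "the previous symbol was 1", and qf records that a double symbol has been seen.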
Step 8: The state diagram for the required DFA is given below.
[figure]
Step 9: [figure]
Transition Table for the NDFA:

State/∑ | 0 | 1
→q0 | q3 | q1, q2
q1 | qf | –
q2 | – | q3
q3 | q3 | qf
qf | – | –

Transition Table for the DFA:

State/∑ | 0 | 1
→q0 | q3 | q1, q2
q3 | q3 | qf
q1, q2 | qf | q3
qf | Ø | Ø
Ø | Ø | Ø