Final Presentation (Minor)
[Figure: compiler phases – Source program → Lexical analyzer → Syntax analyzer → Semantic analyzer → Intermediate code generator → Code optimizer → Code generator → Target program]
06/19/09 Copyright Ankita 2
Parser
• Converts a linear structure – a sequence of tokens – into a hierarchical, tree-like structure: an AST.
• The parser imposes the syntax rules of the language.
• Work should be linear in the size of the input (else the compiler is unusable) → type consistency cannot be checked in this phase.
• Deterministic context-free languages and pushdown automata form the basis.
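As a sketch of how a parser turns a flat token sequence into an AST, here is a minimal recursive-descent parser for additive expressions (the tokenizer, grammar, and node shapes are illustrative, not from the slides). Each token is consumed exactly once, so the work is linear in the input size:

```python
# Minimal recursive-descent parser for:  expr -> term (('+'|'-') term)*
#                                        term -> NUM | '(' expr ')'
# Token kinds and AST node shapes are illustrative.

def tokenize(s):
    # Split the input into numbers and single-character operators.
    tokens, i = [], 0
    while i < len(s):
        c = s[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            tokens.append(('NUM', int(s[i:j])))
            i = j
        else:
            tokens.append((c, c))
            i += 1
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def eat(self, kind):
        tok = self.tokens[self.pos]
        assert tok[0] == kind, f"expected {kind}, got {tok[0]}"
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ('+', '-'):
            op = self.eat(self.peek())[0]
            node = (op, node, self.term())   # AST node: (op, left, right)
        return node

    def term(self):
        if self.peek() == '(':
            self.eat('('); node = self.expr(); self.eat(')')
            return node
        return ('num', self.eat('NUM')[1])

ast = Parser(tokenize("1 + 2 - 3")).expr()
print(ast)   # ('-', ('+', ('num', 1), ('num', 2)), ('num', 3))
```

The left-associative loop in `expr` is what makes 1 + 2 - 3 parse as (1 + 2) - 3.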
Semantic Analysis
• Calculates the program's "meaning".
• Rules of the language are checked (variable declarations, type checking).
• Type checking is also needed for code generation (the code generated for a + b depends on the types of a and b).
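To illustrate why code generation depends on type checking, here is a toy checker for a + b (the type rules and AST shape are assumptions for illustration, not the slides' language):

```python
# Toy type checker over a tiny AST: ('+', left, right) or ('var', name).
# The instruction chosen for '+' during code generation depends on the
# operand types this pass computes.

def check(node, env):
    kind = node[0]
    if kind == 'var':
        return env[node[1]]                 # look up the declared type
    if kind == '+':
        lt = check(node[1], env)
        rt = check(node[2], env)
        if lt == rt == 'int':
            return 'int'                    # would emit an integer add
        if 'float' in (lt, rt):
            return 'float'                  # would emit a float add (+ coercion)
        raise TypeError(f"cannot add {lt} and {rt}")
    raise ValueError(f"unknown node kind {kind}")

env = {'a': 'int', 'b': 'float'}
print(check(('+', ('var', 'a'), ('var', 'b')), env))   # float
```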
Intermediate Code Generation
• Makes it easy to port the compiler to other architectures.
• Can also serve as the basis for interpreters.
• Enables optimizations that are not machine-specific.
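A minimal sketch of one common intermediate form, three-address code, generated from an expression tree (the instruction format and temporary names are illustrative):

```python
import itertools

# Lower an expression AST – e.g. ('+', ('var','a'), ('*', ('var','b'), ('var','c')))
# – into three-address code: each instruction applies at most one operator.

def lower(node, code, temps=None):
    """Emit instructions into `code`; return the name holding the result."""
    if temps is None:
        temps = itertools.count(1)
    if node[0] == 'var':
        return node[1]                       # variables are already operands
    op, left, right = node
    l = lower(left, code, temps)
    r = lower(right, code, temps)
    tmp = f"t{next(temps)}"                  # fresh temporary for the result
    code.append(f"{tmp} = {l} {op} {r}")
    return tmp

code = []
lower(('+', ('var', 'a'), ('*', ('var', 'b'), ('var', 'c'))), code)
print(code)   # ['t1 = b * c', 't2 = a + t1']
```

Because the form is machine-independent, the same list of instructions can feed an optimizer, an interpreter, or several back ends.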
Intermediate Code Optimization
• Constant propagation, dead-code elimination, common-subexpression elimination, strength reduction, etc.
• Based on dataflow analysis – properties that hold independently of execution paths.
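A toy pass combining constant propagation, constant folding, and strength reduction over straight-line three-address code (the instruction tuples are an assumed format, not from the slides):

```python
# Each instruction is (dest, op, arg1, arg2); constants are ints.

def optimize(code):
    consts, out = {}, []
    for dest, op, a1, a2 in code:
        a1 = consts.get(a1, a1)          # constant propagation
        a2 = consts.get(a2, a2)
        if isinstance(a1, int) and isinstance(a2, int):
            # Constant folding: evaluate now, remember the value,
            # and emit nothing (the assignment becomes dead code).
            consts[dest] = a1 + a2 if op == '+' else a1 * a2
        else:
            if op == '*' and a2 == 2:    # strength reduction: x * 2 -> x + x
                op, a2 = '+', a1
            out.append((dest, op, a1, a2))
    return out

code = [('t1', '+', 3, 4),        # folds to 7
        ('t2', '*', 'x', 2),      # strength-reduced to x + x
        ('t3', '+', 't1', 'y')]   # t1 propagated as 7
print(optimize(code))   # [('t2', '+', 'x', 'x'), ('t3', '+', 7, 'y')]
```

A real pass would run over a control-flow graph, which is where the dataflow analysis mentioned above comes in; straight-line code is the degenerate single-path case.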
Native Code Generation
• Intermediate code is translated into native code.
• Register allocation, instruction selection.
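A sketch of the last step: each three-address instruction becomes load/operate/store in a hypothetical assembly, with two fixed registers standing in for real register allocation:

```python
# Naive code generation from (dest, op, a1, a2) tuples.
# The target instruction set (LOAD/STORE/ADD/MUL, R1/R2) is invented
# for illustration; instruction selection here is a direct table lookup.

OPS = {'+': 'ADD', '*': 'MUL'}

def codegen(code):
    asm = []
    for dest, op, a1, a2 in code:
        asm.append(f"LOAD  R1, {a1}")
        asm.append(f"LOAD  R2, {a2}")
        asm.append(f"{OPS[op]:5} R1, R2")   # R1 := R1 op R2
        asm.append(f"STORE R1, {dest}")     # spill every result to memory
    return asm

for line in codegen([('t1', '*', 'b', 'c'), ('t2', '+', 'a', 't1')]):
    print(line)
```

A real register allocator would keep t1 in a register across the two instructions instead of storing and reloading it.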
Informal sketch of lexical analysis
• Identifies the tokens in the input string.
• Specifying lexers: regular expressions.
• Improves portability: non-standard symbols and alternate character encodings can be normalized.
[Figure: the lexical analyzer passes (token, tokenval) pairs – a token and its attribute – to the parser; both consult the symbol table and report errors.]

Example token stream, for an assignment such as y := 31 + 28 * x:
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
In a programming language:
• Identifier, Integer, Single-Float, Double-Float, operator (perhaps single- or multiple-character), Comment, Keyword, Whitespace, string constant, …
Recall:
• Identifier: a string of letters or digits, starting with a letter
• Integer: a non-empty string of digits
• Keyword: "else" or "if" or "begin" or …
• Whitespace: a non-empty sequence of blanks, newlines, and tabs
Not quite! (These informal definitions overlap – "if" also matches the Identifier rule – so the lexer needs disambiguation conventions.)
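The "Recall" definitions above translate directly into regular expressions; here is a minimal lexer sketch in which rule order resolves the keyword/identifier overlap (token names and the input are illustrative):

```python
import re

# Token specs from the informal definitions. Order matters: keywords
# are tried before the identifier rule, since "if" also matches the
# identifier pattern; \b stops "ifx" from lexing as a keyword.
SPEC = [
    ('KEYWORD',    r'\b(?:else|if|begin)\b'),
    ('IDENTIFIER', r'[A-Za-z][A-Za-z0-9]*'),
    ('INTEGER',    r'\d+'),
    ('WHITESPACE', r'[ \t\n]+'),
]
TOKEN_RE = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in SPEC))

def lex(s):
    # finditer applies the alternation left to right at each position,
    # and each pattern greedily takes the longest match it can.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(s)
            if m.lastgroup != 'WHITESPACE']

print(lex("if x1 42"))   # [('KEYWORD', 'if'), ('IDENTIFIER', 'x1'), ('INTEGER', '42')]
```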
Regular expressions → NFA → DFA
(the NFA → DFA conversion is optional: an NFA can also be simulated directly)
Example NFA:
S = {0, 1, 2, 3}
Σ = {a, b}
s0 = 0
F = {3}

[Diagram: start → 0 –a→ 1 –b→ 2 –b→ 3, with self-loops on state 0 for both a and b]

Transition function:
δ(0, a) = {0, 1}
δ(0, b) = {0}
δ(1, b) = {2}
δ(2, b) = {3}

             Input
State       a         b
  0       {0, 1}    {0}
  1         –       {2}
  2         –       {3}

This NFA accepts exactly the strings over {a, b} that end in abb, i.e. (a|b)*abb.
[Diagram: the equivalent DFA – start → 0 –a→ 1 –b→ 2 –b→ 3; b loops on state 0; a returns to state 1 from states 1, 2, and 3; b from state 3 returns to state 0]
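The NFA above can also be simulated directly by tracking the set of simultaneously active states; the optional NFA → DFA conversion just precomputes these state sets. A minimal sketch using the slide's transition function:

```python
# Simulate the NFA from the slide: S = {0,1,2,3}, start state 0, F = {3}.
# delta maps (state, symbol) -> set of successor states.
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}

def accepts(s):
    current = {0}                       # set of currently active NFA states
    for sym in s:
        current = set().union(*(delta.get((q, sym), set()) for q in current))
    return 3 in current                 # accept iff an accepting state is active

print(accepts("aabb"))    # True  – ends in abb
print(accepts("abab"))    # False
```

Each distinct value of `current` reached here ({0}, {0, 1}, {0, 2}, {0, 3}) corresponds to one state of the DFA built by subset construction.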