
Software Construction and Development
Lecture-02
Phases of Compilation Process
 The front-end and back-end of a compiler are each divided into further components, called the phases of the compilation process.
 Front-end:
 Lexical Analysis
 Syntax Analysis
 Semantic Analysis
 Intermediate Code Generation
 Intermediate Code Optimization
 Back-end:
 Machine Code Generation
 Machine Code Optimization
Inside a Compiler
[Figure: Source code -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate code Generator -> Intermediate code Optimizer -> Intermediate code -> Machine code Generator -> Machine code Optimizer -> Machine code. The figure also shows an Error listing output.]
Lexical Analysis (Scanning)
[Figure: Source code -> Lexical Analyzer -> Stream of tokens. The Lexical Analyzer may or may not also produce lexical errors.]
Lexical Analyzer (Scanner)

 It reads the source code character by character.
 It tokenizes the source code.
 Characters combine to form tokens.
 Generates lexical errors.
[Figure: program -> functions & declarations -> statements -> tokens -> characters]
Tokens

 Tokens include keywords, identifiers, constants, special characters, operators, etc.
 Two tokens must be separated by white space if they would otherwise be read as one token. For example, a and 20 written without a space become the single identifier a20.
 Example:
 a = b+10;
 x++;
 void main()
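The token categories above can be sketched as a small regular-expression scanner. This is only an illustration in Python: the keyword subset, the category names, and the `tokenize` function are assumptions made for this example, not part of the lecture.

```python
import re

# Token categories follow the slide: keywords, identifiers, constants,
# operators and special characters. The keyword set is a small
# illustrative subset, not a complete keyword list.
KEYWORDS = {"int", "void", "char", "if", "else", "return"}

TOKEN_SPEC = [
    ("CONSTANT",   r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"\+\+|--|==|[+\-*/=<>]"),
    ("SPECIAL",    r"[;(){},]"),
    ("SPACE",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SPACE":
            continue                      # white space only separates tokens
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("a = b+10;"))
print(tokenize("a20"), tokenize("a 20"))
```

The last line shows the role of white space: `a20` is scanned as one identifier, while `a 20` yields an identifier followed by a constant.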
Lexical Errors

 Errors that are produced if the scanner cannot find or construct a proper/valid token.
 Very rare in a program.
 For example:
 int a = 2..3;
 int x = 30a;
Lexical Errors

 Invalid tokens include:
 A token containing white space.
 A token beginning on one line and ending on the next. For example,
a = " the real strength of a
nation lies in its people";
 A token having an illegal character. For example, int a = $10; or int b = 20d;
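A scanner can report the invalid tokens listed above by giving each kind of mistake its own pattern. The sketch below is a simplified Python illustration; the pattern names, the messages, and the `scan` function are assumptions for this example, and the number pattern deliberately ignores exponents, hexadecimal constants, and suffixes.

```python
import re

# Patterns are tried in order. BAD_NUMBER comes before NUMBER so that
# malformed constants such as 2..3 or 30a are caught as one unit.
PATTERNS = [
    ("BAD_NUMBER", r"\d+(?:\.\.+\d*|[A-Za-z_]\w*)"),
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("STRING",     r'"[^"\n]*"'),
    ("BAD_STRING", r'"[^"\n]*'),      # opening quote with no close on the same line
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("SYMBOL",     r"[-+*/=<>;(){},]"),
    ("SPACE",      r"\s+"),
    ("ILLEGAL",    r"."),             # any character no other pattern accepts
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))

MESSAGES = {
    "BAD_NUMBER": "malformed numeric constant",
    "BAD_STRING": "string not closed on the line where it begins",
    "ILLEGAL":    "illegal character",
}


def scan(source):
    """Return (tokens, errors); errors is a list of (line, text, message)."""
    tokens, errors = [], []
    line = 1
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in MESSAGES:
            errors.append((line, text, MESSAGES[kind]))
        elif kind != "SPACE":
            tokens.append((kind, text))
        line += text.count("\n")      # keep the line counter up to date
    return tokens, errors


for statement in ["int a = 2..3;", "int x = 30a;", "int a = $10;"]:
    print(statement, "->", scan(statement)[1])
```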
Syntax Analysis (Parsing)
[Figure: Source code -> Lexical Analyzer -> Stream of tokens -> Syntax Analyzer -> Parse tree. The Syntax Analyzer may or may not also produce syntax errors.]
Syntax Analyzer (Parser)

 It reads the stream of tokens generated by the scanner.
 It contains the grammar for the source code language.
 It applies the grammar rules to the tokens and generates a parse tree.
 During the process, syntax errors may be generated if the tokens do not match the grammar.
Syntax Errors

 The errors in the format or syntax of the statements according to the grammar.
 For example, a missing token could cause a syntax error.
 Examples:
 20 = a;
 a = 10
 a b = 10;
What is Grammar?

 The parser has a CFG (Context-Free Grammar) for the source language.
 The CFG contains all the rules that are needed to build sentences of the language.
 For example, the grammar of the English language contains verbs, nouns, articles, punctuation marks, etc. This information helps us to build sentences in English and use them in daily life.
What is Parse Tree?

 It is a tree-like structure that derives a particular sentence of a language from its grammar (CFG).
 It depends on the input and the CFG.
 For example, the given CFG:
 S -> XY
 X -> a | ab
 Y -> b | aX
 For the input string aaab, the parse tree generated is:

        S
      /   \
     X     Y
     |    / \
     a   a   X
            / \
           a   b

 The parser can only construct a parse tree if the tokens are valid and in the right sequence.
 For example, the input baab cannot be derived from the same CFG.
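The derivation for the example CFG can be checked with a small backtracking parser. This is a minimal Python sketch, assuming the tree is represented as nested `(symbol, children)` tuples; the function names are made up for this illustration and real parsers use more efficient techniques.

```python
# Grammar from the slide: each non-terminal maps to its alternatives.
GRAMMAR = {
    "S": [["X", "Y"]],
    "X": [["a"], ["a", "b"]],
    "Y": [["b"], ["a", "X"]],
}


def derive(symbol, text, pos):
    """Yield (tree, next_pos) for every way `symbol` can match text[pos:]."""
    if symbol not in GRAMMAR:                 # terminal symbol
        if text.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for production in GRAMMAR[symbol]:
        yield from derive_sequence(symbol, production, text, pos)


def derive_sequence(symbol, production, text, pos):
    """Match every symbol of one production in order, keeping all alternatives."""
    partial = [([], pos)]
    for part in production:
        partial = [
            (children + [subtree], end)
            for children, start in partial
            for subtree, end in derive(part, text, start)
        ]
    for children, end in partial:
        yield (symbol, children), end


def parse(text):
    """Return a parse tree for `text`, or None if it is not in the language."""
    for tree, end in derive("S", text, 0):
        if end == len(text):
            return tree
    return None


print(parse("aaab"))
print(parse("baab"))
```

For `aaab` the result has X deriving `a` and Y deriving `a X` with the inner X deriving `ab`; for `baab` no derivation exists and the function returns `None`.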
Semantic Analysis
[Figure: Parse tree -> Semantic Analyzer -> Annotated parse tree. The Semantic Analyzer uses the Symbol table and may or may not also produce semantic errors.]
Semantic Analyzer

 It checks the context-sensitive issues or static semantics of the source program.
 These semantics are those that can be checked at
compile time.
 For example, data types checking, scope checking,
declarations and definitions, etc.
What are Semantics?

 The meaning and interrelationships of words, phrases, and sentences in a language.
 Static Semantics:
 Can be checked at compile-time.
 Example: a variable used, but not declared.
 Dynamic Semantics:
 Must be checked at run-time.
 Example: a char value being input to an int variable at run-time.
 The compiler does not do dynamic semantic checking itself; it checks static semantics and can only generate code that checks dynamic semantics at run-time.
Symbol Table

 The Semantic Analyzer uses a symbol table that contains all the information regarding the identifiers.
 The information for each identifier includes its type, declaration, scope, etc.
 The symbol table can be generated by the Semantic Analyzer itself, or at an earlier stage of the compilation process.
Semantic Errors
 Every identifier that is declared is stored in symbol table.
 If the source code contains an identifier that is missing in symbol table,
then it is a semantic error.
 Other semantic errors include duplicate declarations, out-of-scope variables, invalid data type comparisons, missing or different parameters, etc.
Semantic Errors

 Example:
 void main() { a = 5; }
 void func1() { int x; }
void main() { int s = x; }
 void func1() { ……. }
void main() { func1(2); ……. }
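The undeclared and out-of-scope cases above can be modelled with a scoped symbol table. The following Python sketch is only one possible design: the `SymbolTable` class, its method names, and the error messages are assumptions for illustration, and it stores just a type per identifier.

```python
class SymbolTable:
    """A stack of scopes; each scope maps an identifier name to its type."""

    def __init__(self):
        self.scopes = [{}]            # start with the global scope
        self.errors = []

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            self.errors.append(f"duplicate declaration of '{name}'")
        else:
            scope[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        self.errors.append(f"'{name}' is not declared in this scope")
        return None


table = SymbolTable()
table.enter_scope()                   # body of func1
table.declare("x", "int")
table.exit_scope()
table.enter_scope()                   # body of main
table.lookup("x")                     # x belongs to func1: semantic error
table.declare("s", "int")
table.declare("s", "int")             # duplicate declaration: semantic error
table.exit_scope()
print(table.errors)
```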
Annotated Parse Tree

 A parse tree whose nodes have some annotations or attributes attached to them.
 Annotations may include some additional information.
 For example, you might want to store the data type or
address of a variable in the parse tree.
Software Processes
 A Software Process is
 A set of activities (e.g. requirements,
analysis, design, coding, testing) combined
and sequenced in a particular fashion to
produce software

 Recent trend: Agile Software Development


 Customer needs evolve with time
 Satisfying customers at delivery time (rather
than at project initiation) is more important
than conforming to initial customer
requirements
Intermediate Code Generation
 Annotated parse tree is flattened to generate
intermediate code.
 Intermediate code is in a generic form.
 Machine-independent.
 Intermediate code should be in a form that can be
easily converted into machine code.
[Figure: Annotated parse tree -> Int. code Generator -> Intermediate representation]
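One common generic form is three-address code, where each instruction has at most one operator. The lecture does not name a specific form, so the choice of three-address code here is an assumption, as are the tuple-based tree and the `generate` function in this Python sketch of "flattening" a tree.

```python
def generate(tree):
    """Flatten an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):         # leaf: a variable name or a constant
            return node
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        counter += 1
        temp = f"t{counter}"              # fresh temporary for this sub-result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = tree          # root is an assignment: ("=", name, expr)
    code.append(f"{target} = {visit(expression)}")
    return code


# a = b + c * 10
for line in generate(("=", "a", ("+", "b", ("*", "c", "10")))):
    print(line)
```

The nested tree becomes a flat sequence (`t1 = c * 10`, `t2 = b + t1`, `a = t2`) that does not depend on any particular machine.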
Intermediate Code Optimization
 Intermediate code is made more efficient by
applying optimization techniques.
 Efficient in terms of execution and memory
utilization.
 Optimization techniques are independent of the target machine.
 For example, there may be redundant instructions or unreachable code in the intermediate code; the optimizer removes them.
int a=10;
if (a<10)
{ ……………..} //this part of code will never be executed.
Example

 Intermediate code might have redundancy due to the translation process or due to different programming styles of the programmers.
 For example, int a; a = 3*4;
 To make the above code efficient, the compiler will
check for the calculation at compile time and will
optimize it by generating code for a=12 instead of
a=3*4.
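Evaluating a constant expression such as 3*4 at compile time can be sketched as a recursive walk over an expression tree. In this Python illustration the tuple representation, the `OPS` table, and the `fold` function are assumptions; only addition, subtraction, and multiplication are handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace every sub-expression whose operands are all constants by its value."""
    if not isinstance(node, tuple):       # leaf: int constant or variable name
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)       # computed now, not at run time
    return (op, left, right)


print(fold(("*", 3, 4)))                  # the right-hand side of a = 3*4
print(fold(("+", "b", ("*", 3, 4))))      # only the constant part is folded
```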
Example

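 For optimization, the compiler might use the values known at compile time.
 For example, int a; a=10; a = a+2;
 The compiler will simply generate code for a=12 instead of evaluating the subexpression at run time. Substituting the known value of a is constant propagation; computing 10+2 at compile time is Constant Folding.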
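Using values known at compile time can be sketched for straight-line code by remembering which variables currently hold constants. This Python example is a simplified illustration: the `(target, expression)` statement format and the `propagate` function are assumptions, and it does not handle branches or loops.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate(statements):
    """Optimize a list of (target, expression) assignments without branches."""
    known = {}                            # variable name -> constant value
    result = []

    def value_of(operand):
        # Substitute a variable by its constant value when one is known.
        if isinstance(operand, str):
            return known.get(operand, operand)
        return operand

    for target, expr in statements:
        if isinstance(expr, tuple):
            op, left, right = expr
            left, right = value_of(left), value_of(right)
            if isinstance(left, int) and isinstance(right, int):
                expr = OPS[op](left, right)       # both operands constant
            else:
                expr = (op, left, right)
        else:
            expr = value_of(expr)
        if isinstance(expr, int):
            known[target] = expr
        else:
            known.pop(target, None)       # value no longer known at compile time
        result.append((target, expr))
    return result


# a = 10; a = a + 2;
print(propagate([("a", 10), ("a", ("+", "a", 2))]))
```

The second assignment becomes `a = 12`. The first assignment is still present in the output; removing it would be the job of a separate pass.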
Machine Code Generation

 Intermediate code instructions are transformed into machine code instructions.
 The transformation can be one-to-one, depending on the form of the intermediate code language.
 If the instruction set of target machine is known, it
becomes a simple process.
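When the target instruction set is known, each intermediate instruction can be translated with a fixed template. The Python sketch below targets a made-up accumulator machine: the `LOAD`, `STORE`, `ADD`, `SUB`, and `MUL` mnemonics, the tuple format, and the `emit` function are all hypothetical and exist only for this illustration.

```python
# Mnemonics of a hypothetical accumulator machine, not a real instruction set.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}


def emit(instructions):
    """Translate three-address tuples into assembly-like lines."""
    lines = []
    for instr in instructions:
        if len(instr) == 2:               # copy: (target, source)
            target, source = instr
            lines += [f"LOAD {source}", f"STORE {target}"]
        else:                             # (target, left, op, right)
            target, left, op, right = instr
            lines += [f"LOAD {left}", f"{MNEMONIC[op]} {right}", f"STORE {target}"]
    return lines


# t1 = c * 10; t2 = b + t1; a = t2
program = [("t1", "c", "*", "10"), ("t2", "b", "+", "t1"), ("a", "t2")]
print("\n".join(emit(program)))
```

The output stores `t2` and immediately loads it again; removing such redundant pairs is the kind of work left to machine code optimization.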
Machine Code Optimization

 Machine code is optimized by applying machine-specific optimization techniques.
