Compilers PDF

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Compiler

design
Overview and History
• Cause
– Software for early computers was written in assembly language
– The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler

• The first real compiler


– FORTRAN compilers of the late 1950s
– 18 person-years to build

2
Overview and History
• Compiler technology
– is more broadly applicable and has been employed in
rather unexpected areas.
• Text-formatting languages,
like nroff and troff; preprocessor packages like eqn,
tbl, pic
• Silicon compiler for the creation of VLSI circuits
• Command languages of OS
• Query languages of Database systems

3
What Do Compilers Do
• A compiler acts as a translator,
transforming human-oriented programming
languages
into computer-oriented machine languages.
– Ignore machine-dependent details for programmer

Programming Machine
Language Compiler Language
(Source) (Target)

4
What Do Compilers Do
• Compilers may generate three types of code:
– Pure Machine Code
• Machine instruction set without assuming the existence
of any operating system or library.
• Mostly being OS or embedded applications.
– Augmented Machine Code
• Code with OS routines and runtime support routines.
• More often
– Virtual Machine Code
• Virtual instructions, can be run on any architecture with
a virtual machine interpreter or a just-in-time compiler
• Ex. Java 5
The Structure of a Compiler
• Any compiler must perform two major tasks
Compiler

Analysis Synthesis

– Analysis of the source program


– Synthesis of a machine-language program
6
Compiler Architecture

In more detail:
Intermediate
Language
Source Target Language
Front End – Back End –
Language
language specific machine specific

•Separation of Concerns
•Retargeting

7
Compiler Architecture
Intermediate
Intermediate
Language
Language

Scanner Parser Semantic Code


Source Code Target
(lexical (syntax Analysis Generator
language Optimizer language
analysis) analysis) (IC generator)
tokens Syntactic
structure

Symbol
Table

8
Lexical Analysis - Scanning

Scanner tokens Parser Semantic Code


Source (lexical (syntax Generator
Analysis
languag analysis) analysis) (IC generator)
e

Code
Optimizer

• Tokens described formally


• Breaks input into tokens
Symbol
• Remove white space
Table

9
Input: result = a + b * c / d

• Tokens:
‘result’, ‘=‘, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’

identifiers
operators

10
Static Analysis - Parsing

tokens Syntactic
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)

Code
Optimizer

• Syntax described formally


• Tokens organized into syntax tree
that describes structure Symbol
• Error checking (syntax) Table

11
Input: result = a + b * c / d

Exp ::= Exp ‘+’ Exp Assign


| Exp ‘-’ Exp
| Exp ‘*’ Exp ID ‘=‘ Exp
| Exp ‘/’ Exp
| ID Exp ‘+’ Exp

Assign ::= ID ‘=‘ Exp ID Exp ‘*’ Exp

ID Exp ‘/’ Exp

ID ID

12
Semantic Analysis

Syntactic/semantic
Syntactic structure
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)

Syntactic/semantic
structure
Code
Optimizer

• “Meaning”
• Type/Error Checking
• Intermediate Code Generation –
abstract machine Symbol
Table

13
Optimization

Scanner Parser Semantic Code


Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)
Syntactic/semantic
structure
Syntactic/semantic
Code structure
Optimizer

• Improving efficiency (machine


independent)
• Finding optimal code Symbol
Table

14
Code Generation

Syntactic/semantic
structure
Scanner Parser Semantic Code
Source Target
language
(lexical (syntax Analysis Generator language
analysis) analysis) (IC generator)

Syntactic/semantic
Code structure
Optimizer

• Intermediate code to real


machine code
• Memory management, register Symbol
allocation, instruction selection, Table
instruction scheduling, …

15
The Role of Lexical Analyzer
• Lexical analyzer is the first phase of a compiler.
• Its main task is to read input characters and produce as
output a sequence of tokens that parser uses for syntax
analysis.

16
Tokens, Patterns, Lexemes
• A lexeme is a sequence of characters in the source
program that is matched by the pattern for a token.
• A lexeme is a basic lexical unit of a language comprising
one or several words, the elements of which do not
separately convey the meaning of the whole.
• The lexemes of a programming language include its identifier,
literals, operators, and special words.
• A token of a language is a category of its lexemes.
• A pattern is a rule describing the set of lexemes that can
represent as particular token in source program.
17
Examples of Tokens
const pi = 3.1416;
The substring pi is a lexeme for the token “identifier.”

18
Lexeme and Token
Index = 2 * count +17;

Lexemes Tokens
Index Identifier
= equal_sign
2 int_literal
* multi_op
Count identifier
+ plus_op
17 int_literal
; semicolon
19
Lexical Errors
• Few errors are discernible at the lexical level
alone, because a lexical analyzer has a very
localized view of a source program.
• Let some other phase of compiler handle any
error.
• Panic mode
• Error recovery

20

You might also like