
Semantic Analysis & Intermediate Representation

CSC 3205: Compiler Design

Marriette Katarahweire

CSC 3205: Compiler Design 1/17


Semantic Analysis



Semantic Analysis

Semantic analysis computes additional information related to the meaning of the program once the syntactic structure is known, i.e., it tries to understand the "meaning" of the program and to verify that the program "makes sense."
After parsing (and constructing the AST), the next phases are semantic analysis and intermediate code generation.
In typed languages such as C, semantic analysis involves adding information to the symbol table and performing type checking.



What does Semantic Analysis involve?

Semantic analysis typically involves:


Type checking – data types are used in a manner that is consistent with their definition (i.e., only with compatible data types, only with operations that are defined for them, etc.)
Label checking – labels referenced in a program must exist
Flow-of-control checks – control structures must be used in their proper fashion (no GOTOs into a FORTRAN DO statement, no breaks outside a loop or switch statement, etc.)



Semantic Analysis

The compiler must do more than recognize whether a sentence belongs to the language…
Find remaining errors that would make the program invalid
- undefined variables, types
- type errors that can be caught statically
Figure out useful information for later phases
- types of all expressions
- data layout



Kinds of Checks

Uniqueness checks
- Certain names must be unique
- Many languages require variable declarations
Flow-of-control checks
- Match control-flow operators with structures
- Example: break applies to innermost loop/switch
Type checks
- Check compatibility of operators and operands
Logical checks
- Program is syntactically and semantically correct, but
does not do the “correct” thing



Examples of Reported Errors

Undeclared identifier
Multiply declared identifier
Index out of bounds
Wrong number or types of args to call
Incompatible types for operation
Break statement outside switch/loop
Goto with no label



Examples of Reported Errors

How do these checks help compilers?


Allocate right amount of space for variables
Select right machine operations
Proper implementation of control structures
Try compiling this code:
void main(){
int i=21, j=42;
printf("Hello World\n");
printf("Hello World, N=%d\n");
printf("Hello World\n", i, j);
printf("Hello World, I=%d, J=%d\n", i, j);
}



Typical Semantics Errors

Multiple declarations: a variable should be declared (in the same scope) at most once
Undeclared variable: a variable should not be used before
being declared
Type mismatch: type of the LHS of an assignment should
match the type of the RHS
Wrong arguments: methods should be called with the right
number and types of arguments



Examples I

Check uniqueness of name declarations:
int f = 2;
int f() { return 1; }
is not allowed in Pascal, but may be allowed in other languages
Type checking:
int i;
i = 100; okay
i = "abc"; not okay



Examples II

Expression well-formedness
7 + 3.14: okay in C, not okay in some stricter languages
6 + "abc": not okay in C, okay in some other languages
Sometimes the same name must appear two or more times
Example: defining a block:
Begin B
End B



Examples III

Function calls
Types of arguments match the definition
Number of arguments matches the definition
Return type matches the definition
Flow-of-control checks
"break" only within a loop
"return" only within a function
"default" or "error" statement for "switch" in C



Static Vs. Dynamic Checking

Semantic checking during compile time is called "static checking"
Semantic checking during execution time is called "dynamic checking"
The more that is done at compile time, the fewer opportunities for error at run time



Static vs. Dynamic Semantics

The static semantics of a language is indirectly related to the meaning of programs during execution. Its name comes from the fact that these specifications can be checked at compile time.
Dynamic semantics refers to the meaning of expressions, statements and other program units. Unlike static semantics, these cannot be checked at compile time and can only be checked at runtime. Examples:
Division by zero
Array bounds checks



Specification of Programming Languages

PLs require precise definitions (i.e., no ambiguity):
- Language form (Syntax)
- Language meaning (Semantics)
Consequently, PLs are specified using formal notation:
- Formal syntax: Tokens, Grammar
- Formal semantics:
- Attribute Grammars (static semantics),
- Dynamic Semantics



Attribute Grammars

An attribute grammar is a context-free grammar with the addition of attributes and attribute evaluation rules called semantic functions.
An attribute grammar can specify both semantics and syntax
while BNF specifies only the syntax.
Attributes are variables to which values are assigned. Each
attribute variable is associated with one or more nonterminals
or terminals of the grammar.
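As a sketch, the classic attribute grammar for sums of digits attaches a synthesized attribute val to each nonterminal; the semantic functions in braces compute it bottom-up:

```
E -> E1 + T   { E.val = E1.val + T.val }
E -> T        { E.val = T.val }
T -> digit    { T.val = digit.lexval }
```

Parsing 3 + 4 with these rules yields E.val = 7 without any separate evaluation pass being specified in BNF.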



Type Checking
Type checking is the process of verifying that each operation
executed in a program respects the type system of the
language.
All operands in any expression should be of the appropriate types and number.
Much of what is done in the semantic analysis phase is type
checking.
A language is considered strongly typed if every type error is detected, whether at compile time or at run time.
Type checking can be done at compilation, during execution,
or divided across both.
Static type checking is done at compile-time. The information
the type checker needs is obtained via declarations and stored
in a master symbol table.
Dynamic type checking is implemented by including type
information for each data location at runtime
Intermediate Representation

Most compilers translate the source program first to some form of intermediate representation and convert it from there into machine code.
Intermediate Representation (IR):
An abstract machine language
Expresses operations of target machine
Not specific to any particular machine
Independent of source language
The final phase of the compiler front-end



IR

Suppose we wish to build compilers for n source languages and m target machines.
Case 1: no IR
Need a separate compiler for each source-language/target-machine combination.
A total of n ∗ m compilers necessary.
The front-end becomes cluttered with machine-specific details, the back-end with source-language-specific details.
Case 2: IR present
Need just n front-ends and m back-ends, n + m components in total.



Why IR?

[figure omitted]


Intermediate Representation

Why?
It provides increased abstraction, a cleaner separation between the front and back ends, and adds possibilities for retargeting/cross-compilation.
It breaks the difficult problem of translation into two simpler, more manageable pieces.
It supports advanced compiler optimizations; most optimization is done on this form of the code.



Intermediate Representation

IRs are usually categorized according to where they fall between a high-level language and machine code.
IRs that are close to a high-level language are called high-level IRs, and IRs that are close to assembly are called low-level IRs.
A high-level IR might preserve things like array subscripts or field accesses, whereas a low-level IR converts those into explicit addresses and offsets.
High-level IRs usually preserve information such as loop structure and if-then-else statements. They tend to reflect the source language being compiled more than lower-level IRs do.
Medium-level IRs often attempt to be independent of both the source language and the target machine.
Low-level IRs tend to reflect the target architecture very closely, and are often machine-dependent.



IR

Intermediate representations are usually:
Structured (graph- or tree-based)
Flat, tuple-based, generally three-address code (quadruples) for RISC architectures
Flat, stack-based (0-address code)
2-address code for machines with memory-register operations
Or any combination of the above

while (x < 4 * y) {
x = y / 3 >> x;
if (y) print x - 3;
}



IR: Semantic Graph

[figure: semantic graph for the example loop]


IR: Tuples

(JUMP, L2) goto L2


(LABEL, L1) L1:
(SHR, 3, x, t0) t0 := 3 >> x
(DIV, y, t0, t1) t1 := y / t0
(COPY, t1, x) x := t1
(JZ, y, L3) if y == 0 goto L3
(SUB, x, 3, t2) t2 := x - 3
(PRINT, t2) print t2
(LABEL, L3) L3:
(LABEL, L2) L2:
(MUL, 4, y, t4) t4 := 4 * y
(LT, x, t4, t5) t5 := x < t4
(JNZ, t5, L1) if t5 != 0 goto L1



Stack Code

goto L2
L1:
load y
load_constant 3
load x
shr
div
store x
load y
jump_if_zero L3
load x
load_constant 3
sub
print
L3:
L2:
load x
load_constant 4
load y
mul
less_than
jump_if_not_zero L1



Three-Address Code

Instructions are very simple
Examples: a = b + c, x = -y, if a > b goto L1
LHS is the target and the RHS has at most two sources and
one operator
RHS sources can be either variables or constants
Three-address code is a generic form and can be implemented
as quadruples, triples, indirect triples, tree or DAG
Example: The three-address code for a + b ∗ c − d/(b ∗ c) is
below
1. t1 = b*c
2. t2 = a+t1
3. t3 = b*c
4. t4 = d/t3
5. t5 = t2-t4



Three-Address Code

[figure omitted]
