
Semantic Analysis & Intermediate Representation

CSC 3205: Compiler Design

Marriette Katarahweire

CSC 3205: Compiler Design 1/17


Semantic Analysis



Semantic Analysis

Semantic analysis computes additional information related to the meaning of the program once the syntactic structure is known, i.e., it tries to understand the "meaning" of the program and to verify that the program "makes sense."
After parsing (and constructing the AST), the next phases are semantic analysis and intermediate code generation.
In typed languages such as C, semantic analysis involves adding information to the symbol table and performing type checking.



What does Semantic Analysis involve?

Semantic analysis typically involves:


Type checking – data types are used in a manner that is consistent with their definition (i.e., only with compatible data types, only with operations that are defined for them, etc.)
Label checking – labels referenced in a program must exist
Flow-of-control checks – control structures must be used in their proper fashion (no GOTOs into a FORTRAN DO statement, no breaks outside a loop or switch statement, etc.)



Semantic Analysis

The compiler must do more than recognize whether a sentence belongs to the language…
Find remaining errors that would make the program invalid
- undefined variables, types
- type errors that can be caught statically
Figure out useful information for later phases
- types of all expressions
- data layout



Kinds of Checks

Uniqueness checks
- Certain names must be unique
- Many languages require variable declarations
Flow-of-control checks
- Match control-flow operators with structures
- Example: break applies to innermost loop/switch
Type checks
- Check compatibility of operators and operands
Logical checks
- Program is syntactically and semantically correct, but
does not do the “correct” thing



Examples of Reported Errors

Undeclared identifier
Multiply declared identifier
Index out of bounds
Wrong number or types of args to call
Incompatible types for operation
Break statement outside switch/loop
Goto with no label



Examples of Reported Errors

How do these checks help compilers?


Allocate right amount of space for variables
Select right machine operations
Proper implementation of control structures
Try compiling this code:
void main(){
int i=21, j=42;
printf("Hello World\n");
printf("Hello World, N=%d\n");
printf("Hello World\n", i, j);
printf("Hello World, I=%d, J=%d\n", i, j);
}



Typical Semantics Errors

Multiple declarations: a variable should be declared (in the same scope) at most once
Undeclared variable: a variable should not be used before
being declared
Type mismatch: type of the LHS of an assignment should
match the type of the RHS
Wrong arguments: methods should be called with the right
number and types of arguments



Examples I

Check uniqueness of name declarations:
int f = 2;
int f() { return 1; }
is not allowed in Pascal, but may be allowed in other languages
Type checking:
int i;
i = 100; okay
i = "abc"; not okay



Examples II

Expression well-formedness
7 + 3.14: okay in C, not okay in some stricter languages
6 + "abc": not okay in C, okay in some other languages
Sometimes the same name must appear two or more times
Example: defining a block:
Begin B
End B



Examples III

Function calls
Types of arguments match the definition
Number of arguments matches the definition
Return type matches the definition
Flow-of-control checks
"break" only within a loop
"return" only within a function
"default" or "error" statement for "switch" in C



Static Vs. Dynamic Checking

Semantic checking during compile time is called "static checking"
Semantic checking during execution time is called "dynamic checking"
The more that is done at compile time, the fewer opportunities for error at run time



Static vs. Dynamic Semantics

The static semantics of a language is indirectly related to the meaning of programs during execution. Its name comes from the fact that these specifications can be checked at compile time.
Dynamic semantics refers to the meaning of expressions, statements and other program units. Unlike static semantics, these cannot be checked at compile time and can only be checked at runtime. Examples:
Division by zero
Array bounds checks



Specification of Programming Languages

PLs require precise definitions (i.e., no ambiguity):
- Language form (Syntax)
- Language meaning (Semantics)
Consequently, PLs are specified using formal notation:
- Formal syntax: Tokens, Grammar
- Formal semantics:
- Attribute Grammars (static semantics),
- Dynamic Semantics



Attribute Grammars

An attribute grammar is a context-free grammar with the addition of attributes and attribute evaluation rules called semantic functions.
An attribute grammar can specify both semantics and syntax
while BNF specifies only the syntax.
Attributes are variables to which values are assigned. Each
attribute variable is associated with one or more nonterminals
or terminals of the grammar.
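As a sketch, the classic attribute grammar for sums of digits attaches a synthesized attribute val to each nonterminal; the semantic functions in braces compute it bottom-up:

```
E -> E1 + T   { E.val = E1.val + T.val }
E -> T        { E.val = T.val }
T -> digit    { T.val = digit.lexval }
```

Parsing 3 + 4 with these rules yields E.val = 7 without any separate evaluation pass being specified in BNF.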



Type Checking
Type checking is the process of verifying that each operation
executed in a program respects the type system of the
language.
All operands in any expression should be of the appropriate types and number.
Much of what is done in the semantic analysis phase is type
checking.
A language is considered strongly typed if every type error is detected, whether at compile time or at run time.
Type checking can be done at compilation, during execution,
or divided across both.
Static type checking is done at compile-time. The information
the type checker needs is obtained via declarations and stored
in a master symbol table.
Dynamic type checking is implemented by including type
information for each data location at runtime
Intermediate Representation

Most compilers translate the source program first to some form of intermediate representation and convert it from there into machine code.
Intermediate Representation (IR):
An abstract machine language
Expresses operations of target machine
Not specific to any particular machine
Independent of source language
The final phase of the compiler front-end



IR

Suppose we wish to build compilers for n source languages and m target machines.
Case 1: no IR
Need a separate compiler for each source-language/target-machine combination.
A total of n ∗ m compilers necessary.
The front-end becomes cluttered with machine-specific details, the back-end with source-language-specific details.
Case 2: IR present
Need just n front-ends and m back-ends, n + m components in total.



Why IR?

[figure omitted]


Intermediate Representation

Why?
It provides increased abstraction, a cleaner separation between the front and back ends, and adds possibilities for retargeting/cross-compilation.
It breaks the difficult problem of translation into two simpler, more manageable pieces.
It supports advanced compiler optimizations; most optimization is done on this form of the code.



Intermediate Representation

IRs are usually categorized according to where they fall between a high-level language and machine code.
IRs that are close to a high-level language are called high-level IRs, and IRs that are close to assembly are called low-level IRs.
A high-level IR might preserve things like array subscripts or field accesses, whereas a low-level IR converts those into explicit addresses and offsets.
High-level IRs usually preserve information such as loop structure and if-then-else statements. They tend to reflect the source language being compiled more than lower-level IRs do.
Medium-level IRs often attempt to be independent of both the source language and the target machine.
Low-level IRs tend to reflect the target architecture very closely, and are often machine-dependent.



IR

Intermediate representations are usually:
Structured (graph- or tree-based)
Flat, tuple-based, generally three-address code (quadruples) for RISC architectures
Flat, stack-based (0-address code)
2-address code for machines with memory-register operations
Or any combination of the above

while (x < 4 * y) {
x = y / 3 >> x;
if (y) print x - 3;
}



IR: Semantic Graph

[figure: semantic graph for the example loop]


IR: Tuples

(JUMP, L2) goto L2


(LABEL, L1) L1:
(SHR, 3, x, t0) t0 := 3 >> x
(DIV, y, t0, t1) t1 := y / t0
(COPY, t1, x) x := t1
(JZ, y, L3) if y == 0 goto L3
(SUB, x, 3, t2) t2 := x - 3
(PRINT, t2) print t2
(LABEL, L3) L3:
(LABEL, L2) L2:
(MUL, 4, y, t4) t4 := 4 * y
(LT, x, t4, t5) t5 := x < t4
(JNZ, t5, L1) if t5 != 0 goto L1



Stack Code

goto L2
L1:
load y
load_constant 3
load x
shr
div
store x
load y
jump_if_zero L3
load x
load_constant 3
sub
print
L3:
L2:
load x
load_constant 4
load y
mul
less_than
jump_if_not_zero L1



Three-Address Code

Instructions are very simple
Examples: a = b + c, x = -y, if a > b goto L1
LHS is the target and the RHS has at most two sources and
one operator
RHS sources can be either variables or constants
Three-address code is a generic form and can be implemented
as quadruples, triples, indirect triples, tree or DAG
Example: The three-address code for a + b ∗ c − d/(b ∗ c) is
below
1. t1 = b*c
2. t2 = a+t1
3. t3 = b*c
4. t4 = d/t3
5. t5 = t2-t4



Three-Address Code

[figure omitted]
