EEX6335 - Compiler Design EEX6363 - Compiler Construction: Let's Consider a Detailed Analysis of the Semantic Analyzer

EEX6335 – Compiler Design
EEX6363 – Compiler Construction
Day School – 4
by
Gehan Anthonys
Bachelor of Technology Honours in Engineering

Bachelor of Software Engineering Honours
Department of Electrical and Computer Engineering

Faculty of Engineering
The Open University of Sri Lanka
Let's consider a detailed analysis of the semantic analyzer:
Semantic Analysis
• Categories
• Identifiers
• Symbol Table
• Rules
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 5
Phases of Compilation
Semantic Analysis: the problem
Where are we?
So far we were able to check:
• the program includes correct “words”;
• “words” are combined in correct “sentences”;
Why Semantic Analysis?
• Lexically and syntactically correct programs may still contain other errors;
• Lexical and syntax analyses are not powerful enough to ensure the correct usage of variables, objects, functions, ...
What’s next?
We would like to:
• perform additional checks to increase guarantees of correctness;
• transform the program from the source language into the target one, according to precisely defined semantic rules;
Additional Checks:
There are many additional checks that can be performed to increase the correctness of code:
• Coherent (consistent) usage of variables
➢ definition-usage;
➢ type;
• Existence of unreachable code blocks, . . .
Here, the focus is mainly on the mechanisms for type checking and the generation of intermediate code.
Semantic analysis:
• Ensure that the program satisfies a set of rules regarding the usage of programming constructs (variables, objects, expressions, statements, etc.)
• The principal job of the semantic analyzer is to enforce static semantic rules.
• In general, anything that requires the compiler to compare things that are separated by a long distance, or to count things, ends up being a matter of semantics.
• The semantic analyzer also commonly constructs a syntax tree (usually first), and much of the information it gathers is needed by the code generator.
Syntax Directed Definitions:
• Attributes are used to associate characteristics with grammar symbols and to store values associated with them.
• A syntax-directed definition (SDD) provides the semantic rules that define the values of the attributes.
➢ attributes are associated with grammar symbols and can be of any kind;
➢ rules are associated with productions.
An SDD can be defined using two different kinds of attributes:
• Synthesized attributes: a synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself;
➢ can be evaluated during a single bottom-up traversal of the parse tree;
➢ the production must have a non-terminal as its head;
➢ can be held by both terminals and non-terminals.
• Inherited attributes: an inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings;
➢ can be evaluated during a single top-down and sideways traversal of the parse tree;
➢ the production must have a non-terminal as a symbol in its body;
➢ can be held only by non-terminals, never by terminals.
Attribute Grammars
• Context-Free Grammars (CFGs) are used to specify the syntax
of programming languages.
E.g. arithmetic expressions
How do we tie these rules to mathematical concepts?

• Attribute grammars are annotated CFGs in which annotations
are used to establish meaning relationships among symbols
– Annotations are also known as decorations
• Each grammar symbol has a set of attributes
E.g. the value of E1 is the attribute E1.val
• Each grammar rule has a set of rules over the symbol attributes
-- Copy rules, Semantic Function rules
E.g. sum, quotient
Example
Attribute Flow
Example
• The figure shows the result of
annotating the parse tree for (1+3)*2
• Each symbol has at most one attribute shown in the corresponding box
– Numerical value in this example
– Operator symbols have no value
• Arrows represent attribute flow
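The attribute flow for (1+3)*2 can be sketched in code. The following is a minimal illustration, assuming a simplified expression tree instead of the full parse tree; the names `Num`, `BinOp` and `val` are hypothetical and not taken from the slides. It computes a synthesized attribute bottom-up: each node's value depends only on the values of its children.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int          # leaf: the attribute comes straight from the token


@dataclass
class BinOp:
    op: str             # "+" or "*" (operator symbols carry no value)
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def val(node: Node) -> int:
    """Compute the synthesized attribute `val` in post-order (children first)."""
    if isinstance(node, Num):
        return node.value
    left_val = val(node.left)      # attribute flows up from the left child
    right_val = val(node.right)    # attribute flows up from the right child
    if node.op == "+":
        return left_val + right_val
    if node.op == "*":
        return left_val * right_val
    raise ValueError(f"unknown operator {node.op!r}")


# (1+3)*2, the expression annotated in the figure
tree = BinOp("*", BinOp("+", Num(1), Num(3)), Num(2))
result = val(tree)
```

Because every value is derived from the children only, a single bottom-up pass is enough here; an inherited attribute would need information passed down from the parent or across from siblings.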
Categories of Semantic Analysis
• Examples of semantic rules

– Variables must be defined before being used
– A variable should not be defined multiple times
– In an assignment statement, the variable and the expression
must have the same type
– The test expression of an if statement must have Boolean
type
• Two major categories

– Semantic rules regarding types
– Semantic rules regarding scopes
Type information and Type checking
• Type Information:
Describes what kind of values correspond to different constructs: variables, statements, expressions, functions, etc.
– variables: int a; has type integer
– expressions: (a+1) == 2 has type boolean
– statements: a = 1.0; has type floating-point
– functions: int pow(int n, int m) has type int x int → int
• Type Checking:
Set of rules which ensures the type consistency of different constructs in the program
Scope Information
• Characterizes the declaration of identifiers and the portions of the program where each identifier may be used
E.g., identifiers: variables, functions, objects, labels
• Lexical scope: textual region in the program

E.g.,
Statement block, formal argument list, object body,
function or method body, source file, whole program
• Scope of an identifier:
The lexical scope its declaration refers to.
Variable Scope
• Scope of variables in statement blocks:
{ int a;
  ...               /* scope of variable a: this whole block */
  { int b;
    ...             /* scope of variable b: the inner block only */
  }
  ....
}
• Scope of global variables: current file
• Scope of external variables: whole program
Function Parameter and Label Scope
• Scope of formal arguments of functions:
int foo(int n) {
    ...                 /* scope of argument n: the function body */
}
• Scope of labels:
void foo() {
    ... goto lab;
    ...
    lab: i++;
    ... goto lab;       /* scope of label lab: the whole function body */
    ...
}
Note: in C, all labels have function scope regardless of where they are declared.
Scope in Class declaration
• Scope of object fields and methods:
class A {
public:
    void f() {x=1;}     /* scope of variable x and method f: the whole class body */
    ...
private:
    int x;
    ...
};
Semantic Rules for Scopes
• Main rules regarding scopes:
– Rule 1: Use each identifier only within its scope
– Rule 2: Do not declare identifier of the same kind with
identical names more than once in the same lexical scope
class X {
    int X;
    void X(int X) {
        X: ...
        goto X;
    }
}

int X(int X) {
    int X;
    goto X;
    {
        int X;
        X: X = 1;
    }
}
Both are legal but NOT recommended!
Symbol Tables
• Semantic checks refer to properties of identifiers in the program – their scope or type
• Need an environment to store the information about identifiers = symbol table
• Each entry in the symbol table contains:
– Name of an identifier
– Additional info about identifier: kind, type, constant?

NAME   KIND   TYPE              ATTRIBUTES
foo    func   int, int → int    extern
m      arg    int
n      arg    int               const
tmp    var    char              const
Scope Information
• How to capture the scope information in the symbol table?
• Idea:
– There is a hierarchy of scopes in the program
– Use similar hierarchy of symbol tables
– One symbol table for each scope
– Each symbol table contains the symbols declared in that
lexical scope
Example
int x;
void f(int m) {
    float x, y;
    ...
    {int i, j; ....; }
    {int x; l: ...; }
}
int g(int n) {
    char t;
    ... ;
}
Symbol table hierarchy:
Global symtab: x var int; f func int → void; g func int → int
  func f symtab: m arg int; x var float; y var float
    first block in f: i var int; j var int
    second block in f: x var int; l label
  func g symtab: n arg int; t var char
Problem
Associate each definition of x with its appropriate symbol table entry
int x;
void f(int m) {
    float x, y;
    ...
    {int i, j; x=1; }
    {int x; l: x=2; }
}
int g(int n) {
    char t;
    x=3;
}
Symbol table hierarchy:
Global symtab: x var int; f func int → void; g func int → int
  func f symtab: m arg int; x var float; y var float
    first block in f: i var int; j var int
    second block in f: x var int; l label
  func g symtab: n arg int; t var char
Catching Semantic Errors
int x;
void f(int m) {
    float x, y;
    ...
    {int i, j; x=1; }
    {int x; l: i=2; }      /* Error! undefined variable: i=2 */
}
int g(int n) {
    char t;
    x=3;
}
Symbol table hierarchy:
Global symtab: x var int; f func int → void; g func int → int
  func f symtab: m arg int; x var float; y var float
    first block in f: i var int; j var int
    second block in f: x var int; l label
  func g symtab: n arg int; t var char
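The scope hierarchy and the undefined-variable check can be sketched as a chain of tables with parent links. This is only an illustration under assumptions: the class name `SymbolTable`, its `insert` and `lookup` methods, and the string encoding of types are hypothetical choices, not something the slides prescribe.

```python
class SymbolTable:
    """One table per lexical scope, linked to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def insert(self, name, kind, type_=None):
        # Rule 2: no duplicate declarations in the same lexical scope.
        if name in self.entries:
            raise KeyError(f"'{name}' already declared in this scope")
        self.entries[name] = (kind, type_)

    def lookup(self, name):
        # Search the innermost scope first, then walk outwards.
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        raise NameError(f"undefined identifier '{name}'")


global_tab = SymbolTable()
global_tab.insert("x", "var", "int")
global_tab.insert("f", "func", "int -> void")
global_tab.insert("g", "func", "int -> int")

f_tab = SymbolTable(global_tab)
f_tab.insert("m", "arg", "int")
f_tab.insert("x", "var", "float")
f_tab.insert("y", "var", "float")

block1 = SymbolTable(f_tab)          # {int i, j; ...}
block1.insert("i", "var", "int")
block1.insert("j", "var", "int")

block2 = SymbolTable(f_tab)          # {int x; l: ...}
block2.insert("x", "var", "int")
block2.insert("l", "label")

g_tab = SymbolTable(global_tab)
g_tab.insert("n", "arg", "int")
g_tab.insert("t", "var", "char")

try:
    block2.lookup("i")               # i=2 inside the second block
    error = None
except NameError as exc:
    error = str(exc)
```

With this layout, `x=1` in the first block resolves to the float `x` of `f`, `x=2` in the second block resolves to that block's own int `x`, and `x=3` in `g` resolves to the global int `x`; the use of `i` in the second block fails because `i` lives in a sibling scope.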
Symbol Table Operations
• Two operations:
– To build symbol tables, we need to insert new identifiers
in the table
– In the subsequent stages of the compiler we need to
access the information from the table: use lookup
function
• Cannot build symbol tables during lexical analysis

– Hierarchy of scopes encoded in syntax
• Build the symbol tables:

– While parsing, using the semantic actions
– After the AST is constructed
List Implementation
• Simple implementation using a list

– One cell per entry in the table
– Can grow dynamically during compilation
    name:  foo              m      n      tmp    . . . .
    kind:  func             var    var    var
    type:  int,int → int    int    int    char
• Disadvantage: inefficient for large tables
– Need to scan half the list on average
Hash table implementation
• Efficient implementation using hash table
– Array of lists (buckets)
– Use a hash on symbol name to map to corresponding bucket
• Hash func: identifier name (string) → int
• Note: include identifier type in match function
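A bucketed table of this kind might look as follows. This is a rough sketch, not a reference implementation: the class `HashSymbolTable`, the multiplier 31 in the hash function, and the reading of the note as "match on the identifier's kind as well as its name" are all assumptions made for illustration.

```python
class HashSymbolTable:
    """Array of buckets; each bucket is a list of (name, kind, type) entries."""

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash(self, name):
        # Hash function: identifier name (string) -> bucket index (int).
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def insert(self, name, kind, type_):
        self.buckets[self._hash(name)].append((name, kind, type_))

    def lookup(self, name, kind):
        # Only the one bucket selected by the hash is scanned.
        for entry in self.buckets[self._hash(name)]:
            if entry[0] == name and entry[1] == kind:
                return entry
        return None


table = HashSymbolTable()
table.insert("foo", "func", "int,int -> int")
table.insert("tmp", "var", "char")
table.insert("X", "var", "int")
table.insert("X", "label", None)     # same name, different kind
```

Matching on the kind lets a variable `X` and a label `X` coexist, which the earlier scope example relies on.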
Forward References
• Use of an identifier within the scope of its declaration, but before it is declared
E.g.,
class A {
    int m() {return n(); }
    int n() {return 1; }
}
• Any compiler phase that uses the information from the symbol table must be performed after the table is constructed
• Cannot type-check and build symbol table at the same time
Type Checking
• Semantic checks to enforce the type safety of the program
• Examples
– Unary and binary operators (e.g. +, ==, [ ]) must receive
operands of the proper type
– Functions must be invoked with the right number and type of
arguments
– Return statements must agree with the return type
– In assignments, assigned value must be compatible with type of
variable on LHS
– Class members accessed appropriately
4 Concepts related to Types/Languages
1. Static vs dynamic checking -- When to check types
2. Static vs dynamic typing -- When to define types
3. Strong vs weak typing -- How many type errors are allowed
4. Sound type systems -- Statically catch all type errors
Static vs Dynamic Checking

• Static type checking
– Perform at compile time
• Dynamic type checking
– Perform at run time (as the program executes)
• Examples of dynamic checking
– Array bounds checking
– Null pointer dereferences
Static vs Dynamic Typing

Static and dynamic typing refer to type definitions
(i.e., bindings of types to variables, expressions, etc.)
• Statically typed language
– Types defined at compile-time and do not change during
the execution of the program
• C, C++, Java, Pascal
• Dynamically typed language
– Types defined at run-time, as program executes
• Lisp, Smalltalk
Strong vs Weak Typing
• Refer to how much type consistency is enforced
• Strongly typed languages -- Guarantee accepted programs are type-safe
• Weakly typed languages -- Allow programs which contain type errors
• These concepts refer to run-time -- Can achieve strong typing using
either static or dynamic typing
Soundness
• Sound type systems: can statically ensure that the program is
type-safe
• Soundness implies strong typing
• Static type safety requires a conservative approximation of the
values that may occur during all possible executions
– May reject type-safe programs
– Need to be expressive: reject as few type-safe programs as possible
Type Systems
• A type is a predicate on values
• Type expressions: Describe the possible types in the program
– E.g., int, char*, array[], object, etc.
• Type system: Defines types for language constructs
– E.g., expressions, statements
Type Expressions
• Language type systems have basic types (aka: primitive types
or ground types) -- E.g., int, char*, double
• Build type expressions using basic types:
– Type constructors
• Array types; Structure types; Pointer types
– Type aliases
– Function types
Type Expressions: Arrays
• Various kinds of array types in different programming languages

• Array(T): arrays without bounds
– C, Java: T[ ]
• Array(T,S): array with size
– C: T[S], may be indexed 0 .. S-1
• Array(T,L,U): array with upper/lower bounds
– Pascal: array[L .. U] of T
• Array(T, S1, …, Sn): multi-dimensional arrays
– Fortran: T(L1, …, Ln)
Type Expressions: Structures / Functions
• Structures
– Has form {id1: T1, … , idn: Tn} for some identifiers idi and
types Ti
– Is essentially cartesian product: (id1 x T1) x … x (idn x Tn)
– Supports access operations on each field, with
corresponding type
– Objects: extension of structure types
• Functions
– Type: T1 x T2 x … x Tn → Tr
– Function value can be invoked with some argument
expressions with types Ti, returns type Tr
Type Expressions: Aliases / Pointers
• Type aliases
– C: typedef int int_array[ ];
– Aliases are not type constructors
• int_array is the same type as int [ ]
– Problem: Different type expressions may denote the same
type
• Pointers
– Pointer types characterize values that are addresses of
variables of other types
– C pointers: T* e.g., int *x;
Implementation
• Use a separate class hierarchy for types:

– class BaseType extends Type {String name;}
– class IntType extends BaseType { ... }
– class FloatType extends BaseType { ... }
– class ArrayType extends BaseType { ... }
– class FunctionType extends BaseType { ... }
• Semantic analysis translates all type expressions to type objects
• Symbol table binds name to type object
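One possible shape for such type objects is sketched below. It is an assumption-laden illustration: the field names (`name`, `elem`, `params`, `result`) are invented, and `ArrayType` and `FunctionType` derive directly from `Type` here instead of from `BaseType`, which differs from the listing above. Frozen dataclasses give structural equality, so two type expressions that denote the same type compare equal.

```python
from dataclasses import dataclass
from typing import Tuple


class Type:
    """Root of the type hierarchy."""


@dataclass(frozen=True)
class BaseType(Type):
    name: str                       # e.g. "int", "float", "char"


@dataclass(frozen=True)
class ArrayType(Type):
    elem: Type                      # element type T of Array(T)


@dataclass(frozen=True)
class FunctionType(Type):
    params: Tuple[Type, ...]        # T1 x ... x Tn
    result: Type                    # Tr


INT = BaseType("int")
FLOAT = BaseType("float")
CHAR = BaseType("char")

# The symbol table binds each name to a type object.
symtab = {
    "foo": FunctionType((INT, INT), INT),
    "tmp": CHAR,
    "a": ArrayType(INT),
}

# An alias such as `typedef int int_array[]` denotes the same type object value.
int_array = ArrayType(INT)
same = int_array == symtab["a"]
```

Comparing by structure is one way to handle the aliasing problem noted earlier, where different type expressions may denote the same type.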
Creating Type Objects
• Build types while parsing – use a syntax-directed definition

non terminal Type type
type : INTEGER
         {$$ = new IntType(id); }
     | ARRAY LBRACKET type RBRACKET
         {$$ = new ArrayType($3); } ;
• Type objects are the abstract syntax tree (AST) nodes for type
expressions
Type Checking
• Type checking verifies typing rules
– E.g., "Operands of + must be integer expressions; the result is an integer expression"
• Option 1: Implement using syntax-directed definitions (type-check during the parsing)
expr: expr PLUS expr {
    if ($1 == IntType && $3 == IntType)
        $$ = IntType;
    else
        TypeCheckError("+");
}
• Option 2: First build the AST, then implement type checking by recursive traversal of the AST nodes:
class Add extends Expr {
    Expr e1, e2;   // operand subexpressions
    Type typeCheck() {
        Type t1 = e1.typeCheck(), t2 = e2.typeCheck();
        if (t1 == Int && t2 == Int) return Int;
        else throw new TypeCheckError("+");   // report the error; no type is returned
    }
}
Type Checking Identifiers

• Identifier expressions: Lookup the type in the symbol table
class IdExpr extends Expr {
    Identifier id;
    Type typeCheck() {return id.lookupType(); }
}
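The recursive-traversal option can be tried out with a small runnable sketch. It is an illustration only: types are plain strings, the symbol table is a dictionary passed in as `env`, and the names `IntLit`, `type_check` and `TypeCheckError` are hypothetical stand-ins for the classes named on the slides.

```python
class TypeCheckError(Exception):
    pass


INT, BOOL = "int", "bool"


class IntLit:
    def __init__(self, value):
        self.value = value

    def type_check(self, env):
        return INT                         # a literal such as 2 has type int


class IdExpr:
    def __init__(self, name):
        self.name = name

    def type_check(self, env):
        # Identifier expressions: look the type up in the symbol table.
        if self.name not in env:
            raise TypeCheckError(f"undefined identifier {self.name}")
        return env[self.name]


class Add:
    def __init__(self, e1, e2):
        self.e1, self.e2 = e1, e2

    def type_check(self, env):
        # Check the operands first, then apply the rule for +.
        t1 = self.e1.type_check(env)
        t2 = self.e2.type_check(env)
        if t1 == INT and t2 == INT:
            return INT
        raise TypeCheckError("+")


env = {"x": INT, "b": BOOL}
ok_type = Add(IdExpr("x"), IntLit(1)).type_check(env)      # x + 1

try:
    Add(IdExpr("b"), IntLit(1)).type_check(env)            # b + 1
    bad = None
except TypeCheckError as exc:
    bad = str(exc)
```

Each node class checks its children before itself, so the traversal mirrors the bottom-up evaluation of a synthesized `type` attribute.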
Type judgments for statements

• Statements may be expressions (i.e., represent values)
• Use type judgments for statements:
-- if (b) 2 else 3 : int
-- x == 10 : bool
-- b = true, y = 2 : int
• For statements which are not expressions: use a special unit type
(void or empty type)
– S : unit means “S is a well-typed statement with no result type”
Deriving a Judgment
• Consider the judgment
– if (b) 2 else 3 : int
• What do we need to decide that this is a well-typed expression
of type int?
– b must be a bool (b : bool); 2 must be an int (2 : int);
– 3 must be an int (3 : int)
Static Semantics and Type Judgments
• Static semantics is the formal notation which describes type judgments:
– E : T
– means "E is a well-typed expression of type T"
• Type judgment examples:
-- 2 : int
-- true : bool
-- 2 * (3 + 4) : int
-- "Hello" : string
• Type judgment notation: $A \vdash E : T$
– Means "In the context A, the expression E is a well-typed expression with type T"
• Type context is a set of type bindings: id : T (i.e. type context = symbol table)
-- $b : \text{bool},\ x : \text{int} \vdash b : \text{bool}$
-- $b : \text{bool},\ x : \text{int} \vdash \text{if } (b)\ 2 \text{ else } x : \text{int}$
-- $\vdash 2 + 2 : \text{int}$
Deriving a Judgment
• To show
– $b : \text{bool},\ x : \text{int} \vdash \text{if } (b)\ 2 \text{ else } x : \text{int}$
• Need to show
– $b : \text{bool},\ x : \text{int} \vdash b : \text{bool}$
– $b : \text{bool},\ x : \text{int} \vdash 2 : \text{int}$
– $b : \text{bool},\ x : \text{int} \vdash x : \text{int}$
Assignment Statements
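$$\frac{id : T \in A \qquad A \vdash E : T}{A \vdash id = E : T}\ \text{(variable-assign)}$$

$$\frac{A \vdash E_3 : T \qquad A \vdash E_2 : \text{int} \qquad A \vdash E_1 : \text{array}[T]}{A \vdash E_1[E_2] = E_3 : T}\ \text{(array-assign)}$$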
Sequence Statements
• Rule: A sequence of statements is well-typed if the first statement
is well-typed, and the remaining are well-typed as well:
$$\frac{A \vdash S_1 : T_1 \qquad A \vdash (S_2; \ldots; S_n) : T_n}{A \vdash (S_1; S_2; \ldots; S_n) : T_n}\ \text{(sequence)}$$
If Statements
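• If statement as an expression: its value is the value of the clause that is executed

$$\frac{A \vdash E : \text{bool} \qquad A \vdash S_1 : T \qquad A \vdash S_2 : T}{A \vdash \text{if } (E)\ S_1 \text{ else } S_2 : T}\ \text{(if-then-else)}$$

• If with no else clause, no value, why??

$$\frac{A \vdash E : \text{bool} \qquad A \vdash S : T}{A \vdash \text{if } (E)\ S : \text{unit}}\ \text{(if-then)}$$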
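The two if rules can be turned into a small checker. The sketch below is only one way to do it and rests on assumptions: expressions are encoded as tuples (`("lit", v)`, `("var", name)`, `("if", cond, then[, else])`), types are strings, and the function name `type_of` is hypothetical.

```python
UNIT, INT, BOOL = "unit", "int", "bool"


class TypeCheckError(Exception):
    pass


def type_of(env, node):
    """Return the type T such that env |- node : T, or raise TypeCheckError."""
    tag = node[0]
    if tag == "lit":
        return BOOL if isinstance(node[1], bool) else INT
    if tag == "var":
        if node[1] not in env:
            raise TypeCheckError(f"undefined identifier {node[1]}")
        return env[node[1]]
    if tag == "if":
        if type_of(env, node[1]) != BOOL:          # premise: A |- E : bool
            raise TypeCheckError("if condition must be bool")
        then_type = type_of(env, node[2])          # premise: A |- S1 : T
        if len(node) == 3:
            return UNIT                            # (if-then): no value
        else_type = type_of(env, node[3])          # premise: A |- S2 : T
        if then_type != else_type:
            raise TypeCheckError("if branches must have the same type")
        return then_type                           # (if-then-else)
    raise TypeCheckError(f"unknown node {tag!r}")


env = {"b": BOOL, "x": INT}
with_else = type_of(env, ("if", ("var", "b"), ("lit", 2), ("var", "x")))
without_else = type_of(env, ("if", ("var", "b"), ("lit", 2)))
```

The if-then form yields unit because, when the condition is false, no clause runs and there is no value to return.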
Declarations
$$\frac{A \vdash id : T\ [= E] : T_1 \qquad A,\ id : T \vdash (S_2; \ldots; S_n) : T_n}{A \vdash (id : T\ [= E];\ S_2; \ldots; S_n) : T_n}\ \text{(declaration)}$$

where $T_1$ = unit if there is no initializer E.
Declarations add entries to the environment (e.g., the symbol table)
Function Calls
• If expression E is a function value, it has a type
T1 x T2 x ... x Tn → Tr
• Ti are argument types, where i = 1,2,…,n; and Tr is the return type
• How to type-check a function call? E(E1, ..., En)

$$\frac{A \vdash E : T_1 \times T_2 \times \cdots \times T_n \to T_r \qquad A \vdash E_i : T_i \ \ (i \in 1 \ldots n)}{A \vdash E(E_1, \ldots, E_n) : T_r}\ \text{(function-call)}$$
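A call check following this rule could be sketched as below. The names `FunctionType` and `check_call` are hypothetical, types are plain strings, and the argument types are assumed to have been derived already by checking each argument expression.

```python
from dataclasses import dataclass
from typing import Tuple


class TypeCheckError(Exception):
    pass


@dataclass(frozen=True)
class FunctionType:
    params: Tuple[str, ...]     # T1 x ... x Tn
    result: str                 # Tr


def check_call(func_type, arg_types):
    """Given the callee's type and the argument types, return Tr."""
    if not isinstance(func_type, FunctionType):
        raise TypeCheckError("callee is not a function")
    if len(arg_types) != len(func_type.params):
        raise TypeCheckError("wrong number of arguments")
    for i, (actual, expected) in enumerate(zip(arg_types, func_type.params), 1):
        if actual != expected:                     # premise: A |- Ei : Ti
            raise TypeCheckError(f"argument {i}: expected {expected}, got {actual}")
    return func_type.result


# int pow(int n, int m) has type int x int -> int
pow_type = FunctionType(("int", "int"), "int")
call_type = check_call(pow_type, ["int", "int"])
```

The arity test is implicit in the inference rule (there must be exactly n premises for the n arguments) but has to be explicit in code.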

Function Declarations
• Consider a function declaration of the form:
– Tr fun (T1 a1, ... , Tn an) = E
– Equivalent to:
• Tr fun (T1 a1, ..., Tn an) {return E;}
• Type of function body E must match declared return type of function, i.e., E : Tr
• But, in what type context?
Add arguments to environment
• Let A be the context surrounding the function declaration.
– The function declaration: Tr fun (T1 a1, ... , Tn an) = E
– Is well-formed if

$$A,\ a_1 : T_1,\ \ldots,\ a_n : T_n \vdash E : T_r$$

• What about recursion?
– Need: $fun : T_1 \times T_2 \times \cdots \times T_n \to T_r \in A$
Mutual Recursion
• Example
– int f(int x) = g(x) + 1;
– int g(int x) = f(x) - 1;
• Need environment containing at least f: int → int, g: int → int when checking both f and g
• Two-pass approach:
– Scan top level of AST picking up all function signatures
and creating an environment binding all global identifiers
– Type-check each function individually using this global
environment
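The two-pass approach can be sketched on the f/g example. Everything about the encoding is an assumption made for illustration: functions are tuples with a single parameter, bodies are tuple-encoded expressions, a signature is stored as a `(parameter type, return type)` pair, and `type_of` and `check_program` are hypothetical names.

```python
class TypeCheckError(Exception):
    pass


# (name, parameter name, parameter type, return type, body)
functions = [
    ("f", "x", "int", "int", ("add", ("call", "g", ("var", "x")), ("lit", 1))),
    ("g", "x", "int", "int", ("sub", ("call", "f", ("var", "x")), ("lit", 1))),
]


def type_of(env, node):
    tag = node[0]
    if tag == "lit":
        return "int"
    if tag == "var":
        return env[node[1]]
    if tag in ("add", "sub"):
        if type_of(env, node[1]) == "int" and type_of(env, node[2]) == "int":
            return "int"
        raise TypeCheckError(tag)
    if tag == "call":
        param_type, result_type = env[node[1]]     # signature of the callee
        if type_of(env, node[2]) != param_type:
            raise TypeCheckError(f"bad argument in call to {node[1]}")
        return result_type
    raise TypeCheckError(f"unknown node {tag!r}")


def check_program(funcs):
    # Pass 1: collect every function signature into the global environment.
    global_env = {name: (ptype, rtype) for name, _, ptype, rtype, _ in funcs}
    # Pass 2: check each body with the parameter added to that environment.
    for name, pname, ptype, rtype, body in funcs:
        local_env = dict(global_env)
        local_env[pname] = ptype
        if type_of(local_env, body) != rtype:
            raise TypeCheckError(f"body of {name} does not have type {rtype}")
    return global_env


signatures = check_program(functions)

# Checking f's body without the first pass fails: g is not yet known.
try:
    type_of({"x": "int"}, functions[0][4])
    single_pass_ok = True
except KeyError:
    single_pass_ok = False
```

The same structure also handles plain recursion, since a function's own signature is in the environment before its body is checked.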
How to check return?

• A return statement produces no value for its containing context to use

$$\frac{A \vdash E : T}{A \vdash \text{return } E : \text{unit}}\ \text{(return)}$$

• Does not return control to containing context
• Suppose we use type unit ...
– Then how to make sure the return type of the current function is T?
Put return in the Symbol table:
• Add a special entry {return_fun : T} when we start checking the function "fun", look up this entry when we hit a return statement
• To check Tr fun (T1 a1, ... , Tn an) { S } in environment A, need to check:

$$A,\ a_1 : T_1,\ \ldots,\ a_n : T_n,\ \text{return\_fun} : T_r \vdash S : T_r$$

$$\frac{A \vdash E : T \qquad \text{return\_fun} : T \in A}{A \vdash \text{return } E : \text{unit}}\ \text{(return)}$$
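The return_fun entry can be sketched as an ordinary key in the environment. This is a simplified illustration under assumptions: the body is a flat list of return statements encoded as tuples, types are strings, and `check_stmt` and `check_function` are hypothetical names.

```python
class TypeCheckError(Exception):
    pass


def type_of_expr(env, expr):
    tag = expr[0]
    if tag == "lit":
        return "bool" if isinstance(expr[1], bool) else "int"
    if tag == "var":
        return env[expr[1]]
    raise TypeCheckError(f"unknown expression {tag!r}")


def check_stmt(env, fun, stmt):
    if stmt[0] == "return":
        actual = type_of_expr(env, stmt[1])        # premise: A |- E : T
        expected = env["return_" + fun]            # premise: return_fun : T in A
        if actual != expected:
            raise TypeCheckError(f"return of {actual}, expected {expected}")
        return "unit"                              # the statement itself has no value
    raise TypeCheckError(f"unknown statement {stmt[0]!r}")


def check_function(env, fun, params, return_type, body):
    # Extend A with the arguments and the special return_fun entry.
    body_env = {**env, **params, "return_" + fun: return_type}
    for stmt in body:
        check_stmt(body_env, fun, stmt)
    return True


# int fun(int a) { return a; }
ok = check_function({}, "fun", {"a": "int"}, "int", [("return", ("var", "a"))])

# int fun(int a) { return true; }
try:
    check_function({}, "fun", {"a": "int"}, "int", [("return", ("lit", True))])
    message = None
except TypeCheckError as exc:
    message = str(exc)
```

Storing the expected type under a per-function key means nested statements need no extra parameter: any return inside the body can find it by lookup.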
Static Semantics Summary
• Static semantics = formal specification of type-checking rules

• Concise form of static semantics: typing rules expressed as
inference rules
• Expressions and statements are well-formed (or well-typed) if
a typing derivation (proof tree) can be constructed using the
inference rules
Review of Semantic Analysis

• Check errors not detected by lexical or syntax analysis
• Scope errors
– Variables not defined
– Multiple declarations
• Type errors
– Assignment of values of different types
– Invocation of functions with different number of parameters or
parameters of incorrect type
– Incorrect use of return statements
• The most common way of specifying the semantics of a language

is plain English
• There is a lack of formal rigor in the semantic specification of

programming languages -- guess why
Where We Are...
Source code (character stream)
    → Lexical Analysis (specified by: regular expressions)
Token stream
    → Syntax Analysis (specified by: grammars)
Abstract syntax tree
    → Semantic Analysis (specified by: static semantics)
Abstract syntax tree + symbol tables, types
    → Intermediate Code Gen
Intermediate code
