Professional Documents
Culture Documents
EEX6335 - Compiler Design EEX6363 - Compiler Construction: Lets Consider Detail Analysis of Semantic Analyzer
EEX6335 - Compiler Design EEX6363 - Compiler Construction: Lets Consider Detail Analysis of Semantic Analyzer
Day School – 4
by
Gehan Anthonys
Semantic Analysis
• Categories
• Identifiers
• Symbol Table
• Rules
1
Phases of Compilation
Where we are?
So far we were able to check:
• the program includes correct “words”;
• “words” are combined in correct “sentences”;
2
What’s next?
We would like to:
• perform additional checks to increase guarantees of correctness;
• transform the program from the source language into the target
one, and according to precisely defined semantic rules;
Additional Checks:
Semantic analysis:
• Ensure that the program satisfies a set of rules regarding the
usage of programming constructs (variables, objects,
expressions, statements & etc.,)
3
Syntax Directed Definitions:
4
Attribute Grammars
• Context-Free Grammars (CFGs) are used to specify the syntax
of programming languages.
E.g. arithmetic expressions
• Each grammar rule has a set of rules over the symbol attributes
-- Copy rules, Semantic Function rules
E.g. sum, quotient
Example
5
Attribute Flow
Example
• The figure shows the result of
annotating the parse tree for (1+3)*2
Attribute Flow
Example
6
Categories of Semantic Analysis
• Type Information:
Describes what kind of values correspond to different
constructs: variables, statements, expressions, functions, etc.
– variables: int a; integer
– expressions: (a+1) == 2 boolean
– statements: a = 1.0; floating-point
– functions: int pow(int n, int m) int = int, int
• Type Checking:
Set of rules which ensures the type consistency of different
constructs in the program
7
Scope Information
• Scope of an identifier:
The lexical scope its declaration refers to.
Variable Scope
{ int a;
... scope of variable a
{int b;
... scope of variable b
}
....
}
8
Function Parameter and Label Scope
• Scope of formal arguments of functions:
int foo(int n) {
...
} scope of argument n
• Scope of labels:
void foo() {
... goto lab;
...
lab: i++;
... goto lab; scope of label lab,
...
} Note: in C, all labels have function
scope regardless of where they are
class A {
public:
void f() {x=1;} scope of variable x
... and method f
private:
int x;
...
}
9
Semantic Rules for Scopes
• Main rules regarding scopes:
– Rule 1: Use each identifier only within its scope
– Rule 2: Do not declare identifier of the same kind with
identical names more than once in the same lexical scope
int X(int X) {
class X { int X;
int X; goto X;
void X(int X) { {
X: ... int X;
goto X; X: X = 1;
} } Both are legal but NOT
} } recommended!
Symbol Tables
10
Scope Information
• How to capture the scope information in the symbol table?
• Idea:
– There is a hierarchy of scopes in the program
– Use similar hierarchy of symbol tables
– One symbol table for each scope
– Each symbol table contains the symbols declared in that
lexical scope
Example
Global symtab
int x; x var int
f func int → void
void f(int m) { g func int → int
float x, y; func g
func f
... symtab
symtab
{int i, j; ....; }
{int x; l: ...; } m arg int n arg int
} x var float t var char
y var float
int g(int n) {
char t;
... ;
} i var int x var int
j var int l label
11
Problem
Associate each
definition of x Global symtab
with its appropriate
x var int
symbol table entry
int x; f func int → void
void f(int m) { g func int → int
float x, y;
...
{int i, j; x=1; }
{int x; l: x=2; } m arg int n arg int
} x var float t var char
y var float
int g(int n) {
char t;
x=3;
} i var int x var int
j var int l label
Error!
undefined Global symtab
int x;
variable x var int
void f(int m) {
float x, y; f func int → void
... g func int → int
{int i, j; x=1; }
{int x; l: i=2; }
} m arg int n arg int
x var float t var char
int g(int n) { y var float
char t;
x=3;
}
i var int x var int
j var int l label
i=2
12
Symbol Table Operations
• Two operations:
– To build symbol tables, we need to insert new identifiers
in the table
– In the subsequent stages of the compiler we need to
access the information from the table: use lookup
function
List Implementation
. . . .
foo m n tmp
func var var var
int,int int int char
→ int
13
Hash table implementation
• Efficient implementation using hash table
– Array of lists (buckets)
– Use a hash on symbol name to map to corresponding bucket
• Hash func: identifier name (string) → int
• Note: include identifier type in match function
Forward References
• Use of an identifier within the scope of its declaration, but
before it is declared
• Any compiler phase that uses the information
from the symbol table must be performed E.g.,
after the table is constructed
class A {
• Cannot type-check and build symbol table int m() {return n(); }
at the same time int n() {return 1; }
}
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 31
Type Checking
• Examples
– Unary and binary operators (e.g. +, ==, [ ]) must receive
operands of the proper type
– Functions must be invoked with the right number and type of
arguments
– Return statements must agree with the return type
– In assignments, assigned value must be compatible with type of
variable on LHS
– Class members accessed appropriately
14
4 Concepts related to Types/Languages
1. Static vs dynamic checking -- When to check types
2. Static vs dynamic typing -- When to define types
3. Strong vs weak typing -- How many type errors
4. Sound type systems -- Statically catch all type errors
15
Strong vs Weak Typing
• Refer to how much type consistency is enforced
• Strongly typed languages -- Guarantee accepted programs are type-safe
• Weakly typed languages -- Allow programs which contain type errors
• These concepts refer to run-time -- Can achieve strong typing using
either static or dynamic typing
Soundness
• Sound type systems: can statically ensure that the program is
type-safe
• Soundness implies strong typing
• Static type safety requires a conservative approximation of the
values that may occur during all possible executions
– May reject type-safe programs
– Need to be expressive: reject as few type-safe programs as possible
Type Systems
• Type is predicate on a value
• Type expressions: Describe the possible types in the program
– E.g., int, char*, array[], object, etc.
• Type system: Defines types for language constructs
– E.g., expressions, statements
Type Expressions
• Language type systems have basic types (aka: primitive types
or ground types) -- E.g., int, char*, double
• Build type expressions using basic types:
– Type constructors
• Array types; Structure types; Pointer types
– Type aliases
– Function types
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 40
16
Type Expressions: Arrays
• Structures
– Has form {id1: T1, … , idn: Tn} for some identifiers idi and
types Ti
– Is essentially cartesian product: (id1 x T1) x … x (idn x Tn)
– Supports access operations on each field, with
corresponding type
– Objects: extension of structure types
• Functions
– Type: T1 x T2 x … x Tn → Tr
– Function value can be invoked with some argument
expressions with types Ti, returns type Tr
17
Type Expressions: Aliases / Pointers
• Type aliases
– C: typedef int int_array[ ];
– Aliases are not type constructors
• Int_array is the same type as int [ ]
– Problem: Different type expressions may denote the same
type
• Pointers
– Pointer types characterize values that are addresses of
variables of other types
– C pointers: T* e.g., int *x;
Implementation
18
Creating Type Objects
• Type objects are the abstract syntax tree (AST) nodes for type
expressions
Type Checking
19
Option 2: First build the AST, then implement type checking by
recursive traversal of the AST nodes:
Deriving a Judgment
• Consider the judgment
– if (b) 2 else 3 : int
• What do we need to decide that this is a well-typed expression
of type int?
– b must be a bool (b : bool); 2 must be an int (2 : int);
– 3 must be an int (3 : int)
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 51
20
Static Semantics and Type Judgments
Deriving a Judgment
• To show
– b: bool, x: int
⊥
• Need to show
– b: bool, x: int
⊥⊥ ⊥
b : bool
– b: bool, x: int 2 : int
– b: bool, x: int x : int
21
Assignment Statements
⊥ ⊥ ⊥
id : T A A E3 : T
A E:T A E2 : int
⊥
A E1 : array[T]
A ⊥ id = E : T
A E1[E2] = E3 : T (array-assign)
⊥
(variable-assign)
Sequence Statements
• Rule: A sequence of statements is well-typed if the first statement
is well-typed, and the remaining are well-typed as well:
A S1 : T1
⊥ ⊥
If Statements
A E : bool
⊥ ⊥
A S1 : T A S2 : T
⊥
(if-then-else)
A if (E) S1 else S2 : T
⊥
A E : bool
⊥ ⊥
A S:T
(if-then)
A if (E) S : unit
⊥
22
Declarations
= unit if no E
A id : T [ = E ] : T1
⊥
A, id : T (S2; .... ; Sn) : Tn
⊥
(declaration)
A (id : T [ = E ]; S2; .... ; Sn) : Tn
⊥
Declarations add entries to the environment (e.g., the symbol table)
Function Calls
• If expression E is a function value, it has a type
T1 x T2 x ... x Tn → Tr
• Ti are argument types, where I = 1,2,…,n; and Tr is the return type
• How to type-check a function call? E(E1, ..., En)
A E : T1 x T2 x ... Tn → Tr
⊥ ⊥
A Ei : Ti (i 1 ... n)
(function-call)
⊥
Function Declarations
• Consider a function declaration of the form:
– Tr fun (T1 a1, ... , Tn an) = E
– Equivalent to:
• Tr fun (T1 a1, ..., Tn an) {return E;}
• Type of function body S must match declared return type of
function, i.e., E : Tr
• But, in what type context?
23
Mutual Recursion
• Example
– int f(int x) = g(x) + 1;
– int g(int x) = f(x) – 1;
• Two-pass approach:
– Scan top level of AST picking up all function signatures
and creating an environment binding all global identifiers
– Type-check each function individually using this global
environment
A E:T return_fun : T A
⊥
(return)
A return E : unit
⊥
24
Static Semantics Summary
• Scope errors
– Variables not defined
– Multiple declarations
• Type errors
– Assignment of values of different types
– Invocation of functions with different number of parameters or
parameters of incorrect type
– Incorrect use of return statements
25
Where We Are...
Source code
(character stream) Lexical Analysis regular expressions
token stream
Syntax Analysis grammars
abstract syntax
tree
Semantic Analysis static semantics
abstract syntax
tree + symbol
tables, types Intermediate Code Gen
Intermediate code
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 73
26