Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

EEX6335 – Compiler Design

EEX6363 – Compiler Construction

Day School – 4
by
Gehan Anthonys

Bachelor of Technology Honours in Engineering


Bachelor of Software Engineering Honours

Department of Electrical and Computer Engineering


Faculty of Engineering
The Open University of Sri Lanka

Lets consider detail analysis of semantic analyzer:

Semantic Analysis
• Categories
• Identifiers
• Symbol Table
• Rules

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 5

1
Phases of Compilation

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 6

Semantic Analysis: the problem

Where we are?
So far we were able to check:
• the program includes correct “words”;
• “words” are combined in correct “sentences”;

Why Semantic Analysis?

• Lexically and syntactically correct programs may still


contain other errors;

• Lexical and syntax analyses are not power enough to


ensure the correct usage of variables, objects, functions, ...

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 7

2
What’s next?
We would like to:
• perform additional checks to increase guarantees of correctness;
• transform the program from the source language into the target
one, and according to precisely defined semantic rules;

Additional Checks:

There are many additional checks that can be performed to increase


correctness of code:
• Coherent (clear idea) usage of variables
➢ definition-usage;
➢ type;
• Existence of unreachable code blocks, . . .

Here, it is mainly focused on the mechanisms for type checking and


generation of intermediate code

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 8

Semantic analysis:
• Ensure that the program satisfies a set of rules regarding the
usage of programming constructs (variables, objects,
expressions, statements & etc.,)

• The principal job of the semantic analyzer is to enforce static


semantic rules.

• In general, anything that requires the compiler to compare


things that are separate by a long distance or to count things
ends up being a matter of semantics.

• The semantic analyzer also commonly constructs a syntax tree


(usually first), and much of the information it gathers is needed
by the code generator.

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 9

3
Syntax Directed Definitions:

• Attributes are used to associate characteristics and store


values associated to grammar symbols.

• A syntax directed definition (SDD) provides the semantic


rules to permit the definition of the values for the attributes.

➢ attributes are associated to grammar symbols and can


be of any kind;
➢ rules are associated to productions.

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 10

An SDD can be defined using two different kinds of attributes:

• Synthesized attributes: a synthesized attributes at


node N is defined only in terms of attribute values at
the children of N and at N itself;
➢ can be evaluated during a single bottom-up traversal of parse
tree;
➢ the production must have non-terminal as its head.
➢ can be contained by both the terminals or non-terminals.

• Inherited attributes: an inherited attribute at node N is


defined only in terms of attribute values at N’s parent,
N itself, and N’s siblings;
➢ can be evaluated during a single top-down and sideways
traversal of parse tree.
➢ the production must have non-terminal as a symbol in its body.
➢ can’t be contained by both, it is only contained by non-terminals.

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 11

4
Attribute Grammars
• Context-Free Grammars (CFGs) are used to specify the syntax
of programming languages.
E.g. arithmetic expressions

How do we tie these rules to mathematical concepts?


• Attribute grammars are annotated CFGs in which annotations
are used to establish meaning relationships among symbols
– Annotations are also known as decorations

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 12

• Each grammar symbols has a set of attributes


E.g. the value of E1 is the attribute E1.val

• Each grammar rule has a set of rules over the symbol attributes
-- Copy rules, Semantic Function rules
E.g. sum, quotient

Example

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 13

5
Attribute Flow
Example
• The figure shows the result of
annotating the parse tree for (1+3)*2

• Each symbols has at most one


attribute shown in the corresponding
box
– Numerical value in this example
– Operator symbols have no value

• Arrows represent attribute flow

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 14

Attribute Flow

Example

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 15

6
Categories of Semantic Analysis

• Examples of semantic rules


– Variables must be defined before being used
– A variable should not be defined multiple times
– In an assignment statement, the variable and the expression
must have the same type
– The test expression of an if statement must have Boolean
type

• Two major categories


– Semantic rules regarding types
– Semantic rules regarding scopes

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 16

Type information and Type checking

• Type Information:
Describes what kind of values correspond to different
constructs: variables, statements, expressions, functions, etc.
– variables: int a; integer
– expressions: (a+1) == 2 boolean
– statements: a = 1.0; floating-point
– functions: int pow(int n, int m) int = int, int

• Type Checking:
Set of rules which ensures the type consistency of different
constructs in the program

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 17

7
Scope Information

• Characterizes the declaration of identifiers and the portions


of the program where it is allowed to use each identifier
E.g., identifiers: variables, functions, objects, labels

• Lexical scope: textual region in the program


E.g.,
Statement block, formal argument list, object body,
function or method body, source file, whole program

• Scope of an identifier:
The lexical scope its declaration refers to.

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 18

Variable Scope

• Scope of variables in statement blocks:

{ int a;
... scope of variable a
{int b;
... scope of variable b
}
....
}

• Scope of global variables: current file


• Scope of external variables: whole program

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 19

8
Function Parameter and Label Scope
• Scope of formal arguments of functions:

int foo(int n) {
...
} scope of argument n

• Scope of labels:
void foo() {
... goto lab;
...
lab: i++;
... goto lab; scope of label lab,
...
} Note: in C, all labels have function
scope regardless of where they are

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 20

Scope in Class declaration

• Scope of object fields and methods:

class A {
public:
void f() {x=1;} scope of variable x
... and method f
private:
int x;
...
}

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 21

9
Semantic Rules for Scopes
• Main rules regarding scopes:
– Rule 1: Use each identifier only within its scope
– Rule 2: Do not declare identifier of the same kind with
identical names more than once in the same lexical scope

int X(int X) {
class X { int X;
int X; goto X;
void X(int X) { {
X: ... int X;
goto X; X: X = 1;
} } Both are legal but NOT
} } recommended!

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 22

Symbol Tables

• Semantic checks refer to properties of identifiers in the


program – their scope or type

• Need an environment to store the information about


identifiers = symbol table

• Each entry in the symbol table contains:


– Name of an identifier
– Additional info about identifier: kind, type, constant?

NAME KIND TYPE ATTRIBUTES


foo func int,int → int extern
m arg int
n arg int const
tmp var char const

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 23

10
Scope Information
• How to capture the scope information in the symbol table?

• Idea:
– There is a hierarchy of scopes in the program
– Use similar hierarchy of symbol tables
– One symbol table for each scope
– Each symbol table contains the symbols declared in that
lexical scope

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 24

Example
Global symtab
int x; x var int
f func int → void
void f(int m) { g func int → int
float x, y; func g
func f
... symtab
symtab
{int i, j; ....; }
{int x; l: ...; } m arg int n arg int
} x var float t var char
y var float
int g(int n) {
char t;
... ;
} i var int x var int
j var int l label

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 25

11
Problem
Associate each
definition of x Global symtab
with its appropriate
x var int
symbol table entry
int x; f func int → void
void f(int m) { g func int → int
float x, y;
...
{int i, j; x=1; }
{int x; l: x=2; } m arg int n arg int
} x var float t var char
y var float
int g(int n) {
char t;
x=3;
} i var int x var int
j var int l label

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 27

Catching Semantic Errors

Error!
undefined Global symtab
int x;
variable x var int
void f(int m) {
float x, y; f func int → void
... g func int → int
{int i, j; x=1; }
{int x; l: i=2; }
} m arg int n arg int
x var float t var char
int g(int n) { y var float
char t;
x=3;
}
i var int x var int
j var int l label
i=2

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 28

12
Symbol Table Operations

• Two operations:
– To build symbol tables, we need to insert new identifiers
in the table
– In the subsequent stages of the compiler we need to
access the information from the table: use lookup
function

• Cannot build symbol tables during lexical analysis


– Hierarchy of scopes encoded in syntax

• Build the symbol tables:


– While parsing, using the semantic actions
– After the AST is constructed

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 29

List Implementation

• Simple implementation using a list


– One cell per entry in the table
– Can grow dynamically during compilation

. . . .
foo m n tmp
func var var var
int,int int int char
→ int

• Disadvantage: inefficient for large tables


– Need to scan half the list on average

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 30

13
Hash table implementation
• Efficient implementation using hash table
– Array of lists (buckets)
– Use a hash on symbol name to map to corresponding bucket
• Hash func: identifier name (string) → int
• Note: include identifier type in match function

Forward References
• Use of an identifier within the scope of its declaration, but
before it is declared
• Any compiler phase that uses the information
from the symbol table must be performed E.g.,
after the table is constructed
class A {
• Cannot type-check and build symbol table int m() {return n(); }
at the same time int n() {return 1; }
}
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 31

Type Checking

• Semantic checks to enforce the type safety of the program

• Examples
– Unary and binary operators (e.g. +, ==, [ ]) must receive
operands of the proper type
– Functions must be invoked with the right number and type of
arguments
– Return statements must agree with the return type
– In assignments, assigned value must be compatible with type of
variable on LHS
– Class members accessed appropriately

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 34

14
4 Concepts related to Types/Languages
1. Static vs dynamic checking -- When to check types
2. Static vs dynamic typing -- When to define types
3. Strong vs weak typing -- How many type errors
4. Sound type systems -- Statically catch all type errors

Static vs Dynamic Checking


• Static type checking
– Perform at compile time
• Dynamic type checking
– Perform at run time (as the program executes)
• Examples of dynamic checking
– Array bounds checking
– Null pointer dereferences

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 35

Static vs Dynamic Typing


Static and dynamic typing refer to type definitions
(i.e., bindings of types to variables, expressions, etc.)
• Static typed language
– Types defined at compile-time and do not change during
the execution of the program
• C, C++, Java, Pascal
• Dynamically typed language
– Types defined at run-time, as program executes
• Lisp, Smalltalk

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 36

15
Strong vs Weak Typing
• Refer to how much type consistency is enforced
• Strongly typed languages -- Guarantee accepted programs are type-safe
• Weakly typed languages -- Allow programs which contain type errors
• These concepts refer to run-time -- Can achieve strong typing using
either static or dynamic typing

Soundness
• Sound type systems: can statically ensure that the program is
type-safe
• Soundness implies strong typing
• Static type safety requires a conservative approximation of the
values that may occur during all possible executions
– May reject type-safe programs
– Need to be expressive: reject as few type-safe programs as possible

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 37

Type Systems
• Type is predicate on a value
• Type expressions: Describe the possible types in the program
– E.g., int, char*, array[], object, etc.
• Type system: Defines types for language constructs
– E.g., expressions, statements

Type Expressions
• Language type systems have basic types (aka: primitive types
or ground types) -- E.g., int, char*, double
• Build type expressions using basic types:
– Type constructors
• Array types; Structure types; Pointer types
– Type aliases
– Function types
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 40

16
Type Expressions: Arrays

• Various kinds of array types in different programming languages


• Array(T): arrays without bounds
– C, Java: T[ ]
• Array(T,S): array with size
– C: T[S], may be indexed 0 .. S-1
• Array(T,L,U): array with upper/lower bounds
– Pascal: array[L .. U] of T
• Array(T, S1, …, Sn): multi-dimensional arrays
– Fortran: T(L1, …, Ln)

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 41

Type Expressions: Structures / Functions

• Structures
– Has form {id1: T1, … , idn: Tn} for some identifiers idi and
types Ti
– Is essentially cartesian product: (id1 x T1) x … x (idn x Tn)
– Supports access operations on each field, with
corresponding type
– Objects: extension of structure types

• Functions
– Type: T1 x T2 x … x Tn → Tr
– Function value can be invoked with some argument
expressions with types Ti, returns type Tr

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 42

17
Type Expressions: Aliases / Pointers

• Type aliases
– C: typedef int int_array[ ];
– Aliases are not type constructors
• Int_array is the same type as int [ ]
– Problem: Different type expressions may denote the same
type

• Pointers
– Pointer types characterize values that are addresses of
variables of other types
– C pointers: T* e.g., int *x;

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 43

Implementation

• Use a separate class hierarchy for types:


– class BaseType extends Type {String name;}
– class IntType extends BaseType { ... }
– class FloatType extends BaseType { ... }
– class ArrayType extends BaseType { ... }
– class FunctionType extends BaseType { ... }

• Semantic analysis translates all type expressions to type objects

• Symbol table binds name to type object

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 44

18
Creating Type Objects

• Build types while parsing – use a syntax-directed definition


non terminal Type type
type : INTEGER
{$$ = new IntType(id); }
| ARRAY LBRACKET type RBRACKET
{$$ = new ArrayType($3); } ;

• Type objects are the abstract syntax tree (AST) nodes for type
expressions

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 46

Type Checking

• Type checking is verify typing rules


– E.g., “Operands of + must be integer expressions; the result is
an integer expression”

• Option 1: Implement using syntax-directed definitions (type-check


during the parsing)

expr: expr PLUS expr {


if ($1 == IntType && $3 == IntType)
$$ = IntType
else
TypeCheckError(“+”);
}

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 48

19
Option 2: First build the AST, then implement type checking by
recursive traversal of the AST nodes:

class Add extends Expr {


Type typeCheck() {
Type t1 = e1.typeCheck(), t2 = e2.typeCheck();
if (t1 == Int && t2 == Int) return Int
else TypeCheckError(“+”);
}
}

Type Checking Identifiers


• Identifier expressions: Lookup the type in the symbol table

class IdExpr extends Expr {


Identifier id;
Type typeCheck() {return id.lookupType(); }
}
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 49

Type judgments for statements


• Statements may be expressions (i.e., represent values)
• Use type judgments for statements:
-- if (b) 2 else 3 : int
-- x == 10 : bool
-- b = true, y = 2 : int
• For statements which are not expressions: use a special unit type
(void or empty type)
– S : unit means “S is a well-typed statement with no result type”

Deriving a Judgment
• Consider the judgment
– if (b) 2 else 3 : int
• What do we need to decide that this is a well-typed expression
of type int?
– b must be a bool (b : bool); 2 must be an int (2 : int);
– 3 must be an int (3 : int)
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 51

20
Static Semantics and Type Judgments

• Static semantics is the formal notation which describes type


judgments:
– E:T
– means “E is a well-typed expression of type T”

• Type judgment examples:


-- 2 : int
-- true : bool
-- 2 * (3 + 4) : int
-- “Hello” : string

• Type judgment notation: A E:T ⊥


– Means “In the context A, the expression E is a well-typed
expression with type T”

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 53

• Type context is a set of type bindings: id : T


(i.e. type context = symbol table)
⊥ ⊥⊥

b: bool, x: int b: bool


b: bool, x: int if (b) 2 else x : int
2 + 2 : int

Deriving a Judgment
• To show
– b: bool, x: int

if (b) 2 else x : int

• Need to show
– b: bool, x: int
⊥⊥ ⊥

b : bool
– b: bool, x: int 2 : int
– b: bool, x: int x : int

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 54

21
Assignment Statements

⊥ ⊥ ⊥
id : T  A A E3 : T
A E:T A E2 : int


A E1 : array[T]
A ⊥ id = E : T
A E1[E2] = E3 : T (array-assign)


(variable-assign)

Sequence Statements
• Rule: A sequence of statements is well-typed if the first statement
is well-typed, and the remaining are well-typed as well:

A S1 : T1
⊥ ⊥

A (S2; .... ; Sn) : Tn


(sequence)
A (S1; S2; .... ; Sn) : Tn

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 61

If Statements

• If statement as an expression: its value is the value of the clause that is


executed

A E : bool
⊥ ⊥

A S1 : T A S2 : T

(if-then-else)
A if (E) S1 else S2 : T

• If with no else clause, no value, why??

A E : bool
⊥ ⊥

A S:T
(if-then)
A if (E) S : unit

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 62

22
Declarations
= unit if no E
A id : T [ = E ] : T1


A, id : T (S2; .... ; Sn) : Tn


(declaration)
A (id : T [ = E ]; S2; .... ; Sn) : Tn

Declarations add entries to the environment (e.g., the symbol table)

Function Calls
• If expression E is a function value, it has a type
T1 x T2 x ... x Tn → Tr
• Ti are argument types, where I = 1,2,…,n; and Tr is the return type
• How to type-check a function call? E(E1, ..., En)

A E : T1 x T2 x ... Tn → Tr
⊥ ⊥

A Ei : Ti (i  1 ... n)
(function-call)

A E(E1, ..., En) : Tr


21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 64

Function Declarations
• Consider a function declaration of the form:
– Tr fun (T1 a1, ... , Tn an) = E
– Equivalent to:
• Tr fun (T1 a1, ..., Tn an) {return E;}
• Type of function body S must match declared return type of
function, i.e., E : Tr
• But, in what type context?

Add arguments to environment


• Let A be the context surrounding the function declaration.
– The function declaration: Tr fun (T1 a1, ... , Tn an) = E
– Is well-formed if
A, a1 : T1 , ... , an : Tn E : Tr

• What about recursion?


– Need: fun: T1 x T2 x ... x Tn → Tr  A
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 65

23
Mutual Recursion

• Example
– int f(int x) = g(x) + 1;
– int g(int x) = f(x) – 1;

• Need environment containing at least


• f: int → int, g: int → int
• when checking both f and g

• Two-pass approach:
– Scan top level of AST picking up all function signatures
and creating an environment binding all global identifiers
– Type-check each function individually using this global
environment

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 67

How to check return?


A E:T
⊥⊥

• A return statement produces no value


for its containing context to use A return E : unit
(return)
• Does not return control to containing context
• Suppose we use type unit ...
– Then how to make sure the return type of the current function is T?

Put return in the Symbol table:


• Add a special entry {return_fun : T} when we start checking the
function “fun”, look up this entry when we hit a return statement
• To check Tr fun (T1 a1, ... , Tn an) { S } in environment A,
need to check:
A, a1 : T1, ..., an : Tn, return_fun : Tr A : Tr

A E:T return_fun : T  A

(return)
A return E : unit

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 68

24
Static Semantics Summary

• Static semantics = formal specification of type-checking rules


• Concise form of static semantics: typing rules expressed as
inference rules
• Expression and statements are well-formed (or well-typed) if
a typing derivation (proof tree) can be constructed using the
inference rules

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 69

Review of Semantic Analysis


• Check errors not detected by lexical or syntax analysis

• Scope errors
– Variables not defined
– Multiple declarations

• Type errors
– Assignment of values of different types
– Invocation of functions with different number of parameters or
parameters of incorrect type
– Incorrect use of return statements

• The most common way of specifying the semantics of a language


is plain English

• There is a lack of formal rigor in the semantic specification of


programming languages -- guess why

21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 70

25
Where We Are...

Source code
(character stream) Lexical Analysis regular expressions
token stream
Syntax Analysis grammars
abstract syntax
tree
Semantic Analysis static semantics
abstract syntax
tree + symbol
tables, types Intermediate Code Gen

Intermediate code
21-Jul-21 EEX6335/ EEX6363 -- Compiler Design/ Compiler Construction 73

26

You might also like