Chapter four

Semantic Analysis
▪ The semantic analyzer takes the output of the syntax
analyzer and produces another tree.
▪ Similarly, the intermediate code generator takes as
input the tree produced by the semantic analyzer and
produces intermediate code.
Semantic Analyzer
Semantic Analysis cont’d
A syntax tree is a condensed representation of
the parse tree (the hierarchical structure that
represents how the input string is derived from
the grammar) in which the operators appear as
interior nodes and the operands of an operator
are the children of that operator's node.
▪ Example of syntax tree
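As a small illustration, the syntax tree for the expression a + b * c (an example chosen for this sketch) can be built by hand in Python. Each operator becomes an interior node and its operands become its children; the names Leaf, Node and show are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str            # an operand: identifier or constant


@dataclass
class Node:
    op: str              # the operator stored at an interior node
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def show(t: Tree, depth: int = 0) -> str:
    """Render one node per line, with children indented under their operator."""
    pad = "  " * depth
    if isinstance(t, Leaf):
        return pad + t.name
    return "\n".join([pad + t.op, show(t.left, depth + 1), show(t.right, depth + 1)])


# '*' binds tighter than '+', so the '*' node is a child of the '+' node
tree = Node("+", Leaf("a"), Node("*", Leaf("b"), Leaf("c")))
print(show(tree))
```

Note that the parentheses and the intermediate nonterminals (Expr, Term, Factor) of a parse tree do not appear; only operators and operands remain.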
Semantic Analyzer
Semantic analysis is the third phase of the compiler.
▪ It is the task of ensuring that the declarations and
statements of a program are semantically correct, i.e.,
that their meaning is clear and consistent with the
way in which control structures and data types are
supposed to be used.
▪ It checks for semantic consistency.
▪ Type information is gathered and stored in the symbol
table or in the syntax tree.
▪ It performs type checking.
▪ It verifies whether the parse tree is meaningful and
produces a verified parse tree.
Syntax vs. Semantics:
Syntax concerns the form of a valid program (conveniently
described by a context-free grammar, CFG).
Semantics concerns its meaning: rules that go beyond
mere form and cannot be expressed with a CFG (e.g., the
number of arguments in a call to a subroutine must match
the number of formal parameters in the subroutine
definition; types must be used consistently).
The semantics of a language:
Defines what the program means
Detects whether the program is correct
Helps to translate it into another representation
Note:
• Syntax: determines the valid form of a program
• Semantics: determines the meaning, i.e., the behavior, of a valid program
• Semantic analysis: enforces the semantic rules
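The argument-count rule cannot be checked by a CFG, but it is easy to check once declarations have been recorded. A minimal sketch, assuming a hypothetical table `functions` that maps each declared function name to its list of formal parameters:

```python
# Hypothetical symbol-table entries: function name -> formal parameter names
functions = {
    "max": ["x", "y"],
    "abs": ["x"],
}


def check_call(name: str, args: list) -> list:
    """Return semantic error messages for a call; an empty list means the call is OK."""
    if name not in functions:
        return [f"undeclared function '{name}'"]
    expected, actual = len(functions[name]), len(args)
    if expected != actual:
        return [f"'{name}' expects {expected} argument(s), got {actual}"]
    return []


print(check_call("max", ["a", "b"]))   # []
print(check_call("max", ["a"]))        # ["'max' expects 2 argument(s), got 1"]
```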
Role of Semantic Analyzer
Semantic rules are divided into:
static semantics: enforced at compile time.
Examples of static analysis:
Escape analysis determines when all references to a
value will be confined to a given context, allowing it to be
allocated on the stack instead of the heap, or to be
accessed without locks.
Subtype analysis determines when a variable in an
object-oriented language is guaranteed to have a certain
subtype, so that its methods can be called without dynamic
dispatch.
Principle of static type checking
➢ Identify the types of the language and the language constructs that have types
associated with them.
➢ Associate a type attribute with these constructs, together with semantic rules
to compute it and to check that the type system is respected.
➢ Store identifier types in the symbol table. One can use two separate tables,
one for the variable names and one for the function names.
➢ A function type is determined by the types (and number) of its arguments and
by its return type, e.g., (int, int) → int.
➢ Type checking cannot be dissociated from scope and other semantic checking.
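The principle above can be sketched as a small checker that synthesizes a type for each expression and looks identifiers up in two separate tables. This is a minimal sketch: the tuple encoding of expressions, the table contents and the rule that int + float is an error (no implicit conversions) are all assumptions made for illustration.

```python
class SemanticError(Exception):
    """Raised when an expression violates the typing rules."""


# Hypothetical symbol tables: one for variables, one for functions
var_types = {"i": "int", "j": "int", "x": "float"}
fun_types = {"add": (["int", "int"], "int")}   # add : (int, int) -> int


def type_of(e) -> str:
    """Synthesize the type of expression e, or raise SemanticError.

    Expressions are tuples: ("num", n), ("var", name),
    ("call", fname, [args]) or ("+", left, right).
    """
    kind = e[0]
    if kind == "num":
        return "int"
    if kind == "var":
        if e[1] not in var_types:
            raise SemanticError(f"undeclared variable {e[1]}")
        return var_types[e[1]]
    if kind == "call":
        if e[1] not in fun_types:
            raise SemanticError(f"undeclared function {e[1]}")
        params, result = fun_types[e[1]]
        args = [type_of(a) for a in e[2]]
        if args != params:          # compares both the number and the types of arguments
            raise SemanticError(f"{e[1]} expects {params}, got {args}")
        return result
    if kind == "+":
        left, right = type_of(e[1]), type_of(e[2])
        if left != right:           # assumption: no implicit int -> float conversion
            raise SemanticError(f"operands of + have types {left} and {right}")
        return left
    raise SemanticError(f"unknown construct {kind}")


print(type_of(("call", "add", [("var", "i"), ("num", 1)])))   # int
```

A real checker would also consult the scope in which each identifier is used, which is why type checking is usually combined with the symbol-table and scope checks.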
Cont….
Type checking, for example, is static and precise
in ML: the compiler ensures that no variable will
ever be used at run time in a way that is
inappropriate for its type
By contrast, languages like Lisp and Smalltalk
accept the run-time overhead of dynamic type
checks
In Java, type checking is mostly static, but
dynamically loaded classes and type casts require
run-time checks
Role of Semantic Analyzer
dynamic semantics: the compiler generates code to
enforce dynamic semantic rules at run time (or calls
library routines to do it), e.g., for errors such as division
by zero or an out-of-bounds array index.
Following parsing, the next two phases of the "typical"
compiler are:
semantic analysis
(intermediate) code generation
The principal job of the semantic analyzer is to enforce
static semantic rules; in addition, it:
constructs a syntax tree
gathers information needed by the code generator
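To make the static/dynamic split concrete, here is a minimal sketch of how a code generator might handle division. When the divisor is a literal, zero can be rejected at compile time; otherwise the compiler emits a run-time test. The three-address instruction format, the label names and the routine `error_div_zero` (assumed not to return) are hypothetical.

```python
import itertools

_label_ids = itertools.count(1)   # source of fresh label numbers


def gen_divide(dest: str, lhs: str, rhs: str) -> list:
    """Emit three-address instructions for dest = lhs / rhs."""
    if rhs.isdigit():
        # static check: the divisor is known at compile time
        if int(rhs) == 0:
            raise ValueError("division by constant zero")
        return [f"{dest} = {lhs} / {rhs}"]
    # dynamic check: test the divisor when the program runs
    ok = f"L{next(_label_ids)}"
    return [
        f"if {rhs} != 0 goto {ok}",
        "call error_div_zero",        # hypothetical run-time library routine
        f"{ok}:",
        f"{dest} = {lhs} / {rhs}",
    ]


for line in gen_divide("t1", "a", "b"):
    print(line)
```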
1. Syntax-directed translation
Syntax-directed translation is a general way to associate
actions (i.e., programs) with the production rules of a
context-free grammar. It is used for carrying out most
semantic analyses as well as code translation.
The term also refers to a method of compiler implementation
in which the source-language translation is driven entirely
by the parser, i.e., by the syntax of the language.
The parsing process and parse trees are used to direct
semantic analysis and the translation of the source
program. Almost all modern compilers are syntax-directed.
Cont………………….

The general approach to syntax-directed translation is to
construct a parse tree or syntax tree and compute the
values of attributes at the nodes of the tree.
In many cases, translation can be done during parsing
without building an explicit tree. A class of syntax-directed
translations called "L-attributed translations" (L for
left-to-right) includes almost all translations that can be
performed during parsing. Similarly, "S-attributed
translations" (S for synthesized) can be performed easily in
connection with a bottom-up parse.
Cont………………….

There are two ways to represent the semantic rules
associated with grammar symbols:
1. Syntax-Directed Definitions (SDD)
2. Syntax-Directed Translation Schemes (SDT)
1. A syntax-directed definition (SDD) is a context-free grammar together
with attributes and rules. Attributes are associated with grammar symbols
and rules are associated with productions. An attribute has a name and an
associated value: a string, a number, a type, a memory location, an
assigned register, and so on. A string value may even be a long sequence of
code, say code in the intermediate language used by a compiler. If X is a
symbol and a is one of its attributes, then we write X.a to denote the value
of a at a particular parse-tree node labeled X. If we implement the nodes of
the parse tree by records or objects, then the attributes of X can be
implemented by data fields in the records that represent the nodes for X.
The attributes are evaluated by the semantic rules attached to the
productions.
Applications of SDD’s

SDDs can be used at several places during compilation:
• Building the syntax tree from the parse tree
• Various static semantic checks (type, scope, etc.)
• Code generation
• Building an interpreter
2. Syntax-Directed Translation Schemes (SDT): an SDT
embeds program fragments called semantic actions within
production bodies. The position of a semantic action in a
production body determines the order in which the action is
executed. Sometimes we want to control how the attributes
are evaluated: the order and the place where they are
evaluated. An SDT therefore works at a slightly lower level
than an SDD. To avoid repeated traversals of the parse tree,
actions are performed as the tokens are recognized, so the
computation of attributes goes along with the construction
of the parse tree. Along with the evaluation of the semantic
rules, the compiler may simultaneously generate code, save
information in the symbol table, and/or issue error messages
while building the parse tree. This saves multiple passes
over the parse tree.
Parse tree and the dependence graph:
Cont…

The dependence graph shows the dependence of attributes on
other attributes, along with the syntax tree. A top-down
traversal is followed by a bottom-up traversal to resolve the
dependencies. Number, val and neg are synthesized
attributes; pos is an inherited attribute. Example: in the rule
E → E1 + T {print '+'}, the action is positioned after the body
of the production. SDTs can be more efficient than SDDs because
they fix the order of evaluation of the semantic actions
associated with a production rule.
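The rule E → E1 + T {print '+'} is the classic infix-to-postfix scheme. A minimal sketch in Python, assuming single-digit operands (T → digit {print digit}) and adding '-' alongside '+' for illustration; the left recursion is replaced by a loop, as a predictive parser would do, but each print action still fires after T has been parsed.

```python
def to_postfix(src: str) -> str:
    """Translate sums/differences of single digits, e.g. '9-5+2', to postfix."""
    out = []
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(src) or not src[pos].isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        out.append(src[pos])        # action: T -> digit { print digit }
        pos += 1

    term()
    while pos < len(src) and src[pos] in "+-":
        op = src[pos]
        pos += 1
        term()
        out.append(op)              # action: E -> E1 op T { print op }, after the body
    if pos != len(src):
        raise SyntaxError(f"unexpected {src[pos]!r}")
    return "".join(out)


print(to_postfix("9-5+2"))   # 95-2+
```

Moving the action before T (E → E1 + {print '+'} T) would print the operator between its operands instead, which is exactly why the position of an action in an SDT matters.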
Attributes grammar
➢ An attribute grammar is a formal way to define attributes for
the productions of a formal grammar, associating these
attributes with values.
➢ The evaluation occurs in the nodes of the abstract syntax
tree when the language is processed by some parser or
compiler. An attribute grammar augments the conventional
grammar with information to control semantic analysis and
translation.
➢ Attribute grammars can also be used to translate the
syntax tree directly into code for some specific machine, or
into some intermediate language.
➢ One strength of attribute grammars is that they can
transport information from anywhere in the abstract syntax
tree to anywhere else, in a controlled and formal way.
S-Attributed Grammars
S-Attributed Grammars are a class of attribute grammars characterized by
having no inherited attributes. Inherited attributes, which must be passed
down from parent nodes to child nodes of the abstract syntax tree during
semantic analysis, are a problem for bottom-up parsing, because in
bottom-up parsing the parent nodes of the abstract syntax tree are created
only after all of their children.
L-Attributed Grammars
➢ L-attributed grammars are a special type of attribute grammars. They
allow the attributes to be evaluated in one left-to-right traversal of the
abstract syntax tree.
➢ As a result, attribute evaluation in L-attributed grammars can be
incorporated conveniently in top-down parsing.
➢ Many programming languages are L-attributed. Special types of
compilers, the narrow compilers, are based on some form of
L-attributed grammar.
➢ It is used for code synthesis.
Attributes
Two kinds of attributes
Synthesized: the attribute value of the LHS nonterminal is computed from the
attribute values of the symbols on the RHS of the rule.
➢ The value is computed only in productions where the symbol is on the
left-hand side, so attributes can be computed independently of context.
An S-attributed grammar has only synthesized attributes.
➢ A synthesized attribute for a nonterminal A at a parse-tree node N is
defined by a semantic rule associated with the production at N.
➢ A synthesized attribute at node N is defined only in terms of attribute
values at the children of N and at N itself. Synthesized attributes are
evaluated in bottom up fashion.
Let us consider the following context-free grammar, which can
describe a language made up of multiplication and addition of
integers:
Expr → Expr + Term
Expr → Term
Term → Term * Factor
Term → Factor
Factor → "(" Expr ")"
Factor → integer
The following attribute grammar can be used to calculate the
result of an expression written in the grammar which only uses
synthesized values, and is therefore an S-attributed grammar.
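The semantic rules usually attached to this grammar give each nonterminal one synthesized attribute, value, e.g. Expr1.value = Expr2.value + Term.value, Term1.value = Term2.value * Factor.value, Factor.value = Expr.value for the parenthesized case, and Factor.value = the integer's numeric value. A minimal Python sketch of those rules follows. Each parse function returns the synthesized value of its nonterminal; because the grammar is left-recursive, Expr and Term are parsed with loops (a top-down stand-in for the bottom-up parse an S-attributed grammar is normally paired with).

```python
import re


def evaluate(src: str) -> int:
    """Compute Expr.value using only synthesized attributes."""
    tokens = re.findall(r"\d+|[+*()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        value = term()                  # Expr -> Term
        while peek() == "+":
            advance()
            value = value + term()      # Expr1 -> Expr2 + Term
        return value

    def term():
        value = factor()                # Term -> Factor
        while peek() == "*":
            advance()
            value = value * factor()    # Term1 -> Term2 * Factor
        return value

    def factor():
        if peek() == "(":
            advance()
            value = expr()              # Factor -> "(" Expr ")"
            if peek() != ")":
                raise SyntaxError("')' expected")
            advance()
            return value
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"integer expected, got {tok!r}")
        return int(advance())           # Factor -> integer

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(evaluate("2+3*4"))     # 14
print(evaluate("(2+3)*4"))   # 20
```

No value ever flows from a parent to a child, which is what makes the grammar S-attributed.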
Inherited: the attribute value of an RHS nonterminal is computed from the
attribute values of the LHS nonterminal and of some other RHS nonterminals.
Terminals can have synthesized attributes, computed by the lexer (e.g.,
id.lexeme), but no inherited attributes. Alternatively:
➢ The value is computed in productions where the symbol is on the right-hand
side, so attributes are computed using context.
An inherited attribute for a nonterminal B at a parse-tree node N is
defined by a semantic rule associated with the production at the
parent of N.
An inherited attribute at node N is defined only in terms of
attribute values at N's parent, N itself, and N's siblings.
Inherited attributes are evaluated in top down fashion.
An inherited attribute at node N cannot be defined in terms of
attribute values at the children of node N.
However, a synthesized attribute at node N can be defined in
terms of inherited attribute values at node N itself.
An inherited attribute at a node in parse tree is defined using the
attribute values at the parent or siblings.
Inherited attributes are convenient for expressing the
dependence of a programming language construct on the context
in which it appears.
For example, we can use an inherited attribute to keep track of
whether an identifier appears on the left or the right side of an
assignment in order to decide whether the address or the value of
the identifier is needed.
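The assignment example can be sketched as a code generator for a simple stack machine in which want_address plays the role of the inherited attribute: the parent (the '=' node) decides it for each child from context. The tuple node shapes and the instruction names lvalue, rvalue, push and := are assumptions for this sketch.

```python
def gen(node, want_address: bool = False) -> list:
    """Generate stack-machine code; want_address is inherited from the parent."""
    kind = node[0]
    if kind == "id":
        # the inherited attribute selects the address or the value of the identifier
        return [f"lvalue {node[1]}"] if want_address else [f"rvalue {node[1]}"]
    if kind == "num":
        return [f"push {node[1]}"]
    if kind == "=":
        # the left child needs an address, the right child needs a value
        return gen(node[1], want_address=True) + gen(node[2]) + [":="]
    if kind == "+":
        return gen(node[1]) + gen(node[2]) + ["+"]
    raise ValueError(f"unknown node {kind}")


# x = y + 1
print(gen(("=", ("id", "x"), ("+", ("id", "y"), ("num", 1)))))
```

The identifier node alone cannot tell which form is needed; the information comes down from its parent, which is exactly the kind of context an inherited attribute carries.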
The most common types of attributes that we may wish to
note for each symbol are:
• Type – Associates the data object with the allowable set
of values.
• Location – may be changed by the memory management
routine of the operating system.
• Value – usually the result of an assignment operation.
• Name – can be changed as a result of subprogram calls
and returns
• Component – data objects may be composed of several
data objects.
This binding may be represented by a pointer and
subsequently changed
Definition of an Attribute Grammar
An attribute grammar is defined as a grammar with the
following added features:
• Each symbol X has a set of attributes A(X).
• A(X) can contain:
– extrinsic attributes, which are obtained from outside the grammar,
most notably from the symbol table;
– synthesized attributes, which are passed up the parse tree;
– inherited attributes, which are passed down the parse tree.
• Each production of the grammar has a set of semantic
functions and a set of predicate functions (which may be an
empty set).
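As a sketch of these pieces working together, consider a hypothetical production Assign → Var = Expr, with Expr → Var + Var | Var. The type of each variable is an extrinsic attribute fetched from the symbol table; the expected type of Expr is inherited from the target variable; its actual type is synthesized from its operands; and a predicate function attached to the production compares the two. The table contents and the int/real typing rule are assumptions.

```python
# Hypothetical symbol table: the source of extrinsic attributes
symbol_table = {"A": "int", "B": "int", "C": "real"}


def lookup(name: str) -> str:
    # semantic function for Var.actual_type (extrinsic: read from the symbol table)
    return symbol_table[name]


def check_assign(target: str, rhs: list) -> bool:
    """Assign -> Var = Expr, with Expr -> Var + Var | Var.

    Returns the value of the production's predicate; False signals a
    static semantic error.
    """
    expected_type = lookup(target)              # inherited: Expr.expected_type
    operand_types = [lookup(v) for v in rhs]    # synthesized by each Var
    # semantic function (assumed rule): all int operands give int, otherwise real
    actual_type = "int" if all(t == "int" for t in operand_types) else "real"
    # predicate function attached to the production
    return actual_type == expected_type


print(check_assign("A", ["A", "B"]))   # True
print(check_assign("A", ["A", "C"]))   # False
```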
Abstract syntax tree (AST)
The abstract syntax tree is used as a basis for most
semantic analyses and for intermediate code generation (or
even used as an intermediate representation)
When the grammar has been modified for parsing, the
syntax tree is a more natural representation than the parse
tree.
The abstract syntax tree can be constructed using an SDD.
Another SDD can then be defined on the syntax tree to
perform semantic checking or to generate another
intermediate representation (directed by the syntax tree
rather than the parse tree).
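A minimal sketch of constructing the AST with an SDD, for sums and differences of identifiers and integers. Each production's semantic rule builds a node (written in the comments next to the code); the tuple node shapes are an assumption, and left recursion is again handled with a loop so that the tree stays left-associative.

```python
import re


def build_ast(src: str):
    """Build an AST for expressions such as 'a-4+c'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+-]", src)
    pos = 0

    def leaf():
        # T -> id   { T.node = ("id", lexeme) }
        # T -> num  { T.node = ("num", value) }
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "-"):
            raise SyntaxError("operand expected")
        tok = tokens[pos]
        pos += 1
        return ("num", int(tok)) if tok.isdigit() else ("id", tok)

    node = leaf()                       # E -> T  { E.node = T.node }
    while pos < len(tokens):
        op = tokens[pos]
        if op not in ("+", "-"):
            raise SyntaxError(f"operator expected, got {op!r}")
        pos += 1
        node = (op, node, leaf())       # E -> E1 op T  { E.node = (op, E1.node, T.node) }
    return node


print(build_ast("a-4+c"))   # ('+', ('-', ('id', 'a'), ('num', 4)), ('id', 'c'))
```

A second pass (for example a type checker or a code generator) can then walk these tuples directly without consulting the grammar used for parsing.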
Analysis of Abstract Syntax Trees
• Common for parser to generate AST for analysis
• Describe structure of AST as tree grammar
• Form attribute grammar from tree grammar instead of
CFG
• Allows analysis of AST
Abstract syntax tree (AST)
Intermediate Code Generation:-
The intermediate code generation phase uses the structure produced by
the syntax analyzer to create a stream of simple instructions. Many
styles of intermediate code are possible. One common style uses
instructions with one operator and a small number of operands. The
output of the syntax analyzer is some representation of a parse
tree; the intermediate code generation phase transforms this parse
tree into an intermediate-language representation of the source
program.
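The "one operator, few operands" style is often called three-address code. A minimal sketch of producing it from an expression tree: the tree is walked in post-order and each operator gets a fresh temporary. The tuple tree shape and the temporary names t1, t2, ... are assumptions.

```python
import itertools


def gen_tac(tree):
    """Flatten an expression tree into three-address code.

    Trees are tuples: ("id", name), ("num", n) or (op, left, right).
    Returns (instructions, name holding the final result).
    """
    code = []
    temps = itertools.count(1)

    def walk(node) -> str:
        if node[0] == "id":
            return node[1]
        if node[0] == "num":
            return str(node[1])
        left, right = walk(node[1]), walk(node[2])   # operands first (post-order)
        t = f"t{next(temps)}"
        code.append(f"{t} = {left} {node[0]} {right}")
        return t

    result = walk(tree)
    return code, result


# x = a + b * c
code, result = gen_tac(("+", ("id", "a"), ("*", ("id", "b"), ("id", "c"))))
for line in code + [f"x = {result}"]:
    print(line)
```

For x = a + b * c this prints t1 = b * c, then t2 = a + t1, then x = t2: each instruction has at most one operator.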

Code Optimization
This is an optional phase designed to improve the intermediate code
so that the output runs faster and takes less space. Its output is
another intermediate code program that does the same job as the
original, but in a way that saves time and/or space.
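One simple optimization of this kind is constant folding with constant propagation over straight-line three-address code. A minimal sketch, assuming each instruction has the form 'dest = x op y' or 'dest = x' with space-separated tokens and integer constants only:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def _is_int(s: str) -> bool:
    return s.lstrip("-").isdigit()


def fold_constants(code: list) -> list:
    """Evaluate constant operations at compile time and propagate the results."""
    known = {}      # variable -> constant value it holds at this point
    out = []
    for instr in code:
        dest, rhs = instr.split(" = ")
        # substitute constants already known for the operands
        parts = [str(known[p]) if p in known else p for p in rhs.split()]
        if len(parts) == 3 and parts[1] in OPS and _is_int(parts[0]) and _is_int(parts[2]):
            parts = [str(OPS[parts[1]](int(parts[0]), int(parts[2])))]   # fold
        if len(parts) == 1 and _is_int(parts[0]):
            known[dest] = int(parts[0])
        else:
            known.pop(dest, None)        # dest no longer holds a known constant
        out.append(f"{dest} = {' '.join(parts)}")
    return out


print(fold_constants(["t1 = 2 * 3", "t2 = a + t1", "t3 = t1 - 1"]))
```

Every assignment is kept (t1 = 6 rather than deleting t1), so the result does the same job as the original; removing assignments that are no longer used would be a separate dead-code elimination pass.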
Code Generation:- The last phase of translation is code
generation. A number of optimizations to reduce the length of the
machine-language program are carried out during this phase. The
output of the code generator is a machine-language program for
the specified computer.
Cont…..
Table Management OR Book-keeping :- A compiler
needs to collect information about all the data objects
that appear in the source program. The information
about data objects is collected by the early phases of
the compiler: the lexical and syntactic analyzers. The data
structure used to record this information is called the
symbol table.
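A minimal sketch of a symbol table that also respects nested scopes: a stack of dictionaries, searched from the innermost scope outward so that inner declarations hide outer ones. The class and method names and the use of keyword arguments for attributes are assumptions for illustration.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to a dict of its attributes."""

    def __init__(self):
        self.scopes = [{}]                  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"'{name}' already declared in this scope")
        scope[name] = attrs

    def lookup(self, name):
        # innermost scope first, so inner declarations hide outer ones
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                         # undeclared


st = SymbolTable()
st.declare("x", type="int")
st.enter_scope()
st.declare("x", type="float")
print(st.lookup("x"))   # {'type': 'float'}
st.exit_scope()
print(st.lookup("x"))   # {'type': 'int'}
```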
Example:
thank you
