
Chapter 3

Describing Syntax and


Semantics
Chapter 3 Topics

• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs: Dynamic
Semantics

Copyright © 2012 Addison-Wesley. All rights reserved. 1-2


INTRODUCTION

Copyright © 2012 Pearson Education. All rights reserved. 1-3


Introduction

• A programming language's success depends in part on its description, which must be concise yet understandable.

• One of the main problems in describing a language is the diversity of the audience it must serve (the users of the language definition):
– initial evaluators
– implementors (compiler writers)
– users (programmers)

• Syntax and semantics together make up a programming language; they provide the language definition.

Copyright © 2012 Pearson Education. All rights reserved. 1-4


Syntax

• The syntax of a programming language is the form of its expressions, statements, and program units.
• For example, in Java, the while and if statements are written as:
while (Boolean_expression) {
. . .
}

Copyright © 2012 Pearson Education. All rights reserved. 1-5


Syntax

if (condition) {
. . .
} else {
. . .
}

• This is the syntax of Java; in other words, syntax is the set of grammar and spelling rules of the language.

• If code does not follow these grammar and spelling rules, we get an error (invalid syntax).

Copyright © 2012 Pearson Education. All rights reserved. 1-6


Semantics

• Semantics is the meaning of those expressions, statements, and


program units.
if (condition) {
. . .
} else {
. . .
}
• The semantics of this if statement form is that when the condition is
– true, the block following the if is executed;
– false, the else block is executed.

Copyright © 2012 Pearson Education. All rights reserved. 1-7


Syntax and Semantics

• As we saw from our previous example, syntax and semantics are


closely related. In a well-designed programming language, semantics
should follow directly from syntax.

• Is it easier to describe syntax than semantics?

– Yes, partly because concise, universally accepted notations exist for syntax, but not yet for semantics.

Copyright © 2012 Pearson Education. All rights reserved. 1-8


THE GENERAL
PROBLEM OF
DESCRIBING SYNTAX

Copyright © 2012 Pearson Education. All rights reserved. 1-9


The General Problem of Describing
Syntax: Terminology

• A sentence (or statement) is a string of characters


over some alphabet

• A language is a set of sentences

• A lexeme is the lowest level syntactic unit of a


language (e.g., *, sum, begin)
– Lexemes can include numeric literals, operators, and
special words.
– Programs can be thought of as strings of lexemes.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-10


The General Problem of Describing
Syntax: Terminology

• A token is a category of lexemes (e.g.,


identifier)
– An identifier is a token that can have many lexemes, or instances, such as sum and total.

– A token may also have only a single lexeme, as with the arithmetic operators.
• + has just one possible lexeme.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-11


Exercise

• Consider the following Java Statement:


index = 2 * count + 17;
Lexemes Tokens
index identifier
= equal_sign
2 int_literal
* mult_op
count identifier
+ plus_op
17 int_literal
; semicolon

Copyright © 2012 Pearson Education. All rights reserved. 1-12
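
The scanner (lexical analyzer) of a compiler pairs each lexeme with its token category in exactly this way. Below is a minimal sketch of such a scanner in Java; the class name TinyLexer and the token names are illustrative only and not from the text.

import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    // Token categories used in the exercise above.
    enum Token { IDENTIFIER, EQUAL_SIGN, INT_LITERAL, MULT_OP, PLUS_OP, SEMICOLON }

    // Scan a statement and return each lexeme paired with its token category.
    public static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c)) {                 // identifier lexeme
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i) + " : " + Token.IDENTIFIER);
            } else if (Character.isDigit(c)) {           // integer literal lexeme
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i) + " : " + Token.INT_LITERAL);
            } else {                                     // single-character lexemes
                Token t = switch (c) {
                    case '=' -> Token.EQUAL_SIGN;
                    case '*' -> Token.MULT_OP;
                    case '+' -> Token.PLUS_OP;
                    case ';' -> Token.SEMICOLON;
                    default -> throw new IllegalArgumentException("invalid syntax: " + c);
                };
                out.add(c + " : " + t);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        scan("index = 2 * count + 17;").forEach(System.out::println);
    }
}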


Formal Definition of Languages

• Languages can be defined in two distinct ways: by recognition and by generation.

• Language Recognizers:
– A recognizer reads input strings over the alphabet of
the language and decides whether the input string
belongs to the language.

– Example: The syntax analysis part of a compiler is a


recognizer for the language that the compiler
translates.
• The structure of the syntax analysers (parsers) will be
discussed in Chapter 4

1-13
Formal Definition of Languages

• Language Generators:
– A language generator is a device that can be used to generate the sentences of a language.

– Whether the syntax of a particular statement is correct can be determined by comparing it with the structure of the generator.
– Examples: grammars and BNF

• Recognition and generation are useful for different


things, but are closely related.

Copyright © 2012 Pearson Education. All rights reserved. 1-14


BACKUS-NAUR FORM
AND CONTEXT-FREE
GRAMMARS

Copyright © 2012 Pearson Education. All rights reserved. 1-15


Context-Free Grammar

• Developed by linguist Noam Chomsky in the mid-1950s

• Chomsky defined four classes of generative devices, or grammars.

– Two of these grammar classes are called context-free and regular.
– Regular grammars are used to describe the tokens of programming languages.
– Context-free grammars can be used to describe the syntax of whole programming languages (with minor exceptions).

Copyright © 2012 Pearson Education. All rights reserved. 1-16


Backus-Naur Form

• John Backus introduced a new formal notation for specifying programming language syntax.
– This was later modified slightly by Peter Naur.
– The revised method of syntax description became known as Backus-Naur Form (BNF).
– BNF was used to describe ALGOL 60 and became the most popular method of describing programming language syntax.

• BNF is nearly identical to context-free grammars.

– BNF is a natural notation for describing syntax.
– In the remainder of this chapter, context-free grammars are referred to simply as grammars, and the terms BNF and grammar are used interchangeably.
1-17
Fundamentals

• Metalanguage: a language that is used to describe another language


– BNF is a metalanguage for programming languages.

• BNF uses abstractions for syntactic structures, which act like


variables.
– Example: Java assignment can be represented by the abstraction <assign>
<assign> → <var> = <expression>
The left-hand side (LHS) is the abstraction being defined.
The right-hand side (RHS) of the arrow consists of tokens, lexemes, and references to other abstractions.

Copyright © 2012 Pearson Education. All rights reserved. 1-18


Fundamentals
• Abstractions in a BNF description are called nonterminal symbols (nonterminals) and are often enclosed in angle brackets.
• The lexemes and tokens of the rules are called terminal symbols (terminals).
• A rule, or production, describes the structure of a statement.
– A rule has an LHS and an RHS, each consisting of terminal and nonterminal symbols.

<assign> → <var> = <expression>

– This example rule says that an instance of the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>.
1-19
Fundamentals
• An example of <assign> rule is:
total = subtotal1 + subtotal2
<var> = <expression>

• A grammar is a finite, nonempty set of rules.

• A nonterminal can have two or more distinct definitions. Multiple definitions can be written as a single rule, with the different definitions separated by | (meaning logical OR).
– This means that an abstraction (nonterminal) can have more than one RHS
<stmt> → <single_stmt>
| begin <stmt_list> end
1-20
Fundamentals

• Example of multiple definitions written as a single rule.


• Java if statements can be described with the rules:
(1) <if_stmt> → if ( <logic_expr> ) <stmt>
(2) <if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>

or with the rule (combining (1) and (2)):


<if_stmt> → if ( <logic_expr> ) <stmt>
| if ( <logic_expr> ) <stmt> else <stmt>

• In these rules, <stmt> is either a single or a compound statement.

Copyright © 2012 Pearson Education. All rights reserved. 1-21


Fundamentals

• BNF is simple, yet powerful enough to describe nearly all the syntax
of programming languages.

• In particular, it can:

– describe lists of similar constructs
– describe the order in which different constructs must appear
– describe nested structures to any depth
– imply operator precedence
– imply operator associativity

Copyright © 2012 Pearson Education. All rights reserved. 1-22


DESCRIBING LISTS

Copyright © 2012 Pearson Education. All rights reserved. 1-23


DESCRIBING LISTS

• To describe lists such as 1, 2, . . ., we need an alternative to the ellipsis (. . .), because BNF does not include it.
• The alternative is to use recursion.
• A rule is recursive if its LHS appears in its RHS.
• Example:
<ident_list> → ident
| ident, <ident_list>
• Here <ident_list> is defined as either a single token (an identifier) or an identifier followed by a comma and another instance of <ident_list> (see the sketch below).
• Recursion is used to describe lists in many grammars

1-24
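
To see how such a recursive rule can drive a recognizer, here is a minimal sketch in Java; the class IdentListRecognizer and its helper methods are hypothetical and not from the text.

// A minimal sketch of a recursive recognizer for
//   <ident_list> → ident | ident , <ident_list>
public class IdentListRecognizer {
    // True if the input is a single identifier (base case) or an identifier
    // followed by a comma and another <ident_list>, mirroring the grammar rule.
    static boolean isIdentList(String s) {
        int comma = s.indexOf(',');
        if (comma < 0) return isIdent(s.trim());             // base case: ident
        return isIdent(s.substring(0, comma).trim())         // ident ,
            && isIdentList(s.substring(comma + 1));          // <ident_list>
    }

    static boolean isIdent(String s) {
        return s.matches("[A-Za-z][A-Za-z0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(isIdentList("sum, total, index"));  // true
        System.out.println(isIdentList("sum,, total"));        // false
    }
}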
3.3.1.5 Grammars and Derivations

A grammar is a generative device for defining languages.

Generation begins with a special nonterminal (abstraction) of the grammar called the start symbol.

The sentences of the language are generated through a sequence of applications of the rules, called a derivation.

In a grammar for a complete programming language, the start symbol represents a complete program and is often named <program>.

1-25
Example

Example 3.1
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var>+<var> | <var>-<var> | <var>

Copyright © 2012 Pearson Education. All rights reserved. 1-26


Representing rules

• The two rules are equivalent

<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
OR
<stmt_list> → <stmt>
<stmt_list> → <stmt> ; <stmt_list>

Copyright © 2012 Pearson Education. All rights reserved. 1-27


An Example Derivation
Generating sentences from the grammar is called derivation
Example:
<program> => begin <stmt_list> end
=> begin <stmt>; <stmt_list> end
=> begin <var> = <expression>; <stmt_list> end
=> begin A = <expression>; <stmt_list> end
=> begin A = <var> + <var>; <stmt_list> end
=> begin A = B + <var>; <stmt_list> end
=> begin A = B + C; <stmt_list> end
=> begin A = B + C; <stmt> end
=> begin A = B + C; <var> = <expression> end
=> begin A = B + C; B = <expression> end
=> begin A = B + C; B = <var> end
=> begin A = B + C; B = C end

Copyright © 2012 Pearson Education. All rights reserved. 1-28


Derivation
• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
• The symbol => is read “derives”.

• Every string of symbols in the derivation, including


<program>, is a sentential form.

• A leftmost derivation is one in which the leftmost


nonterminal in each sentential form is the one that is
expanded. The derivation continues until the sentential
form contains no nonterminals.
• A derivation may be neither leftmost nor rightmost.

• The derivation continues until the sentential form (string) contains no nonterminals. That sentential form, consisting only of terminals, or lexemes, is the generated sentence.
1-29
Derivation

• By choosing alternative RHSs of rules with which to replace nonterminals in


the derivation, different sentences in the language can be generated.

• By exhaustively choosing all combinations of choices, the entire language


can be generated.

• This language, like most others, is infinite, so one cannot generate all
the sentences in the language in finite time.

Copyright © 2012 Pearson Education. All rights reserved. 1-30
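
As an illustration of a language generator, the following Java sketch applies randomly chosen RHSs of the Example 3.1 rules until the sentential form contains only terminals; the class and method names are hypothetical and not from the text.

import java.util.Random;

// A minimal sketch of a "generator" for the grammar of Example 3.1.
public class SentenceGenerator {
    static final Random RNG = new Random();

    static String program()  { return "begin " + stmtList() + " end"; }
    // <stmt_list> → <stmt> | <stmt> ; <stmt_list>
    static String stmtList() { return RNG.nextBoolean() ? stmt() : stmt() + "; " + stmtList(); }
    // <stmt> → <var> = <expression>
    static String stmt()     { return var() + " = " + expression(); }
    // <var> → A | B | C
    static String var()      { return "ABC".charAt(RNG.nextInt(3)) + ""; }
    // <expression> → <var> + <var> | <var> - <var> | <var>
    static String expression() {
        switch (RNG.nextInt(3)) {
            case 0:  return var() + " + " + var();
            case 1:  return var() + " - " + var();
            default: return var();
        }
    }

    public static void main(String[] args) {
        // Each run derives one sentence of the language, e.g.
        //   begin A = B + C; B = C end
        System.out.println(program());
    }
}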


Examples

Copyright © 2012 Pearson Education. All rights reserved. 1-31


3.3.1.6 Parse Trees

Parse trees naturally describe the hierarchical syntactic structure of the sentences of the languages they define.

These hierarchical structures are called parse trees.

A parse tree for the simple statement A = B * (A + C) is given on the next slide.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-32


Example

You can read the generated sentence from the parse tree by visiting its leaves from left to right (a left-to-right depth-first traversal).

Copyright © 2012 Pearson Education. All rights reserved. 1-33


Continued…

• Every internal node of a parse tree is labeled with a nonterminal


symbol (abstraction); every leaf is labeled with a terminal symbol.

• Every subtree of a parse tree describes one instance of an


abstraction in the sentence

Copyright © 2012 Pearson Education. All rights reserved. 1-34


3.3.1.7 Ambiguity in Grammars

A grammar is ambiguous if and only if it generates a sentential form (a sentence) that has two or more distinct parse trees.

Example 3.3:
Two distinct parse trees for the same sentence,
A = B + C * A

<assign> → <id> = <expr>


<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| (<expr>)
| <id>

Copyright © 2012 Addison-Wesley. All rights reserved. 1-35


Ambiguity in Grammars Example
A = B + C * A

Copyright © 2012 Pearson Education. All rights reserved. 1-36


Continued…

• Unlike the grammar of Example 3.2, which allows the parse tree of an expression to grow only on the right, this grammar allows growth on both the left and the right.

• Syntactic ambiguity of language structures is a problem because compilers


often base the semantics of those structures on their syntactic form

• Specifically, the compiler chooses the code to be generated for a statement


by examining its parse tree

• If a language structure has more than one parse tree, then the meaning of
the structure cannot be determined uniquely

Copyright © 2012 Pearson Education. All rights reserved. 1-37


An Ambiguous Expression Grammar

<expr> → <expr> <op> <expr>


| const
<op> → / | -
[Figure: two distinct parse trees for const - const / const, one with the - operator at the root and one with the / operator at the root]

Copyright © 2012 Addison-Wesley. All rights reserved. 1-38


An Unambiguous Expression Grammar

• If we use the parse tree to indicate the precedence levels of the operators, we can eliminate the ambiguity (operators generated lower in the tree have higher precedence).
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const

The unique parse tree for const - const / const:

<expr>
  <expr>
    <term>
      const
  -
  <term>
    <term>
      const
    /
    const
1-39


3.3.1.8 Operator Precedence

When an expression includes two different operators, for example x + y * z, one obvious semantic issue is the order of evaluation of the two operators (in this expression, is it add and then multiply, or vice versa?).

This semantic question can be answered by assigning different precedence levels to operators.

The fact that an operator in an arithmetic expression is generated lower in the parse tree can be used to indicate that it has higher precedence than an operator produced higher up in the tree.

1-40
Continued…
• Figure 3.2, for example, the multiplication operator is
generated lower in the tree, which could indicate that
it has precedence over the addition operator in the
expression. The second parse tree, however, indicates
just the opposite. It appears, therefore, that the two
parse trees indicate conflicting precedence information

1-41
Continued…
• A grammar needs to be written for the simple
expressions we have been discussing that is both
unambiguous and specifies a consistent precedence of
the + and * operators, regardless of the order in which
the operators appear in an expression

• The correct ordering is specified by using separate


nonterminal symbols to represent the operands of the
operators that have different precedence.

• If <expr> is the root symbol for expressions, + can be


forced to the top of the parse tree by having <expr>
directly generate only + operators, using the new
nonterminal, <term>, as the right operand of +

• Next, we can define <term> to generate * operators,


using <term> as the left operand and a new nonterminal,
<factor>, as its right operand
1-42
Example
Now, * will always be lower in the parse tree,
simply because it is further from the start
symbol than + in every derivation.

The grammar of Example 3.4 is such a grammar.

1-43
3.3.1.8 Operator Precedence

Example 3.4, an unambiguous grammar for expressions:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

Copyright © 2012 Pearson Education. All rights reserved. 1-44


3.3.1.8 Operator Precedence
The unique parse tree for A = B + C * A using an
unambiguous grammar (Figure 3.3)

1-45
3.3.1.8 Operator Precedence
Every derivation with an unambiguous grammar has a
unique parse tree, although that tree can be
represented by different derivations.

1-46
3.3.1.9 Associativity of Operators

Do parse trees for expressions with two or more


adjacent occurrences of operators with equal precedence
have those occurrences in proper hierarchical order?

An example of an assignment using the previous grammar


is: A = B + C + A

1-47
3.3.1.9 Associativity of Operators

The parse tree for this sentence shows the left + operator lower than the right + operator. This is the correct order if the + operator is meant to be left associative, which is typical.

When a grammar rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive. Left recursion specifies left associativity.

In most languages that provide it, the exponentiation


operator is right associative. To indicate right
associativity, right recursion can be used. A grammar rule
is right recursive if the LHS appears at the right end of
the RHS. Rules such as:
<factor> → <exp> ** <factor>
| <exp>
<exp> → (<exp>)
| id 1-48
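
The right recursion in these rules maps naturally onto a recursive call in a parser, which is what makes ** right associative. The following Java sketch is illustrative only (it uses integer literals in place of id) and is not from the text.

// A minimal sketch: right recursion in
//   <factor> → <exp> ** <factor> | <exp>
// becomes a recursive call on the right operand, so 2 ** 3 ** 2 is
// evaluated as 2 ** (3 ** 2) = 512.
public class PowerParser {
    private final String[] toks;
    private int pos = 0;

    PowerParser(String input) { toks = input.trim().split("\\s+"); }

    // <factor> → <exp> ** <factor> | <exp>
    long factor() {
        long base = exp();
        if (pos < toks.length && toks[pos].equals("**")) {
            pos++;
            return (long) Math.pow(base, factor());   // recurse on the right operand
        }
        return base;
    }

    // <exp> → id   (an integer literal stands in for id in this sketch)
    long exp() { return Long.parseLong(toks[pos++]); }

    public static void main(String[] args) {
        System.out.println(new PowerParser("2 ** 3 ** 2").factor());   // 512
    }
}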
3.3.1.9 Associativity of Operators

• Subtraction and division are not associative, whether in mathematics or in


a computer. Therefore, correct associativity may be essential for an
expression that contains either of them

• Unfortunately, left recursion disallows the use of some important syntax


analysis algorithms. When one of these algorithms is to be used, the
grammar must be modified to remove the left recursion. This, in turn,
disallows the grammar from precisely specifying that certain operators are
left associative. Fortunately, left associativity can be enforced by the
compiler, even though the grammar does not dictate it

Copyright © 2012 Pearson Education. All rights reserved. 1-49


Extended BNF

• Optional parts are placed in brackets [ ]


<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> → letter {letter|digit}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-50


BNF and EBNF

• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-51
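
The EBNF form maps directly onto a recursive-descent evaluator: each { } repetition becomes a loop, which also gives +, -, *, and / their usual left associativity. The following Java sketch is illustrative only (it assumes <factor> is an integer literal) and is not from the text.

// A minimal sketch of a recursive-descent evaluator for the EBNF rules above.
public class ExprEval {
    private final String[] toks;
    private int pos = 0;

    ExprEval(String input) { toks = input.trim().split("\\s+"); }

    // <expr> → <term> {(+ | -) <term>}
    int expr() {
        int value = term();
        while (pos < toks.length && (toks[pos].equals("+") || toks[pos].equals("-"))) {
            String op = toks[pos++];
            value = op.equals("+") ? value + term() : value - term();
        }
        return value;
    }

    // <term> → <factor> {(* | /) <factor>}
    int term() {
        int value = factor();
        while (pos < toks.length && (toks[pos].equals("*") || toks[pos].equals("/"))) {
            String op = toks[pos++];
            value = op.equals("*") ? value * factor() : value / factor();
        }
        return value;
    }

    // <factor> → const   (an integer literal, for this sketch)
    int factor() { return Integer.parseInt(toks[pos++]); }

    public static void main(String[] args) {
        // * binds tighter than +, so this prints 14, not 20.
        System.out.println(new ExprEval("2 + 3 * 4").expr());
    }
}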


Recent Variations in EBNF

• Alternative RHSs are put on separate lines


• Use of a colon instead of the arrow (→)
• Use of opt for optional parts
• Use of oneof for choices

Copyright © 2012 Addison-Wesley. All rights reserved. 1-52


Static Semantics

• Nothing to do with meaning


• Context-free grammars (CFGs) cannot describe all of the
syntax of programming languages
• Categories of constructs that are trouble:
- Context-free, but cumbersome (e.g.,
types of operands in expressions)
- Non-context-free (e.g., variables must
be declared before they are used)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-53


Attribute Grammars

• Attribute grammars (AGs) have additions


to CFGs to carry some semantic info on
parse tree nodes

• Primary value of AGs:


– Static semantics specification
– Compiler design (static semantics checking)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-54


Attribute Grammars

Def: An attribute grammar is a context-free grammar G = (S,


N, T, P) with the following additions:
– For each grammar symbol x there is a set A(x) of attribute values
– Each rule has a set of functions that define certain attributes of the
nonterminal in the rule
– Each rule has a (possibly empty) set of predicates to check for
attribute consistency
Attribute Grammars: Definition

• Let X0 → X1 ... Xn be a rule


• Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define
synthesized attributes
• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes
• Initially, there are intrinsic attributes on the leaves

Copyright © 2012 Addison-Wesley. All rights reserved. 1-56


Attribute Grammars: An Example

• Syntax
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
<var> -> A | B | C
• actual_type: synthesized for <var>
and <expr>
• expected_type: inherited for <expr>

Copyright © 2012 Addison-Wesley. All rights reserved. 1-57


Attribute Grammars

An attribute grammar consists of a CFG together with:
– a set of attributes: each attribute maps grammar symbols (T ∪ N) to values
– attribute evaluation rules (attributes are either inherited or synthesized)
– conditions (predicates)
Attribute Grammars (continued)

• How are attribute values computed?


– If all attributes were inherited, the tree could be decorated in top-
down order.
– If all attributes were synthesized, the tree could be decorated in
bottom-up order.
– In many cases, both kinds of attributes are used, and it is some
combination of top-down and bottom-up that must be used.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-60


Attribute Grammars (continued)

<expr>.expected_type ← inherited from parent

<var>[1].actual_type ← lookup(A)
<var>[2].actual_type ← lookup(B)
<var>[1].actual_type =? <var>[2].actual_type

<expr>.actual_type ← <var>[1].actual_type
<expr>.actual_type =? <expr>.expected_type

Copyright © 2012 Addison-Wesley. All rights reserved. 1-61


Attribute Grammar (continued)

• Syntax rule: <expr> → <var>[1] + <var>[2]


Semantic rules:
<expr>.actual_type ← <var>[1].actual_type
Predicate:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type

• Syntax rule: <var> → id


Semantic rule:
<var>.actual_type ← lookup(<var>.string)
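
A minimal Java sketch of how these semantic rules and predicates might be evaluated on a parse-tree node is shown below; the symbol table, type names, and method names are hypothetical and not from the text.

import java.util.Map;

// Checking the semantic rules and predicates for  <expr> → <var>[1] + <var>[2].
public class AttrCheck {
    enum Type { INT, REAL }

    // Intrinsic attributes of the leaves: each variable's declared type.
    static final Map<String, Type> SYMBOL_TABLE =
            Map.of("A", Type.INT, "B", Type.INT, "C", Type.REAL);

    // <var>.actual_type ← lookup(<var>.string)
    static Type lookup(String var) { return SYMBOL_TABLE.get(var); }

    // Synthesize <expr>.actual_type and check both predicates.
    static Type exprActualType(String var1, String var2, Type expectedType) {
        Type t1 = lookup(var1);
        Type t2 = lookup(var2);
        if (t1 != t2)
            throw new IllegalStateException("predicate failed: operand types differ");
        Type actual = t1;                       // <expr>.actual_type ← <var>[1].actual_type
        if (actual != expectedType)             // <expr>.expected_type == <expr>.actual_type
            throw new IllegalStateException("predicate failed: expected " + expectedType);
        return actual;
    }

    public static void main(String[] args) {
        System.out.println(exprActualType("A", "B", Type.INT));   // INT
        // exprActualType("A", "C", Type.INT) would fail the first predicate.
    }
}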
Semantics

• There is no single widely acceptable notation or formalism


for describing semantics
• Several needs for a methodology and notation for
semantics:
– Programmers need to know what statements mean
– Compiler writers must know exactly what language constructs do
– Correctness proofs would be possible
– Compiler generators would be possible
– Designers could detect ambiguities and inconsistencies
Operational Semantics

• Operational semantics describes the meaning of a program by executing its
statements on a machine, either simulated or actual. The
change in the state of the machine (memory, registers, etc.)
defines the meaning of the statement
• To use operational semantics for a high-level language, a
virtual machine is needed
Operational Semantics (continued)

• A better alternative: A complete computer simulation


• The process:
• Build a translator (translates source code to the machine
code of an idealized computer)
• Build a simulator for the idealized computer
• Evaluation of operational semantics:
• Good if used informally (language manuals, etc.)
• Extremely complex if used formally (e.g., VDL, which was used to describe the semantics of PL/I)
Operational Semantics (continued)

• Uses of operational semantics:


- Language manuals and textbooks
- Teaching programming languages

• Two different levels of uses of operational semantics:


- Natural operational semantics
- Structural operational semantics

• Evaluation
- Good if used informally (language
manuals, etc.)
- Extremely complex if used formally (e.g.,VDL)
Operational Semantics

• A hardware pure interpreter would be too expensive


• A software pure interpreter also has problems
– The detailed characteristics of the particular computer would make
actions difficult to understand
– Such a semantic definition would be machine- dependent

Copyright © 2012 Addison-Wesley. All rights reserved. 1-68


Denotational Semantics

• Based on recursive function theory


• The most abstract semantics description method
• Originally developed by Scott and Strachey (1970)
Denotational Semantics - continued

• The process of building a denotational specification for a


language:
- Define a mathematical object for each language
entity
– Define a function that maps instances of the language entities
onto instances of the corresponding mathematical objects
• The meaning of language constructs is defined only by the values of the program's variables
Denotational Semantics: program state

• The state of a program is the values of all its current


variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}

• Let VARMAP be a function that, when given a variable


name and a state, returns the current value of the variable
VARMAP(ij, s) = vj
Operational & Denotational Semantics

Operational Semantics:
– Based on programming languages
– Uses an intermediate language
– Can lead to circularities
– Defined in terms of state changes

Denotational Semantics:
– Based on mathematics and logic
– Uses mathematical objects
– Never leads to circularities
– Defined in terms of the values of all program variables

Copyright © 2012 Pearson Education. All rights reserved. 1-72


Example: Intermediate Language

• This is a human-oriented intermediate language:
ident = var
ident = ident + 1
ident = ident - 1
goto label
if var relop var goto label

relop is a relational operator from the set {=, <>, >, <, >=, <=}
Copyright © 2012 Pearson Education. All rights reserved. 1-73
Example: Circularities

• The statements of a programming language are described in terms of the statements of a lower-level programming language.
• This can lead to situations in which concepts are indirectly defined in terms of themselves.
• When this happens, the description is circular.
• Example: describing a language's if statement in terms of the if statement of the language the interpreter is written in.
Copyright © 2012 Pearson Education. All rights reserved. 1-74


Binary Numbers

As an example of denotational semantics, consider the character string representation of binary numbers. The syntax of such binary numbers can be described by the following grammar rules:
<bin_num> → ‘0’
| ‘1’
| <bin_num> ‘0’
| <bin_num> ‘1’

Copyright © 2012 Pearson Education. All rights reserved. 1-76


Parse Tree

A parse tree for the binary number 110:

<bin_num>
  <bin_num>
    <bin_num>
      '1'
    '1'
  '0'
Copyright © 2012 Addison-Wesley. All rights reserved. 1-77
Mapping

• The syntactic domain of the mapping function for binary numbers is the set of all character string representations of binary numbers.
• The semantic domain is the set of nonnegative decimal numbers (symbolized N).
• The actual meaning (a decimal number) is associated with each rule that has a single terminal symbol as its RHS.

Copyright © 2012 Pearson Education. All rights reserved. 1-78


Mapping Example

• In our binary number example we will:


1. Map the first two rules to decimal numbers.
2. Map the last two rules to functions that represent the complete meaning of the RHS.
• Use the semantic function Mbin to map the syntactic objects
in our grammar rules to the objects in N

Copyright © 2012 Pearson Education. All rights reserved. 1-79


Function Mbin

Mbin(‘0’) = 0

Mbin(‘1’) = 1

Mbin(<bin_num> ‘0’) = 2 * Mbin(<bin_num>)

Mbin(<bin_num> ‘1’) = 2 * Mbin(<bin_num>) + 1

Copyright © 2012 Pearson Education. All rights reserved. 1-80
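
A direct transcription of Mbin into Java, shown below as an illustrative sketch (not from the text), makes the recursion over the string explicit.

// The semantic function Mbin applied to the character-string
// representation of a binary number.
public class Mbin {
    static int mbin(String binNum) {
        if (binNum.equals("0")) return 0;                       // Mbin('0') = 0
        if (binNum.equals("1")) return 1;                       // Mbin('1') = 1
        String prefix = binNum.substring(0, binNum.length() - 1);
        char last = binNum.charAt(binNum.length() - 1);
        return last == '0'
                ? 2 * mbin(prefix)                              // Mbin(<bin_num> '0')
                : 2 * mbin(prefix) + 1;                         // Mbin(<bin_num> '1')
    }

    public static void main(String[] args) {
        System.out.println(mbin("110"));   // prints 6, matching the decorated parse tree
    }
}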


Parse Tree

Now we can calculate the values of the nodes in our previous parse tree representing 110:

<bin_num> = 6
  <bin_num> = 3
    <bin_num> = 1
      '1'
    '1'
  '0'

This is syntax-directed semantics.
Copyright © 2012 Addison-Wesley. All rights reserved. 1-81
Decimal Numbers

<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' |


'6' | '7' | '8' | '9' |
<dec_num> ('0' | '1' | '2' | '3' |
'4' | '5' | '6' | '7' |
'8' | '9')

Mdec('0') = 0, Mdec ('1') = 1, …, Mdec ('9') = 9


Mdec (<dec_num> '0') = 10 * Mdec (<dec_num>)
Mdec (<dec_num> '1’) = 10 * Mdec (<dec_num>) + 1

Mdec (<dec_num> '9') = 10 * Mdec (<dec_num>) + 9

Copyright © 2012 Addison-Wesley. All rights reserved. 1-82


Expressions

• We deal with simple expressions


• We use only + and *, one per expression
• Map expressions onto Z ∪ {error}
• We assume expressions are decimal numbers, variables, or
binary expressions having one arithmetic operator and two
operands, each of which can be an expression

Copyright © 2012 Addison-Wesley. All rights reserved. 1-83


Symbols

• We use (=) to define mathematical functions


• We use => (the implication symbol) to connect the form of an operand with its associated case (or switch) construct.
• We use . (dot notation) to refer to the child nodes of a node.

Copyright © 2012 Pearson Education. All rights reserved. 1-84


Expressions

Me(<expr>, s) =
case <expr> of
<dec_num> => Mdec(<dec_num>, s)
<var> =>
if VARMAP(<var>, s) == undef
then error
else VARMAP(<var>, s)
<binary_expr> =>
if (Me(<binary_expr>.<left_expr>, s) == undef
OR Me(<binary_expr>.<right_expr>, s) == undef)
then error
else
if (<binary_expr>.<operator> == '+') then
Me(<binary_expr>.<left_expr>, s) +
Me(<binary_expr>.<right_expr>, s)
else Me(<binary_expr>.<left_expr>, s) *
Me(<binary_expr>.<right_expr>, s)
...

Copyright © 2012 Addison-Wesley. All rights reserved. 1-85
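
The following Java sketch (not from the text) shows the binary-expression case of Me with the state represented as a map; the Expr classes are hypothetical, and undef/error are both modeled as null.

import java.util.Map;

// A minimal sketch of the mapping function Me for expressions.
public class MeSketch {
    interface Expr {}
    record DecNum(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record BinaryExpr(Expr left, char operator, Expr right) implements Expr {}

    // Me(<expr>, s)
    static Integer me(Expr e, Map<String, Integer> s) {
        if (e instanceof DecNum d) return d.value();             // Mdec case
        if (e instanceof Var v) return s.get(v.name());          // VARMAP; null = undef
        BinaryExpr b = (BinaryExpr) e;                           // <binary_expr> case
        Integer left = me(b.left(), s), right = me(b.right(), s);
        if (left == null || right == null) return null;          // error
        return b.operator() == '+' ? left + right : left * right;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = Map.of("count", 3);
        Expr e = new BinaryExpr(new Var("count"), '+', new DecNum(17));
        System.out.println(me(e, state));   // 20
    }
}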


Assignment Statements

• Maps state sets to state sets ∪ {error}

• The meaning is obtained by evaluating the expression and setting the target variable to its value; the function maps one state to another.

Ma(x := E, s) =
if Me(E, s) == error
then error
else s’ = {<i1,v1’>,<i2,v2’>,...,<in,vn’>},
where for j = 1, 2, ..., n,
if ij == x
then vj’ = Me(E, s)
else vj’ = VARMAP(ij, s)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-86
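
A minimal Java sketch of Ma, not from the text: it builds a new state identical to s except for the target variable; the error case is omitted.

import java.util.HashMap;
import java.util.Map;

// Ma(x := E, s): a new state in which x is bound to the value of E.
public class MaSketch {
    static Map<String, Integer> ma(String x, int valueOfE, Map<String, Integer> s) {
        Map<String, Integer> sPrime = new HashMap<>(s);   // vj' = VARMAP(ij, s) for ij != x
        sPrime.put(x, valueOfE);                          // vj' = Me(E, s) for ij == x
        return sPrime;
    }

    public static void main(String[] args) {
        Map<String, Integer> s = Map.of("a", 1, "b", 2);
        System.out.println(ma("a", 5, s));   // a is now 5, b is unchanged
    }
}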


Logical Pretest Loops

• Maps state sets to state sets ∪ {error}

• Assume there are two other existing mapping functions, Msl and Mb, that map statement lists and Boolean expressions, respectively.
Ml(while B do L, s) =
if Mb(B, s) == undef
then error
else if Mb(B, s) == false
then s
else if Msl(L, s) == error
then error
else Ml(while B do L, Msl(L, s))

Copyright © 2012 Addison-Wesley. All rights reserved. 1-87
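
The following Java sketch (not from the text) mirrors Ml: the iteration of the while loop is expressed as recursion over states, with the state represented as a map from variable names to values. The Predicate and UnaryOperator parameters stand in for Mb and Msl, and errors are not modeled.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Ml(while B do L, s) expressed as recursion over states.
public class LoopMeaning {
    static Map<String, Integer> ml(Predicate<Map<String, Integer>> mb,
                                   UnaryOperator<Map<String, Integer>> msl,
                                   Map<String, Integer> s) {
        if (!mb.test(s)) return s;            // Mb(B, s) == false: the loop changes nothing more
        return ml(mb, msl, msl.apply(s));     // otherwise recurse on Msl(L, s)
    }

    public static void main(String[] args) {
        Map<String, Integer> s0 = new HashMap<>(Map.of("i", 0, "sum", 0));
        // Meaning of: while (i < 3) do { sum = sum + i; i = i + 1 }
        Map<String, Integer> sFinal = ml(
            s -> s.get("i") < 3,
            s -> {
                Map<String, Integer> t = new HashMap<>(s);
                t.put("sum", t.get("sum") + t.get("i"));
                t.put("i", t.get("i") + 1);
                return t;
            },
            s0);
        System.out.println(sFinal);   // final state: sum is 3 and i is 3
    }
}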


Loop Meaning

• The meaning of the loop is the value of the


program variables after the statements in the loop
have been executed the prescribed number of
times, assuming there have been no errors
• In essence, the loop has been converted from
iteration to recursion, where the recursive control
is mathematically defined by other recursive state
mapping functions
• This loop, just like actual program loops, may compute nothing because of nontermination (an infinite loop)
• Recursion, when compared to iteration, is easier
to describe with mathematical rigor
Copyright © 2012 Addison-Wesley. All rights reserved. 1-88
Evaluation of Denotational Semantics

• Can be used to prove the correctness of programs


• Provides a rigorous way to think about programs
• Can be an aid to language design
• Has been used in compiler generation systems
• Because of its complexity, it is of little use to language
users

Copyright © 2012 Addison-Wesley. All rights reserved. 1-89


Axiomatic Semantics

• Based on formal logic (predicate calculus)


• Original purpose: formal program verification
• Axioms or inference rules are defined for each statement
type in the language (to allow transformations of logic
expressions into more formal logic expressions)
• The logic expressions are called assertions

Copyright © 2012 Addison-Wesley. All rights reserved. 1-90


Axiomatic Semantics (continued)
• An assertion before a statement (a
precondition) states the relationships and
constraints among variables that are true at
that point in execution
• An assertion following a statement is a
postcondition
• A weakest precondition is the least
restrictive precondition that will guarantee
the postcondition

Copyright © 2012 Addison-Wesley. All rights reserved. 1-91


Axiomatic Semantics Form

• Pre-, post form: {P} statement {Q}

• An example
– a = b + 1 {a > 1}
– One possible precondition: {b > 10}
– Weakest precondition: {b > 0}
– A weaker precondition cannot be found, since it would no longer guarantee that the postcondition holds

Copyright © 2012 Addison-Wesley. All rights reserved. 1-92


Inference Rule

• A method of inferring the truth of one assertion on the


basis of the value of other assertions.
• It has the general form: S1, S2, ..., Sn / S
• This means that if S1, S2, ..., Sn are all true, then the truth of S is inferred.
• The top part (above the line) is called the antecedent.
• The bottom part is called the consequent.
• An axiom is assumed to be true, thus it has no antecedent.

Copyright © 2012 Pearson Education. All rights reserved. 1-93


Assignment Statements

• The precondition and postcondition of an assignment


statement together define its meaning.
• To define the meaning of an assignment statement, there must be a way to compute its precondition from its postcondition.
• Let x = E be a general assignment statement and Q be its postcondition.
• Then, its weakest precondition P is defined by the axiom P = Qx→E

Copyright © 2012 Pearson Education. All rights reserved. 1-94


Assignment Statements (cont.)

• This means that P is computed as Q with all instances of x


replaced by E.
• For example, consider the assignment statement and postcondition a = b / 2 - 1 {a < 10}
• The weakest precondition is computed by substituting b / 2 - 1 for a in {a < 10}:
• b / 2 - 1 < 10
• b < 22
• Thus, the weakest precondition is {b < 22}

Copyright © 2012 Pearson Education. All rights reserved. 1-95
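
A quick numeric check of this result, shown below as an illustrative sketch (not from the text): every b below 22 satisfies the postcondition after the assignment, while b = 22 already violates it, so {b < 22} cannot be weakened.

// Numeric check of the weakest precondition {b < 22} for
// the statement a = b / 2 - 1 with postcondition {a < 10}.
public class WpCheck {
    public static void main(String[] args) {
        for (int b = -100; b < 22; b++) {
            double a = b / 2.0 - 1;            // execute the assignment
            if (a >= 10) throw new AssertionError("postcondition violated for b = " + b);
        }
        double a = 22 / 2.0 - 1;
        System.out.println("b = 22 gives a = " + a + ", so {b < 22} cannot be weakened");
    }
}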


Program Proof Process

• The postcondition for the entire program is


the desired result
– Work back through the program to the first
statement. If the precondition on the first
statement is the same as the program
specification, the program is correct.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-96


Axiomatic Semantics: Assignment

• An axiom for assignment statements


(x = E): {Qx->E} x = E {Q}

• The Rule of Consequence:

{P} S {Q}, P' => P, Q => Q'
{P'} S {Q'}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-97


The Rule of Consequence

• {P} S {Q}, P’=> P, Q => Q’ / {P’} S {Q’}


• Where S is a program statement.
• This means that if {P} S {Q} is true,
• the assertion P' implies P,
• and the assertion Q implies Q',
• then it can be inferred that {P'} S {Q'}.
• In other words, a postcondition can always be weakened
and a precondition can always be strengthened.
• This is quite useful in program proofs.
Copyright © 2012 Pearson Education. All rights reserved. 1-98
Axiomatic Semantics: Sequences

• The weakest precondition for a sequence of


statements cannot be described by an axiom.
• An inference rule for sequences of the form
S1; S2
{P1} S1 {P2}
{P2} S2 {P3}
{P1} S1 {P2}, {P2} S2 {P3}
{P1} S1; S2 {P3}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-99


Axiomatic Semantics: Selection

• An inference rules for selection


- if B then S1 else S2

{B and P} S1 {Q}, {(not B) and P} S2 {Q}


{P} if B then S1 else S2 {Q}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-100


Axiomatic Semantics: Loops
• An inference rule for logical pretest loops

{P} while B do S end {Q}

{I and B} S {I}
{I} while B do S {I and (not B)}
where I is the loop invariant (the inductive
hypothesis)
The weakest precondition for the while loop must
guarantee the truth of the invariant
The truth of the invariant must not be changed by the
evaluation of the loop-controlling Boolean expression or
by the loop body
Copyright © 2012 Addison-Wesley. All rights reserved. 1-101
Axiomatic Semantics: Axioms
• Characteristics of the loop invariant: I must
meet the following conditions:
– P => I -- the loop invariant must be true initially
– {I} B {I} -- evaluation of the Boolean must not change the validity of I
– {I and B} S {I} -- I is not changed by executing the body of the loop
– (I and (not B)) => Q -- if I is true and B is false, Q is implied

– The loop terminates -- can be difficult to prove

Copyright © 2012 Addison-Wesley. All rights reserved. 1-102


Loop Invariant

• The loop invariant I is a weakened version of the loop


postcondition, and it is also a precondition.
• I must be weak enough to be satisfied prior to the
beginning of the loop, but when combined with the loop
exit condition, it must be strong enough to force the truth
of the postcondition

Copyright © 2012 Addison-Wesley. All rights reserved. 1-103


Evaluation of Axiomatic Semantics

• Developing axioms or inference rules for all of the


statements in a language is difficult
• It is a good tool for correctness proofs, and an excellent
framework for reasoning about programs, but it is not as
useful for language users and compiler writers
• Its usefulness in describing the meaning of a programming
language is limited for language users or compiler writers

Copyright © 2012 Addison-Wesley. All rights reserved. 1-104


Denotation Semantics vs Operational
Semantics
• In operational semantics, the state changes
are defined by coded algorithms
• In denotational semantics, the state
changes are defined by rigorous
mathematical functions

Copyright © 2012 Addison-Wesley. All rights reserved. 1-105


Summary

• BNF and context-free grammars are equivalent metalanguages
– Well-suited for describing the syntax of programming languages
• An attribute grammar is a descriptive formalism that can
describe both the syntax and the semantics of a language
• Three primary methods of semantics description
– Operational, axiomatic, denotational

Copyright © 2012 Addison-Wesley. All rights reserved. 1-106
