
Chapter 3

Describing Syntax and


Semantics
Chapter 3 Topics

• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs: Dynamic
Semantics

Copyright © 2012 Addison-Wesley. All rights reserved. 1-2


INTRODUCTION

Copyright © 2012 Pearson Education. All rights reserved. 1-3


Introduction

• A programming language's success depends in part on its description, which must be concise yet understandable.

• One of the main problems in describing a language is the diversity of the audience it must serve (the users of the language definition):
– initial evaluators
– implementors (compiler writers)
– users (programmers)

• Syntax and semantics together make up a programming language; they provide the language definition.

Copyright © 2012 Pearson Education. All rights reserved. 1-4


Syntax

• The syntax of a programming language is the form of its expressions, statements, and program units.
• For example, in Java, the while and if statements are written as:
while (Boolean_expression) {
. . .
}

Copyright © 2012 Pearson Education. All rights reserved. 1-5


Syntax

if (condition) {
. . .
} else {
. . .
}

• This is the syntax of Java; in other words, syntax is the set of grammar and spelling rules of the language.

• If code does not follow these grammar and spelling rules, we get an error (invalid syntax).

Copyright © 2012 Pearson Education. All rights reserved. 1-6


Semantics

• Semantics is the meaning of those expressions, statements, and


program units.
if (condition) {
. . .
} else {
. . .
}
• The semantics of this if statement form is that when the condition is
– true, the block following the if is executed;
– false, the else block is executed.

Copyright © 2012 Pearson Education. All rights reserved. 1-7


Syntax and Semantics

• As we saw from our previous example, syntax and semantics are


closely related. In a well-designed programming language, semantics
should follow directly from syntax.

• Is it easier to describe syntax than semantics?

– Yes, partly because concise, universally accepted notations exist for syntax, but not yet for semantics.

Copyright © 2012 Pearson Education. All rights reserved. 1-8


THE GENERAL
PROBLEM OF
DESCRIBING SYNTAX

Copyright © 2012 Pearson Education. All rights reserved. 1-9


The General Problem of Describing
Syntax: Terminology

• A sentence (or statement) is a string of characters


over some alphabet

• A language is a set of sentences

• A lexeme is the lowest level syntactic unit of a


language (e.g., *, sum, begin)
– Lexemes can include numeric literals, operators, and
special words.
– Programs can be thought of as strings of lexemes.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-10


The General Problem of Describing
Syntax: Terminology

• A token is a category of lexemes (e.g.,


identifier)
– An identifier is a token that can have many lexemes, or instances, such as sum and total.

– A token may also have only a single lexeme, as with the arithmetic operators.
• + has just one possible lexeme.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-11


Exercise

• Consider the following Java Statement:


index = 2 * count + 17;
Lexemes Tokens
index identifier
= equal_sign
2 int_literal
* mult_op
count identifier
+ plus_op
17 int_literal
; semicolon

Copyright © 2012 Pearson Education. All rights reserved. 1-12
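
The scanner (lexical analyzer) of a compiler pairs each lexeme with its token category in exactly this way. Below is a minimal sketch of such a scanner in Java; the class name TinyLexer and the token names are illustrative only and not from the text.

import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    // Token categories used in the exercise above.
    enum Token { IDENTIFIER, EQUAL_SIGN, INT_LITERAL, MULT_OP, PLUS_OP, SEMICOLON }

    // Scan a statement and return each lexeme paired with its token category.
    public static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c)) {                 // identifier lexeme
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i) + " : " + Token.IDENTIFIER);
            } else if (Character.isDigit(c)) {           // integer literal lexeme
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i) + " : " + Token.INT_LITERAL);
            } else {                                     // single-character lexemes
                Token t = switch (c) {
                    case '=' -> Token.EQUAL_SIGN;
                    case '*' -> Token.MULT_OP;
                    case '+' -> Token.PLUS_OP;
                    case ';' -> Token.SEMICOLON;
                    default -> throw new IllegalArgumentException("invalid syntax: " + c);
                };
                out.add(c + " : " + t);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        scan("index = 2 * count + 17;").forEach(System.out::println);
    }
}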


Formal Definition of Languages

• Languages can be defined in two distinct ways: by recognition and by generation.

• Language Recognizers:
– A recognizer reads input strings over the alphabet of
the language and decides whether the input string
belongs to the language.

– Example: The syntax analysis part of a compiler is a


recognizer for the language that the compiler
translates.
• The structure of the syntax analysers (parsers) will be
discussed in Chapter 4

1-13
Formal Definition of Languages

• Language Generators:
– A language generator is a device that can be used to generate the sentences of a language.

– Whether the syntax of a particular statement is correct can be determined by comparing it with the structure of the generator.
– Examples: grammars and BNF

• Recognition and generation are useful for different


things, but are closely related.

Copyright © 2012 Pearson Education. All rights reserved. 1-14


BACKUS-NAUR FORM
AND CONTEXT-FREE
GRAMMARS

Copyright © 2012 Pearson Education. All rights reserved. 1-15


Context-Free Grammar

• Developed by linguist Noam Chomsky in the mid-1950s

• Chomsky defined four classes of generative devices, or grammars.

– Two of these grammar classes are called context-free and regular.
– Regular grammars are used to describe the tokens of programming languages.
– Context-free grammars can be used to describe the syntax of whole programming languages (with minor exceptions).

Copyright © 2012 Pearson Education. All rights reserved. 1-16


Backus-Naur Form

• John Backus introduced a new formal notation for specifying programming language syntax.
– This was later modified slightly by Peter Naur.
– The revised method of syntax description became known as Backus-Naur Form (BNF).
– BNF was used to describe ALGOL 60 and became the most popular method of describing programming language syntax.

• BNF is nearly identical to context-free grammars.

– BNF is a natural notation for describing syntax.
– In the remainder of this chapter, context-free grammars are referred to simply as grammars, and the terms BNF and grammar are used interchangeably.
1-17
Fundamentals

• Metalanguage: a language that is used to describe another language


– BNF is a metalanguage for programming languages.

• BNF uses abstractions for syntactic structures, which act like


variables.
– Example: Java assignment can be represented by the abstraction <assign>
<assign> → <var> = <expression>
The left-hand side (LHS) is the abstraction being defined.
The right-hand side (RHS) of the arrow consists of tokens, lexemes, and references to other abstractions.

Copyright © 2012 Pearson Education. All rights reserved. 1-18


Fundamentals
• Abstractions in a BNF description are called nonterminal symbols (nonterminals) and are often enclosed in angle brackets.
• The lexemes and tokens of the rules are called terminal symbols (terminals).
• A rule, or production, describes the structure of a statement.
– A rule has an LHS and an RHS, each consisting of terminal and nonterminal symbols.

<assign> → <var> = <expression>

– This example rule says that an instance of the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>.
1-19
Fundamentals
• An example of <assign> rule is:
total = subtotal1 + subtotal2
<var> = <expression>

• A grammar is a finite, nonempty set of rules.

• A nonterminal can have two or more distinct definitions. Multiple definitions can be written as a single rule, with the different definitions separated by | (meaning logical OR).
– This means that an abstraction (nonterminal) can have more than one RHS
<stmt> → <single_stmt>
| begin <stmt_list> end
1-20
Fundamentals

• Example of multiple definitions written as a single rule.


• Java if statements can be described with the rules:
(1) <if_stmt> → if ( <logic_expr> ) <stmt>
(2) <if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>

or with the rule (combining (1) and (2)):


<if_stmt> → if ( <logic_expr> ) <stmt>
| if ( <logic_expr> ) <stmt> else <stmt>

• In these rules, <stmt> is either a single or a compound statement.

Copyright © 2012 Pearson Education. All rights reserved. 1-21


Fundamentals

• BNF is simple, yet powerful enough to describe nearly all the syntax
of programming languages.

• In particular, it can:

– describe lists of similar constructs
– describe the order in which different constructs must appear
– describe nested structures to any depth
– imply operator precedence
– imply operator associativity

Copyright © 2012 Pearson Education. All rights reserved. 1-22


DESCRIBING LISTS

Copyright © 2012 Pearson Education. All rights reserved. 1-23


DESCRIBING LISTS

• To describe lists such as 1, 2, . . ., we need an alternative to the ellipsis (. . .), because BNF does not include it.
• The alternative is to use recursion.
• A rule is recursive if its LHS appears in its RHS.
• Example:
<ident_list> → ident
| ident, <ident_list>
• Here <ident_list> is defined as either a single token (an identifier) or an identifier followed by a comma and another instance of <ident_list> (see the sketch below).
• Recursion is used to describe lists in many grammars

1-24
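
To see how such a recursive rule can drive a recognizer, here is a minimal sketch in Java; the class IdentListRecognizer and its helper methods are hypothetical and not from the text.

// A minimal sketch of a recursive recognizer for
//   <ident_list> → ident | ident , <ident_list>
public class IdentListRecognizer {
    // True if the input is a single identifier (base case) or an identifier
    // followed by a comma and another <ident_list>, mirroring the grammar rule.
    static boolean isIdentList(String s) {
        int comma = s.indexOf(',');
        if (comma < 0) return isIdent(s.trim());             // base case: ident
        return isIdent(s.substring(0, comma).trim())         // ident ,
            && isIdentList(s.substring(comma + 1));          // <ident_list>
    }

    static boolean isIdent(String s) {
        return s.matches("[A-Za-z][A-Za-z0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(isIdentList("sum, total, index"));  // true
        System.out.println(isIdentList("sum,, total"));        // false
    }
}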
3.3.1.5 Grammars and Derivations

A grammar is a generative device for defining languages.

Generation begins with a special nonterminal (abstraction) of the grammar called the start symbol.

The sentences of the language are generated through a sequence of applications of the rules, called a derivation.

In a grammar for a complete programming language, the start symbol represents a complete program and is often named <program>.

1-25
Example

Example 3.1
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var>+<var> | <var>-<var> | <var>

Copyright © 2012 Pearson Education. All rights reserved. 1-26


Representing rules

• The two rules are equivalent

<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
OR
<stmt_list> → <stmt>
<stmt_list> → <stmt> ; <stmt_list>

Copyright © 2012 Pearson Education. All rights reserved. 1-27


An Example Derivation
Generating sentences from the grammar is called derivation
Example:
<program> => begin <stmt_list> end
=> begin <stmt>; <stmt_list> end
=> begin <var> = <expression>; <stmt_list> end
=> begin A = <expression>; <stmt_list> end
=> begin A = <var> + <var>; <stmt_list> end
=> begin A = B + <var>; <stmt_list> end
=> begin A = B + C; <stmt_list> end
=> begin A = B + C; <stmt> end
=> begin A = B + C; <var> = <expression> end
=> begin A = B + C; B = <expression> end
=> begin A = B + C; B = <var> end
=> begin A = B + C; B = C end

Copyright © 2012 Pearson Education. All rights reserved. 1-28


Derivation
• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
• The symbol => is read “derives”.

• Every string of symbols in the derivation, including


<program>, is a sentential form.

• A leftmost derivation is one in which the leftmost


nonterminal in each sentential form is the one that is
expanded. The derivation continues until the sentential
form contains no nonterminals.
• A derivation may be neither leftmost nor rightmost.

• The derivation continues until the sentential form (string) contains no nonterminals. That sentential form, consisting only of terminals, or lexemes, is the generated sentence.
1-29
Derivation

• By choosing alternative RHSs of rules with which to replace nonterminals in


the derivation, different sentences in the language can be generated.

• By exhaustively choosing all combinations of choices, the entire language


can be generated.

• This language, like most others, is infinite, so one cannot generate all
the sentences in the language in finite time.

Copyright © 2012 Pearson Education. All rights reserved. 1-30
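
As an illustration of a language generator, the following Java sketch applies randomly chosen RHSs of the Example 3.1 rules until the sentential form contains only terminals; the class and method names are hypothetical and not from the text.

import java.util.Random;

// A minimal sketch of a "generator" for the grammar of Example 3.1.
public class SentenceGenerator {
    static final Random RNG = new Random();

    static String program()  { return "begin " + stmtList() + " end"; }
    // <stmt_list> → <stmt> | <stmt> ; <stmt_list>
    static String stmtList() { return RNG.nextBoolean() ? stmt() : stmt() + "; " + stmtList(); }
    // <stmt> → <var> = <expression>
    static String stmt()     { return var() + " = " + expression(); }
    // <var> → A | B | C
    static String var()      { return "ABC".charAt(RNG.nextInt(3)) + ""; }
    // <expression> → <var> + <var> | <var> - <var> | <var>
    static String expression() {
        switch (RNG.nextInt(3)) {
            case 0:  return var() + " + " + var();
            case 1:  return var() + " - " + var();
            default: return var();
        }
    }

    public static void main(String[] args) {
        // Each run derives one sentence of the language, e.g.
        //   begin A = B + C; B = C end
        System.out.println(program());
    }
}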


Examples

Copyright © 2012 Pearson Education. All rights reserved. 1-31


3.3.1.6 Parse Trees

Parse trees naturally describe the hierarchical syntactic structure of the sentences of the languages they define.

These hierarchical structures are called parse trees.

A parse tree for the simple statement A = B * (A + C) is given on the next slide.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-32


Example

You can read the generated sentence from the parse tree by visiting its leaves from left to right (a left-to-right depth-first traversal).

Copyright © 2012 Pearson Education. All rights reserved. 1-33


Continued…

• Every internal node of a parse tree is labeled with a nonterminal


symbol (abstraction); every leaf is labeled with a terminal symbol.

• Every subtree of a parse tree describes one instance of an


abstraction in the sentence

Copyright © 2012 Pearson Education. All rights reserved. 1-34


3.3.1.7 Ambiguity in Grammars

A grammar is ambiguous if and only if it generates a sentential form (a sentence) that has two or more distinct parse trees.

Example 3.3:
Two distinct parse trees for the same sentence,
A = B + C * A

<assign> → <id> = <expr>


<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| (<expr>)
| <id>

Copyright © 2012 Addison-Wesley. All rights reserved. 1-35


Ambiguity in Grammars Example
A = B + C * A

Copyright © 2012 Pearson Education. All rights reserved. 1-36


Continued…

• Unlike the grammar of Example 3.2, which allows the parse tree of an expression to grow only on the right, this grammar allows growth on both the left and the right.

• Syntactic ambiguity of language structures is a problem because compilers


often base the semantics of those structures on their syntactic form

• Specifically, the compiler chooses the code to be generated for a statement


by examining its parse tree

• If a language structure has more than one parse tree, then the meaning of
the structure cannot be determined uniquely

Copyright © 2012 Pearson Education. All rights reserved. 1-37


An Ambiguous Expression Grammar

<expr> → <expr> <op> <expr>


| const
<op> → / | -
[Figure: two distinct parse trees for const - const / const, one with the - operator at the root and one with the / operator at the root]

Copyright © 2012 Addison-Wesley. All rights reserved. 1-38


An Unambiguous Expression Grammar

• If we use the parse tree to indicate the precedence levels of the operators, we can eliminate the ambiguity (operators generated lower in the tree have higher precedence).
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const

The unique parse tree for const - const / const:

<expr>
  <expr>
    <term>
      const
  -
  <term>
    <term>
      const
    /
    const
1-39


3.3.1.8 Operator Precedence

When an expression includes two different operators, for example x + y * z, one obvious semantic issue is the order of evaluation of the two operators (in this expression, is it add and then multiply, or vice versa?).

This semantic question can be answered by assigning different precedence levels to operators.

The fact that an operator in an arithmetic expression is generated lower in the parse tree can be used to indicate that it has higher precedence than an operator produced higher up in the tree.

1-40
Continued…
• Figure 3.2, for example, the multiplication operator is
generated lower in the tree, which could indicate that
it has precedence over the addition operator in the
expression. The second parse tree, however, indicates
just the opposite. It appears, therefore, that the two
parse trees indicate conflicting precedence information

1-41
Continued…
• A grammar needs to be written for the simple
expressions we have been discussing that is both
unambiguous and specifies a consistent precedence of
the + and * operators, regardless of the order in which
the operators appear in an expression

• The correct ordering is specified by using separate


nonterminal symbols to represent the operands of the
operators that have different precedence.

• If <expr> is the root symbol for expressions, + can be


forced to the top of the parse tree by having <expr>
directly generate only + operators, using the new
nonterminal, <term>, as the right operand of +

• Next, we can define <term> to generate * operators,


using <term> as the left operand and a new nonterminal,
<factor>, as its right operand
1-42
Example
Now, * will always be lower in the parse tree,
simply because it is further from the start
symbol than + in every derivation.

The grammar of Example 3.4 is such a grammar.

1-43
3.3.1.8 Operator Precedence

Example 3.4, an unambiguous grammar for expressions:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

Copyright © 2012 Pearson Education. All rights reserved. 1-44


3.3.1.8 Operator Precedence
The unique parse tree for A = B + C * A using an
unambiguous grammar (Figure 3.3)

1-45
3.3.1.8 Operator Precedence
Every derivation with an unambiguous grammar has a
unique parse tree, although that tree can be
represented by different derivations.

1-46
3.3.1.9 Associativity of Operators

Do parse trees for expressions with two or more


adjacent occurrences of operators with equal precedence
have those occurrences in proper hierarchical order?

An example of an assignment using the previous grammar


is: A = B + C + A

1-47
3.3.1.9 Associativity of Operators

The parse tree for this sentence shows the left + operator lower than the right + operator. This is the correct order if the + operator is meant to be left associative, which is typical.

When a grammar rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive. Left recursion specifies left associativity.

In most languages that provide it, the exponentiation


operator is right associative. To indicate right
associativity, right recursion can be used. A grammar rule
is right recursive if the LHS appears at the right end of
the RHS. Rules such as:
<factor> → <exp> ** <factor>
| <exp>
<exp> → (<exp>)
| id 1-48
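
The right recursion in these rules maps naturally onto a recursive call in a parser, which is what makes ** right associative. The following Java sketch is illustrative only (it uses integer literals in place of id) and is not from the text.

// A minimal sketch: right recursion in
//   <factor> → <exp> ** <factor> | <exp>
// becomes a recursive call on the right operand, so 2 ** 3 ** 2 is
// evaluated as 2 ** (3 ** 2) = 512.
public class PowerParser {
    private final String[] toks;
    private int pos = 0;

    PowerParser(String input) { toks = input.trim().split("\\s+"); }

    // <factor> → <exp> ** <factor> | <exp>
    long factor() {
        long base = exp();
        if (pos < toks.length && toks[pos].equals("**")) {
            pos++;
            return (long) Math.pow(base, factor());   // recurse on the right operand
        }
        return base;
    }

    // <exp> → id   (an integer literal stands in for id in this sketch)
    long exp() { return Long.parseLong(toks[pos++]); }

    public static void main(String[] args) {
        System.out.println(new PowerParser("2 ** 3 ** 2").factor());   // 512
    }
}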
3.3.1.9 Associativity of Operators

• Subtraction and division are not associative, whether in mathematics or in


a computer. Therefore, correct associativity may be essential for an
expression that contains either of them

• Unfortunately, left recursion disallows the use of some important syntax


analysis algorithms. When one of these algorithms is to be used, the
grammar must be modified to remove the left recursion. This, in turn,
disallows the grammar from precisely specifying that certain operators are
left associative. Fortunately, left associativity can be enforced by the
compiler, even though the grammar does not dictate it

Copyright © 2012 Pearson Education. All rights reserved. 1-49


Extended BNF

• Optional parts are placed in brackets [ ]


<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> → letter {letter|digit}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-50


BNF and EBNF

• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-51
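
The EBNF form maps directly onto a recursive-descent evaluator: each { } repetition becomes a loop, which also gives +, -, *, and / their usual left associativity. The following Java sketch is illustrative only (it assumes <factor> is an integer literal) and is not from the text.

// A minimal sketch of a recursive-descent evaluator for the EBNF rules above.
public class ExprEval {
    private final String[] toks;
    private int pos = 0;

    ExprEval(String input) { toks = input.trim().split("\\s+"); }

    // <expr> → <term> {(+ | -) <term>}
    int expr() {
        int value = term();
        while (pos < toks.length && (toks[pos].equals("+") || toks[pos].equals("-"))) {
            String op = toks[pos++];
            value = op.equals("+") ? value + term() : value - term();
        }
        return value;
    }

    // <term> → <factor> {(* | /) <factor>}
    int term() {
        int value = factor();
        while (pos < toks.length && (toks[pos].equals("*") || toks[pos].equals("/"))) {
            String op = toks[pos++];
            value = op.equals("*") ? value * factor() : value / factor();
        }
        return value;
    }

    // <factor> → const   (an integer literal, for this sketch)
    int factor() { return Integer.parseInt(toks[pos++]); }

    public static void main(String[] args) {
        // * binds tighter than +, so this prints 14, not 20.
        System.out.println(new ExprEval("2 + 3 * 4").expr());
    }
}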


Recent Variations in EBNF

• Alternative RHSs are put on separate lines


• Use of a colon instead of the arrow (→)
• Use of opt for optional parts
• Use of oneof for choices

Copyright © 2012 Addison-Wesley. All rights reserved. 1-52


Static Semantics

• Nothing to do with meaning


• Context-free grammars (CFGs) cannot describe all of the
syntax of programming languages
• Categories of constructs that are trouble:
- Context-free, but cumbersome (e.g.,
types of operands in expressions)
- Non-context-free (e.g., variables must
be declared before they are used)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-53


Attribute Grammars

• Attribute grammars (AGs) have additions


to CFGs to carry some semantic info on
parse tree nodes

• Primary value of AGs:


– Static semantics specification
– Compiler design (static semantics checking)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-54


Attribute Grammars

Def: An attribute grammar is a context-free grammar G = (S,


N, T, P) with the following additions:
– For each grammar symbol x there is a set A(x) of attribute values
– Each rule has a set of functions that define certain attributes of the
nonterminal in the rule
– Each rule has a (possibly empty) set of predicates to check for
attribute consistency
Attribute Grammars: Definition

• Let X0 → X1 ... Xn be a rule


• Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define
synthesized attributes
• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes
• Initially, there are intrinsic attributes on the leaves

Copyright © 2012 Addison-Wesley. All rights reserved. 1-56


Attribute Grammars: An Example

• Syntax
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
<var> -> A | B | C
• actual_type: synthesized for <var>
and <expr>
• expected_type: inherited for <expr>

Copyright © 2012 Addison-Wesley. All rights reserved. 1-57


Attribute Grammars

An attribute grammar consists of a CFG together with:
– a set of attributes: each attribute maps grammar symbols (T ∪ N) to values
– attribute evaluation rules (attributes are either inherited or synthesized)
– conditions (predicates)
Attribute Grammars (continued)

• How are attribute values computed?


– If all attributes were inherited, the tree could be decorated in top-
down order.
– If all attributes were synthesized, the tree could be decorated in
bottom-up order.
– In many cases, both kinds of attributes are used, and it is some
combination of top-down and bottom-up that must be used.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-60


Attribute Grammars (continued)

<expr>.expected_type ← inherited from parent

<var>[1].actual_type ← lookup(A)
<var>[2].actual_type ← lookup(B)
<var>[1].actual_type =? <var>[2].actual_type

<expr>.actual_type ← <var>[1].actual_type
<expr>.actual_type =? <expr>.expected_type

Copyright © 2012 Addison-Wesley. All rights reserved. 1-61


Attribute Grammar (continued)

• Syntax rule: <expr> → <var>[1] + <var>[2]


Semantic rules:
<expr>.actual_type ← <var>[1].actual_type
Predicate:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type

• Syntax rule: <var> → id


Semantic rule:
<var>.actual_type ← lookup(<var>.string)
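
A minimal Java sketch of how these semantic rules and predicates might be evaluated on a parse-tree node is shown below; the symbol table, type names, and method names are hypothetical and not from the text.

import java.util.Map;

// Checking the semantic rules and predicates for  <expr> → <var>[1] + <var>[2].
public class AttrCheck {
    enum Type { INT, REAL }

    // Intrinsic attributes of the leaves: each variable's declared type.
    static final Map<String, Type> SYMBOL_TABLE =
            Map.of("A", Type.INT, "B", Type.INT, "C", Type.REAL);

    // <var>.actual_type ← lookup(<var>.string)
    static Type lookup(String var) { return SYMBOL_TABLE.get(var); }

    // Synthesize <expr>.actual_type and check both predicates.
    static Type exprActualType(String var1, String var2, Type expectedType) {
        Type t1 = lookup(var1);
        Type t2 = lookup(var2);
        if (t1 != t2)
            throw new IllegalStateException("predicate failed: operand types differ");
        Type actual = t1;                       // <expr>.actual_type ← <var>[1].actual_type
        if (actual != expectedType)             // <expr>.expected_type == <expr>.actual_type
            throw new IllegalStateException("predicate failed: expected " + expectedType);
        return actual;
    }

    public static void main(String[] args) {
        System.out.println(exprActualType("A", "B", Type.INT));   // INT
        // exprActualType("A", "C", Type.INT) would fail the first predicate.
    }
}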
Semantics

• There is no single widely acceptable notation or formalism


for describing semantics
• Several needs for a methodology and notation for
semantics:
– Programmers need to know what statements mean
– Compiler writers must know exactly what language constructs do
– Correctness proofs would be possible
– Compiler generators would be possible
– Designers could detect ambiguities and inconsistencies
Operational Semantics

• Operational semantics describes the meaning of a program by executing its
statements on a machine, either simulated or actual. The
change in the state of the machine (memory, registers, etc.)
defines the meaning of the statement
• To use operational semantics for a high-level language, a
virtual machine is needed
Operational Semantics (continued)

• A better alternative: A complete computer simulation


• The process:
• Build a translator (translates source code to the machine
code of an idealized computer)
• Build a simulator for the idealized computer
• Evaluation of operational semantics:
• Good if used informally (language manuals, etc.)
• Extremely complex if used formally (e.g., VDL, which was used to describe the semantics of PL/I)
Operational Semantics (continued)

• Uses of operational semantics:


- Language manuals and textbooks
- Teaching programming languages

• Two different levels of uses of operational semantics:


- Natural operational semantics
- Structural operational semantics

• Evaluation
- Good if used informally (language
manuals, etc.)
- Extremely complex if used formally (e.g.,VDL)
Operational Semantics

• A hardware pure interpreter would be too expensive


• A software pure interpreter also has problems
– The detailed characteristics of the particular computer would make
actions difficult to understand
– Such a semantic definition would be machine- dependent

Copyright © 2012 Addison-Wesley. All rights reserved. 1-68


Denotational Semantics

• Based on recursive function theory


• The most abstract semantics description method
• Originally developed by Scott and Strachey (1970)
Denotational Semantics - continued

• The process of building a denotational specification for a


language:
- Define a mathematical object for each language
entity
– Define a function that maps instances of the language entities
onto instances of the corresponding mathematical objects
• The meaning of language constructs is defined only by the values of the program's variables
Denotational Semantics: program state

• The state of a program is the values of all its current


variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}

• Let VARMAP be a function that, when given a variable


name and a state, returns the current value of the variable
VARMAP(ij, s) = vj
Operational & Denotational Semantics

Operational Semantics:
– Based on programming languages
– Uses an intermediate language
– Can lead to circularities
– Defined in terms of state changes

Denotational Semantics:
– Based on mathematics and logic
– Uses mathematical objects
– Never leads to circularities
– Defined in terms of the values of all program variables

Copyright © 2012 Pearson Education. All rights reserved. 1-72


Example: Intermediate Language

• This is a human-oriented intermediate language:
ident = var
ident = ident + 1
ident = ident - 1
goto label
if var relop var goto label

relop is a relational operator from the set {=, <>, >, <, >=, <=}
Copyright © 2012 Pearson Education. All rights reserved. 1-73
Example: Circularities

• The statements of a programming language are described in terms of the statements of a lower-level programming language.
• This can lead to situations in which concepts are indirectly defined in terms of themselves.
• When this happens, the description is circular.
• Example: describing a language's if statement in terms of the if statement of the language the interpreter is written in.
Copyright © 2012 Pearson Education. All rights reserved. 1-74


Binary Numbers

As an example of denotational semantics, consider the character string representation of binary numbers. The syntax of such binary numbers can be described by the following grammar rules:
<bin_num> → ‘0’
| ‘1’
| <bin_num> ‘0’
| <bin_num> ‘1’

Copyright © 2012 Pearson Education. All rights reserved. 1-76


Parse Tree

A parse tree for the binary number 110:

<bin_num>
  <bin_num>
    <bin_num>
      '1'
    '1'
  '0'
Copyright © 2012 Addison-Wesley. All rights reserved. 1-77
Mapping

• The syntactic domain of the mapping function for binary numbers is the set of all character string representations of binary numbers.
• The semantic domain is the set of nonnegative decimal numbers (symbolized N).
• The actual meaning (a decimal number) is associated with each rule that has a single terminal symbol as its RHS.

Copyright © 2012 Pearson Education. All rights reserved. 1-78


Mapping Example

• In our binary number example we will:


1. Map the first two rules to decimal numbers.
2. Map the last two rules to functions that represent the complete meaning of the RHS.
• Use the semantic function Mbin to map the syntactic objects
in our grammar rules to the objects in N

Copyright © 2012 Pearson Education. All rights reserved. 1-79


Function Mbin

Mbin(‘0’) = 0

Mbin(‘1’) = 1

Mbin(<bin_num> ‘0’) = 2 * Mbin(<bin_num>)

Mbin(<bin_num> ‘1’) = 2 * Mbin(<bin_num>) + 1

Copyright © 2012 Pearson Education. All rights reserved. 1-80
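
A direct transcription of Mbin into Java, shown below as an illustrative sketch (not from the text), makes the recursion over the string explicit.

// The semantic function Mbin applied to the character-string
// representation of a binary number.
public class Mbin {
    static int mbin(String binNum) {
        if (binNum.equals("0")) return 0;                       // Mbin('0') = 0
        if (binNum.equals("1")) return 1;                       // Mbin('1') = 1
        String prefix = binNum.substring(0, binNum.length() - 1);
        char last = binNum.charAt(binNum.length() - 1);
        return last == '0'
                ? 2 * mbin(prefix)                              // Mbin(<bin_num> '0')
                : 2 * mbin(prefix) + 1;                         // Mbin(<bin_num> '1')
    }

    public static void main(String[] args) {
        System.out.println(mbin("110"));   // prints 6, matching the decorated parse tree
    }
}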


Parse Tree

Now we can calculate the values of the nodes in our previous parse tree representing 110:

<bin_num> = 6
  <bin_num> = 3
    <bin_num> = 1
      '1'
    '1'
  '0'

This is syntax-directed semantics.
Copyright © 2012 Addison-Wesley. All rights reserved. 1-81
Decimal Numbers

<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' |


'6' | '7' | '8' | '9' |
<dec_num> ('0' | '1' | '2' | '3' |
'4' | '5' | '6' | '7' |
'8' | '9')

Mdec('0') = 0, Mdec ('1') = 1, …, Mdec ('9') = 9


Mdec (<dec_num> '0') = 10 * Mdec (<dec_num>)
Mdec (<dec_num> '1’) = 10 * Mdec (<dec_num>) + 1

Mdec (<dec_num> '9') = 10 * Mdec (<dec_num>) + 9

Copyright © 2012 Addison-Wesley. All rights reserved. 1-82


Expressions

• We deal with simple expressions


• We use only + and *, one per expression
• Map expressions onto Z ∪ {error}
• We assume expressions are decimal numbers, variables, or
binary expressions having one arithmetic operator and two
operands, each of which can be an expression

Copyright © 2012 Addison-Wesley. All rights reserved. 1-83


Symbols

• We use (=) to define mathematical functions


• We use => (the implication symbol) to connect the form of an operand with its associated case (or switch) construct.
• We use . (dot notation) to refer to the child nodes of a node.

Copyright © 2012 Pearson Education. All rights reserved. 1-84


Expressions

Me(<expr>, s) =
case <expr> of
<dec_num> => Mdec(<dec_num>, s)
<var> =>
if VARMAP(<var>, s) == undef
then error
else VARMAP(<var>, s)
<binary_expr> =>
if (Me(<binary_expr>.<left_expr>, s) == undef
OR Me(<binary_expr>.<right_expr>, s) == undef)
then error
else
if (<binary_expr>.<operator> == '+') then
Me(<binary_expr>.<left_expr>, s) +
Me(<binary_expr>.<right_expr>, s)
else Me(<binary_expr>.<left_expr>, s) *
Me(<binary_expr>.<right_expr>, s)
...

Copyright © 2012 Addison-Wesley. All rights reserved. 1-85
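
The following Java sketch (not from the text) shows the binary-expression case of Me with the state represented as a map; the Expr classes are hypothetical, and undef/error are both modeled as null.

import java.util.Map;

// A minimal sketch of the mapping function Me for expressions.
public class MeSketch {
    interface Expr {}
    record DecNum(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record BinaryExpr(Expr left, char operator, Expr right) implements Expr {}

    // Me(<expr>, s)
    static Integer me(Expr e, Map<String, Integer> s) {
        if (e instanceof DecNum d) return d.value();             // Mdec case
        if (e instanceof Var v) return s.get(v.name());          // VARMAP; null = undef
        BinaryExpr b = (BinaryExpr) e;                           // <binary_expr> case
        Integer left = me(b.left(), s), right = me(b.right(), s);
        if (left == null || right == null) return null;          // error
        return b.operator() == '+' ? left + right : left * right;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = Map.of("count", 3);
        Expr e = new BinaryExpr(new Var("count"), '+', new DecNum(17));
        System.out.println(me(e, state));   // 20
    }
}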


Assignment Statements

• Maps state sets to state sets ∪ {error}

• The meaning is obtained by evaluating the expression and setting the target variable to its value; the function maps one state to another.

Ma(x := E, s) =
if Me(E, s) == error
then error
else s’ = {<i1,v1’>,<i2,v2’>,...,<in,vn’>},
where for j = 1, 2, ..., n,
if ij == x
then vj’ = Me(E, s)
else vj’ = VARMAP(ij, s)

Copyright © 2012 Addison-Wesley. All rights reserved. 1-86
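
A minimal Java sketch of Ma, not from the text: it builds a new state identical to s except for the target variable; the error case is omitted.

import java.util.HashMap;
import java.util.Map;

// Ma(x := E, s): a new state in which x is bound to the value of E.
public class MaSketch {
    static Map<String, Integer> ma(String x, int valueOfE, Map<String, Integer> s) {
        Map<String, Integer> sPrime = new HashMap<>(s);   // vj' = VARMAP(ij, s) for ij != x
        sPrime.put(x, valueOfE);                          // vj' = Me(E, s) for ij == x
        return sPrime;
    }

    public static void main(String[] args) {
        Map<String, Integer> s = Map.of("a", 1, "b", 2);
        System.out.println(ma("a", 5, s));   // a is now 5, b is unchanged
    }
}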


Logical Pretest Loops

• Maps state sets to state sets ∪ {error}

• Assume there are two other existing mapping functions, Msl and Mb, that map statement lists and Boolean expressions, respectively.
Ml(while B do L, s) =
if Mb(B, s) == undef
then error
else if Mb(B, s) == false
then s
else if Msl(L, s) == error
then error
else Ml(while B do L, Msl(L, s))

Copyright © 2012 Addison-Wesley. All rights reserved. 1-87
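
The following Java sketch (not from the text) mirrors Ml: the iteration of the while loop is expressed as recursion over states, with the state represented as a map from variable names to values. The Predicate and UnaryOperator parameters stand in for Mb and Msl, and errors are not modeled.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Ml(while B do L, s) expressed as recursion over states.
public class LoopMeaning {
    static Map<String, Integer> ml(Predicate<Map<String, Integer>> mb,
                                   UnaryOperator<Map<String, Integer>> msl,
                                   Map<String, Integer> s) {
        if (!mb.test(s)) return s;            // Mb(B, s) == false: the loop changes nothing more
        return ml(mb, msl, msl.apply(s));     // otherwise recurse on Msl(L, s)
    }

    public static void main(String[] args) {
        Map<String, Integer> s0 = new HashMap<>(Map.of("i", 0, "sum", 0));
        // Meaning of: while (i < 3) do { sum = sum + i; i = i + 1 }
        Map<String, Integer> sFinal = ml(
            s -> s.get("i") < 3,
            s -> {
                Map<String, Integer> t = new HashMap<>(s);
                t.put("sum", t.get("sum") + t.get("i"));
                t.put("i", t.get("i") + 1);
                return t;
            },
            s0);
        System.out.println(sFinal);   // final state: sum is 3 and i is 3
    }
}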


Loop Meaning

• The meaning of the loop is the value of the


program variables after the statements in the loop
have been executed the prescribed number of
times, assuming there have been no errors
• In essence, the loop has been converted from
iteration to recursion, where the recursive control
is mathematically defined by other recursive state
mapping functions
• This loop, just like actual program loops, may compute nothing because of nontermination (an infinite loop)
• Recursion, when compared to iteration, is easier
to describe with mathematical rigor
Copyright © 2012 Addison-Wesley. All rights reserved. 1-88
Evaluation of Denotational Semantics

• Can be used to prove the correctness of programs


• Provides a rigorous way to think about programs
• Can be an aid to language design
• Has been used in compiler generation systems
• Because of its complexity, it is of little use to language
users

Copyright © 2012 Addison-Wesley. All rights reserved. 1-89


Axiomatic Semantics

• Based on formal logic (predicate calculus)


• Original purpose: formal program verification
• Axioms or inference rules are defined for each statement
type in the language (to allow transformations of logic
expressions into more formal logic expressions)
• The logic expressions are called assertions

Copyright © 2012 Addison-Wesley. All rights reserved. 1-90


Axiomatic Semantics (continued)
• An assertion before a statement (a
precondition) states the relationships and
constraints among variables that are true at
that point in execution
• An assertion following a statement is a
postcondition
• A weakest precondition is the least
restrictive precondition that will guarantee
the postcondition

Copyright © 2012 Addison-Wesley. All rights reserved. 1-91


Axiomatic Semantics Form

• Pre-, post form: {P} statement {Q}

• An example
– a = b + 1 {a > 1}
– One possible precondition: {b > 10}
– Weakest precondition: {b > 0}
– A weaker precondition cannot be found, since it would no longer guarantee that the postcondition holds

Copyright © 2012 Addison-Wesley. All rights reserved. 1-92


Inference Rule

• A method of inferring the truth of one assertion on the


basis of the value of other assertions.
• It has the general form: S1, S2, ..., Sn / S
• This means that if S1, S2, ..., Sn are all true, then the truth of S is inferred.
• The top part (above the line) is called the antecedent.
• The bottom part is called the consequent.
• An axiom is assumed to be true, thus it has no antecedent.

Copyright © 2012 Pearson Education. All rights reserved. 1-93


Assignment Statements

• The precondition and postcondition of an assignment


statement together define its meaning.
• To define the meaning of an assignment statement, there must be a way to compute its precondition from its postcondition.
• Let x = E be a general assignment statement and Q be its postcondition.
• Then, its weakest precondition P is defined by the axiom P = Qx→E

Copyright © 2012 Pearson Education. All rights reserved. 1-94


Assignment Statements (cont.)

• This means that P is computed as Q with all instances of x


replaced by E.
• For example, consider the assignment statement and postcondition a = b / 2 - 1 {a < 10}
• The weakest precondition is computed by substituting b / 2 - 1 for a in {a < 10}:
• b / 2 - 1 < 10
• b < 22
• Thus, the weakest precondition is {b < 22}

Copyright © 2012 Pearson Education. All rights reserved. 1-95
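
A quick numeric check of this result, shown below as an illustrative sketch (not from the text): every b below 22 satisfies the postcondition after the assignment, while b = 22 already violates it, so {b < 22} cannot be weakened.

// Numeric check of the weakest precondition {b < 22} for
// the statement a = b / 2 - 1 with postcondition {a < 10}.
public class WpCheck {
    public static void main(String[] args) {
        for (int b = -100; b < 22; b++) {
            double a = b / 2.0 - 1;            // execute the assignment
            if (a >= 10) throw new AssertionError("postcondition violated for b = " + b);
        }
        double a = 22 / 2.0 - 1;
        System.out.println("b = 22 gives a = " + a + ", so {b < 22} cannot be weakened");
    }
}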


Program Proof Process

• The postcondition for the entire program is


the desired result
– Work back through the program to the first
statement. If the precondition on the first
statement is the same as the program
specification, the program is correct.

Copyright © 2012 Addison-Wesley. All rights reserved. 1-96


Axiomatic Semantics: Assignment

• An axiom for assignment statements


(x = E): {Qx->E} x = E {Q}

• The Rule of Consequence:

{P} S {Q}, P' => P, Q => Q'
{P'} S {Q'}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-97


The Rule of Consequence

• {P} S {Q}, P’=> P, Q => Q’ / {P’} S {Q’}


• Where S is a program statement.
• This means that if {P} S {Q} is true,
• the assertion P' implies P,
• and the assertion Q implies Q',
• then it can be inferred that {P'} S {Q'}.
• In other words, a postcondition can always be weakened
and a precondition can always be strengthened.
• This is quite useful in program proofs.
Copyright © 2012 Pearson Education. All rights reserved. 1-98
Axiomatic Semantics: Sequences

• The weakest precondition for a sequence of


statements cannot be described by an axiom.
• An inference rule for sequences of the form
S1; S2
{P1} S1 {P2}
{P2} S2 {P3}
{P1} S1 {P2}, {P2} S2 {P3}
{P1} S1; S2 {P3}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-99


Axiomatic Semantics: Selection

• An inference rules for selection


- if B then S1 else S2

{B and P} S1 {Q}, {(not B) and P} S2 {Q}


{P} if B then S1 else S2 {Q}

Copyright © 2012 Addison-Wesley. All rights reserved. 1-100


Axiomatic Semantics: Loops
• An inference rule for logical pretest loops

{P} while B do S end {Q}

{I and B} S {I}
{I} while B do S {I and (not B)}
where I is the loop invariant (the inductive
hypothesis)
The weakest precondition for the while loop must
guarantee the truth of the invariant
The truth of the invariant must not be changed by the
evaluation of the loop-controlling Boolean expression or
by the loop body
Copyright © 2012 Addison-Wesley. All rights reserved. 1-101
Axiomatic Semantics: Axioms
• Characteristics of the loop invariant: I must
meet the following conditions:
– P => I -- the loop invariant must be true initially
– {I} B {I} -- evaluation of the Boolean must not change the validity of I
– {I and B} S {I} -- I is not changed by executing the body of the loop
– (I and (not B)) => Q -- if I is true and B is false, Q is implied

– The loop terminates -- can be difficult to prove

Copyright © 2012 Addison-Wesley. All rights reserved. 1-102


Loop Invariant

• The loop invariant I is a weakened version of the loop


postcondition, and it is also a precondition.
• I must be weak enough to be satisfied prior to the
beginning of the loop, but when combined with the loop
exit condition, it must be strong enough to force the truth
of the postcondition

Copyright © 2012 Addison-Wesley. All rights reserved. 1-103


Evaluation of Axiomatic Semantics

• Developing axioms or inference rules for all of the


statements in a language is difficult
• It is a good tool for correctness proofs, and an excellent
framework for reasoning about programs, but it is not as
useful for language users and compiler writers
• Its usefulness in describing the meaning of a programming
language is limited for language users or compiler writers

Copyright © 2012 Addison-Wesley. All rights reserved. 1-104


Denotation Semantics vs Operational
Semantics
• In operational semantics, the state changes
are defined by coded algorithms
• In denotational semantics, the state
changes are defined by rigorous
mathematical functions

Copyright © 2012 Addison-Wesley. All rights reserved. 1-105


Summary

• BNF and context-free grammars are equivalent metalanguages
– Well-suited for describing the syntax of programming languages
• An attribute grammar is a descriptive formalism that can
describe both the syntax and the semantics of a language
• Three primary methods of semantics description
– Operational, axiomatic, denotational

Copyright © 2012 Addison-Wesley. All rights reserved. 1-106
