
PPL U2 - The abstract data types in object-oriented languages, are usually called classes.
Principles of Programming Language (Anna University)


2.1 NAMES

Introduction

Within programming, a variety of items are given descriptive names to make the code
more meaningful to us as humans. These names are called “identifier names”. Constants,
variables, type definitions, functions, etc., when declared or defined, are identified by a name.
These names follow a set of rules that are imposed by:

• the language’s technical limitations
• good programming practices
• common industry standards for the language

Imperative programming languages are abstractions of the underlying von Neumann computer
architecture.

• Memory: stores both instructions and data
• Processor: provides operations for modifying the contents of the memory

The abstractions for memory cells are called variables. Sometimes the abstraction is very close
to the characteristics of the cells.

e.g. Integer – represented directly in one or more bytes of memory

In other cases, the abstraction is far from the organization of memory.

e.g. Three-dimensional array – requires a software mapping function to support the
abstraction

A variable is characterized by a collection of properties (attributes):

• Name
• Address
• Value
• Type
• Scope
• Lifetime

Variables, subprograms, labels, user-defined types, and formal parameters have names.
CCS358-PRINCIPLES OF PROGRAMMING LANGUAGES


Design Issues:

• Maximum length of a name
• Case sensitive or case insensitive
• Special words (reserved words, keywords or predefined names)

Name forms:

• A name is a string of characters
• Each language sets its own limit on the length of a name:
  – Earliest languages: single character
  – Fortran 77: up to 6 characters
  – Fortran 95: up to 31 characters
  – C89: no limit, but only the first 31 characters are significant
  – Java, C#, Ada: no limit
  – C++: no limit, but implementors sometimes impose one

• Names in most PLs have the same form: a letter followed by a string consisting of letters,
digits, and underscore characters

• Today “camel” notation is more popular for C-based languages (e.g. myStack)

• In early versions of Fortran, embedded spaces in names were ignored, e.g. the following two
names are equivalent:

Sum Of Salaries

SumOfSalaries

Case sensitivity:

In many languages (e.g. C-based languages) uppercase and lowercase letters in names are
distinct

e.g. rose, ROSE, Rose

• Problem for readability – names that look very similar denote different entities

• Also bad for writability, since the programmer has to remember the correct case


e.g. the Java method for converting a string into an integer is parseInt, not ParseInt or parseint

• In C the problem can be avoided by exclusive use of lowercase letters for names

• In Java and C#, many of the predefined names include both uppercase and lowercase letters, so
the problem cannot be escaped

• In Fortran 90, lowercase letters are allowed, and they are simply translated to uppercase letters

Special words:

Reserved words (special words that cannot be used as names):

• Can’t define for or while as function or variable names.
• Good design choice

Keywords (special only in certain contexts):

• In FORTRAN, if REAL is at the beginning of a statement and followed by a name, it is
considered a keyword for a declaration.

Examples

REAL APPLE      (declaration)

REAL = 8.7      (assignment)

or

INTEGER REAL

REAL INTEGER

This is allowed but not readable.

• Special words and names are distinguished by context

A keyword is a word of a programming language that is special only in certain contexts. Fortran
is the only remaining widely used language whose special words are keywords. In Fortran, the
word Integer, when found at the beginning of a statement and followed by a name, is considered
a keyword that indicates the statement is a declarative statement. However, if the word Integer is
followed by the assignment operator, it is considered a variable name. These two uses are
illustrated in the following:

Integer Apple

Integer = 4

Predefined names (have predefined meanings, but can be redefined by the user):

For example, built-in data type names in Pascal, such as INTEGER, normal input/output
subprogram names, such as readln, writeln, are predefined.

2.2 VARIABLES

In programming, a variable is a value that can change, depending on conditions or on
information passed to the program. Typically, a program consists of instructions that tell the
computer what to do and data that the program uses when it is running. The data consists of
constants or fixed values that never change and variable values (which are usually initialized to
"0" or some default value because the actual values will be supplied by a program's user).
Usually, both constants and variables are defined as certain data types. Each data type prescribes
and limits the form of the data. Examples of data types include an integer expressed as a decimal
number, or a string of text characters, usually limited in length.

In object-oriented programming, each object contains the data variables of the class it is
an instance of. The object's methods are designed to handle the actual values that are supplied to
the object when the object is being used.

A variable is an abstraction of a computer memory cell or collection of cells.

• It is not just a name for a memory location
• It has six attributes: name, address, value, type, lifetime, scope

1. Name

• Most variables are named (the names are often referred to as identifiers).

• Nameless variables do exist, however (e.g. variables that can be reached only through a pointer).

2. Address


• The memory location associated with the variable

• The same name can refer to different locations

1. in different parts of a program.

– A program can have two subprograms sub1 and sub2, each of which defines a local variable
with the same name, e.g. sum

2. at different times.

– For a variable declared in a recursive procedure, the name refers to a different location in
each step of the recursion.

Aliases

• Multiple identifiers reference the same address: more than one variable name is used to
access the same memory location
• Such identifier names are called aliases.
• Aliases are not good for readability because the value of a variable can be changed by an
assignment to another of its names.
• Aliases can be created explicitly
  – by the EQUIVALENCE statement in FORTRAN
  – by union types in C and C++
  – by variant records in Pascal
  – by subprogram parameters
  – by pointer variables

3. Type

• Determines the range of values the variable can take, and the set of operators that are defined
for values of this type.

• For example int type in Java specifies a range of -2147483648 to 2147483647

4. Value


• Contents of the memory cell or cells associated with the variable
(an abstract memory cell rather than a byte-sized physical cell)
• l_value ← r_value (assignment operation)
• l_value of a variable: the address of the variable
• r_value of a variable: the value of the variable

2.3 BINDING

A source file has many names whose properties need to be determined. The meaning of
these properties might be determined at different phases of the life cycle of a program. Examples
of such properties include the set of values associated with a type; the type of a variable; the
memory location of the compiled function; the value stored in a variable, and so forth. Binding
is the act of associating properties with names. Binding time is the moment in the
program's life cycle when this association occurs.

Many properties of a programming language are defined during its creation. For instance,
the meaning of keywords such as while or for in C, or the size of the integer data type in Java,
are properties defined at language design time. Another important binding phase is language
implementation time. The size of integers in C, contrary to Java, was not defined when C was
designed. This information is determined by the implementation of the compiler. Therefore, we
say that the size of integers in C is determined at language implementation time.

Many properties of a program are determined at compilation time. Among these
properties, the most important are the types of the variables in statically typed languages.
Whenever we annotate a variable as an integer in C or Java, or whenever the compiler infers that
a variable in Haskell or SML has the integer data type, this information is henceforward used to
generate the code related to that variable. The location of statically allocated variables, the layout
of the activation records of functions, and the control flow graph of statically compiled programs
are other properties defined at compilation time.

If a program uses external libraries, then the addresses of the external functions will be
known only at link time. It is at this moment that the linker finds where the printf function that
a C program calls is located, for instance. However, the absolute addresses used in
the program will only be known at load time. At that moment we will have an image of the
executable program in memory, and all the dependences will have already been resolved by the
loader.

Finally, there are properties which we will only know once the program executes. The
actual values stored in the variables are perhaps the most important of these properties. In
dynamically typed languages we will only know the types of variables during the execution of
the program. Languages that provide some form of late binding will only let us know the target
of a function call at runtime, for instance.

In general, binding is the association of an attribute with an entity, or of an operation with a
symbol.

The time at which a binding takes place is called binding time (important in the semantics of PLs)

Binding times

1. Language design time

• The symbol * is bound to the multiplication operation
• pi = 3.14159 in most PLs

2. Language implementation time

• A data type, such as int in C, is bound to a range of possible values

3. Compile time

• A Java variable is bound to its type.

4. Link time

• A call to a library subprogram is bound to the subprogram code.

5. Load time

• A variable is bound to a specific memory location.

6. Run time

• A variable is bound to a value through an assignment statement.
• A local variable of a Pascal procedure is bound to a memory location.


Example:

count = count + 5

• The type of count is bound at compile time

• The set of possible values of count is bound at compiler design time

• The meaning of the operator symbol + is bound at compile time, when the types of its operands
have been determined

• The internal representation of the literal 5 is bound at compiler design time

• The value of count is bound at execution time with this statement

Binding of attributes to variables

1. Static: if binding occurs before runtime and remains unchanged throughout the program
execution.

2. Dynamic: if binding occurs during runtime or can change in the course of program execution

Type bindings

Before a variable can be referenced in a program it must be bound to a data type. It is
important to identify how the type is specified and when the binding takes place

Variable declarations

1. Explicit declaration (by statement)

A statement in a program that lists variable names and specifies that they are of a
particular type

2. Implicit declaration (by first appearance)

A means of associating variables with types through default conventions, rather than
declaration statements. The first appearance of a variable name in a program constitutes its
implicit declaration


• Both kinds of declaration create static bindings to types. Most current PLs require explicit
declarations of all variables; exceptions include Perl, JavaScript, and ML.
• Early languages (Fortran, BASIC) have implicit declarations
• e.g. In Fortran, an identifier that is not explicitly declared is implicitly declared as integer
if its name starts with I, J, K, L, M, or N, and as real otherwise
• Implicit declarations are not good for reliability and writability because misspelled
identifier names cannot be detected by the compiler
• e.g. In Fortran, variables that are accidentally left undeclared are given default types, which
leads to errors that are difficult to diagnose
• Some problems of implicit declarations can be avoided by requiring names for specific
types to begin with a particular special character
• e.g. In Perl

$apple: scalar

@apple: array

%apple: hash

Some languages instead infer the type of a variable from its initial value. For example, in C#,
consider the following declarations:

var sum = 0;

var total = 0.0;

var name = "Fred";

The types of sum, total, and name are int, double, and string, respectively

Dynamic type binding

The type of a variable is not specified by a declaration statement, nor can it be determined by the
spelling of its name

• The type is bound when the variable is assigned a value by an assignment statement.

• Advantage: Allows programming flexibility.

Example languages: JavaScript and PHP


• e.g. In JavaScript

list = [10.2, 5.1, 0.0];

list is a single-dimensioned array of length 3.

list = 73;

list is now a simple scalar value.

Disadvantages:

1. Less reliable: the compiler cannot check and enforce types.

Example:

Suppose I and X are integer variables, and Y is a floating-point variable.

The correct statement is

I := X

But by a typing error

I := Y

is typed. In a language with dynamic type binding, this error cannot be detected by the
compiler. The type of I is simply changed to float during execution, and the value of I becomes
erroneous.

2. Cost:

• Type checking must be done at run time.
• Every variable must have a descriptor to maintain its current type.
• The correct code for evaluating an expression must be determined during execution.
• Languages that use dynamic type binding are usually implemented as interpreters (LISP
is such a language).

Type Inference

ML is a PL that supports both functional and imperative programming.


In ML, the type of an expression or a variable can be inferred from the types of the constants
and operators in the expression, without requiring the programmer to specify the types of the
variables

Examples

fun circum (r) = 3.14 *r*r; (circum is real)

fun times10 (x) = 10*x; (times10 is integer)

• Note: fun introduces a function declaration

fun square (x) = x*x;

– With no other type information the default is int, so a call such as square (2.75) would cause
an error

It could be rewritten in any of the following ways:

fun square (x: real) = x*x;

fun square (x):real = x*x;

fun square (x) = (x:real)*x;

fun square (x) = x*(x:real);

Storage Bindings and Lifetime

Allocation: the process of taking the memory cell to which a variable is bound from a pool of
available memory

Deallocation: the process of placing a memory cell that has been unbound from a variable back
into the pool of available memory

Lifetime of a variable: the time during which the variable is bound to a specific memory location

According to their lifetimes, variables can be separated into four categories:

• static,
• stack-dynamic,
• explicit heap-dynamic,
• implicit heap-dynamic.

Static Variables

• Static variables are bound to memory cells before execution begins, and remain bound to
the same memory cells until execution terminates.
• Applications: globally accessible variables, and variables of subprograms that must
retain their values between separate executions of the subprogram
• Such variables are history sensitive.

Advantage: Efficiency. Direct addressing (no run-time overhead for allocation and
deallocation).

Disadvantage: Reduced flexibility.

• If a PL has only static variables, it cannot support recursion.

• Examples: All variables in FORTRAN I, II, and IV

• Static variables in C, C++ and Java

Stack-Dynamic Variables

• Storage binding: when the declaration statement is elaborated (at run time).
• Type binding: static.
• Example: A Pascal procedure consists of a declaration section and a code section. The
local variables get their type binding statically at compile time, but their storage binding
takes place when that procedure is called. Storage is deallocated when the procedure
returns.
• Local variables in C functions.
• Advantages: Dynamic storage allocation is needed for recursion. The same memory cells can
be used for different variables (efficiency)
• Disadvantages: Run-time overhead for allocation and deallocation
• In C and C++, local variables are, by default, stack-dynamic, but can be made static
through the static qualifier.

foo () {
  static int x; …
}

Explicit Heap-Dynamic Variables

• Nameless variables
• Storage is allocated and deallocated by explicit run-time instructions
• Can be referenced only through pointer or reference variables
• The type is bound at compile time; the storage is bound at run time, when the variable is
explicitly created
• Advantages: Required for dynamic structures (e.g., linked lists, trees)
• Disadvantages: Difficult to use correctly; costly to reference, allocate, and deallocate.

As an example of explicit heap-dynamic variables, consider the following C++ code segment:

int *intnode; // Create a pointer
intnode = new int; // Create the heap-dynamic variable
...
delete intnode; // Deallocate the heap-dynamic variable
                // to which intnode points

In this example, an explicit heap-dynamic variable of int type is created by the new operator.
This variable can then be referenced through the pointer, intnode. Later, the variable is
deallocated by the delete operator. C++ requires the explicit deallocation operator delete, because
it does not use implicit storage reclamation, such as garbage collection.


• In Java, all data except the primitive scalars are objects. Java objects are explicit
heap-dynamic and are accessed through reference variables. Java has no way of explicitly
destroying a heap-dynamic variable; rather, implicit garbage collection is used.
• C# has both explicit heap-dynamic and stack-dynamic objects, all of which are implicitly
deallocated. In addition, C# supports C++-style pointers. Such pointers are used to
reference heap, stack, and even static variables and objects.

Implicit Heap-Dynamic Variables

• Storage and type bindings are done when the variables are assigned values.

For example, consider the following JavaScript assignment statement:

highs = [74, 84, 86, 90, 71];

Regardless of whether the variable named highs was previously used in the program or what it
was used for, it is now an array of five numeric values.

• Advantages: Highest degree of flexibility

• Disadvantages:

– Run-time overhead for allocation and deallocation

– Loss of error detection by the compiler

• Examples: JavaScript and APL variables.

2.4 TYPE CHECKING

Type checking is the activity of ensuring that the operands of an operator are of
compatible types. A compatible type is one that either is legal for the operator or is allowed
under language rules to be implicitly converted by compiler-generated code (or the interpreter)
to a legal type. This automatic conversion is called a coercion. For example, if an int variable
and a float variable are added in Java, the value of the int variable is coerced to float and a
floating-point add is done.


A type error is the application of an operator to an operand of an inappropriate type. For
example, in the original version of C, if an int value was passed to a function that expected a
float value, a type error would occur.

• If type binding is static, then all type checking can be done statically by the compiler.
• Dynamic type binding requires dynamic type checking at run time, e.g. JavaScript and
PHP
• It is better to detect errors at compile time than at run time because the earlier correction
is usually less costly
• However, static checking reduces flexibility
• If a memory cell stores values of different types (Ada variant records, Fortran
EQUIVALENCE, C and C++ unions) then type checking must be done dynamically at run
time.
• So, even though all variables are statically bound to types in languages such as C++, not
all type errors can be detected by static type checking.

Strong typing

A programming language is strongly typed if each name has a single type and that type is
known at compile time.

That is, all types are statically bound.

A better definition:

A programming language is strongly typed if type errors are always detected (at compile time
or at run time).

Examples:

• FORTRAN 77 is not strongly typed because

– the relationship between actual and formal parameters is not type checked.
– EQUIVALENCE can be declared between names of different types.


• PASCAL is nearly strongly typed

– except for variant records, because they allow omission of the tag field

• Modula-2 is not strongly typed because of variant records.

• Ada is nearly strongly typed

– Variant records are handled better than in PASCAL and Modula-2

• C, ANSI C, C++ are not strongly typed

– they allow functions for which parameters are not type checked.

Coercion weakens the value of strong typing

Example:

In Java the value of an integer operand is coerced to floating point and a floating-point
operation takes place

• Assume that a and b are int variables. The user intended to type a + b but mistakenly typed
a + d, where d is a float variable. The error would not be detected, since a would be coerced to
float.

Type compatibility

The most important result of two variables having compatible types is that either one can have
its value assigned to the other

• Two methods for checking type compatibility:

  – Name type compatibility
  – Structure type compatibility

Name Type Compatibility:

– Two variables have compatible types only if they are in either the same declaration or in
declarations that use the same type name.

• Adv: Easy to implement


• Disadv: highly restrictive

Under a strict interpretation, a variable whose type is a subrange of the integers would not be
compatible with an integer type variable

Example:

type indexType = 1..10; {subrange type}
var count: integer;
    index: indexType;

• The variables count and index are not name type compatible, and cannot be assigned to each
other

• Another problem arises when a structured type is passed among subprograms through
parameters

• Such a type must be defined once globally

• A subprogram cannot state the type of such formal parameters in local terms (e.g. in Pascal)

Structure Type Compatibility:

• Two variables have compatible types if their types have identical structure.

• Disadv: Difficult to implement

• Adv: more flexible

• The variables count and index in the previous example are structure type compatible.

• Under name type compatibility only the two type names must be compared

• Under structure type compatibility the entire structures of the two types must be compared

• For structures that refer to their own type (e.g. linked lists) this comparison is difficult

• It is also difficult to compare two structures, because

– They may have different field names

– There may be arrays with different ranges

– There may be enumeration types

• It also disallows differentiating between types with the same structure

type celsius = float;
     fahrenheit = float;

• They are compatible according to structure type compatibility, so values of the two types can
be mixed, which is probably not what the programmer intended

• Most PLs use a combination of these methods.
• C uses structural equivalence for all types except structures and unions.
• C++ uses name equivalence

Type compatibility (Ada)

• Ada uses name compatibility
• But it also provides two type constructs

– Subtypes

– Derived types

• Derived type: a new type based on some previously defined type with which it is
incompatible. It inherits all the properties of the parent type

type celsius is new float;

type fahrenheit is new float;

• These two types are incompatible, although their structures are identical

• They are also incompatible with any other floating-point type

• Subtype: a possibly range-constrained version of an existing type. A subtype is compatible
with its parent type

subtype small_type is Integer range 0..99;

• Variables of small_type are compatible with Integer variables


For unconstrained array types, structure type compatibility is used

type vector is array (Integer range <>) of Integer;
vector_1: vector(1..10);
vector_2: vector(11..20);

• These two objects are compatible even though they have different names and different
subscript ranges

• Because for objects of unconstrained array types structure compatibility is used

• Both have elements of type Integer and both have ten elements, therefore they are compatible

• For constrained anonymous arrays

A: array(1..10) of integer;
B: array(1..10) of integer;

A and B are incompatible

C, D: array(1..10) of integer;

C and D are incompatible

type list_10 is array(1..10) of integer;
C, D: list_10;

C and D are compatible

Type compatibility in C

• C uses structure type compatibility for all types except structures and unions

• Every struct and union declaration creates a new type which is not compatible with any other
type

• Note that typedef does not introduce any new type but it defines a new name

• C++ uses name equivalence


2.5 SCOPE

Scope of a variable is the range of statements in which the variable is visible. A variable
is visible in a statement if it can be referenced in that statement.

• The scope rules of a language determine how references to names are associated with variables

Static Scope :

The scope of variables can be determined statically

– by looking at the program

– prior to execution

• First defined in ALGOL 60.

• Based on program text

• To connect a name reference to a variable, you (or the compiler) must find the declaration

Search process:

– search declarations,

  • first locally,
  • then in increasingly larger enclosing scopes,
  • until one is found for the given name

In all static-scoped languages (except C), procedures are nested inside the main program.

• Some languages also allow nested subprograms

– Ada, JavaScript, PHP do

– C-based languages do not

• In this case all procedures and the main unit create their own scopes.

Enclosing static scopes (of a specific scope) are called its static ancestors;


• The nearest static ancestor is called a static parent

main is the static parent of p2 and p1 p2 P2 is the static parent of P1

procedure Big is
  x : integer;
  procedure sub1 is
  begin -- of sub1
    .... x ....
  end; -- of sub1
  procedure sub2 is
    x : integer;
  begin -- of sub2
    ....
  end; -- of sub2
begin -- of Big
  ...
end; -- of Big

The reference to variable x in sub1 is to the x declared in procedure Big

The x in Big is hidden from sub2 because there is another x in sub2


In some languages that use static scoping, regardless of whether nested subprograms are
allowed, some variable declarations can be hidden from some other code segments

e.g. In C++

void sub1() {
  int count;
  ...
  while (...) {
    int count;
    ...
  }
  ...
}

• The reference to count in the while loop is to the loop's local count

• The count of sub1 is hidden from the code inside the while loop

Variables can be hidden from a unit by having a "closer" variable with the same name

• C++ and Ada allow access to these "hidden" variables

– In Ada: unit.name

– In C++: class_name::name

Blocks

Some languages allow new static scopes to be defined without a name.

• This gives a section of code its own local variables, whose scope is minimized.

• Such a section of code is called a block

• The variables are typically stack-dynamic, so their storage is allocated when the section
is entered and deallocated when the section is exited

• Blocks were first introduced in ALGOL 60


In Ada,

...
declare TEMP: integer;
begin
  TEMP := FIRST;
  FIRST := SECOND;
  SECOND := TEMP;
end;
...

The statements from declare to end form a block.

C and C++ allow blocks.

int first, second;
...
first = 3; second = 5;
{ int temp;
  temp = first;
  first = second;
  second = temp;
}
...

temp is undefined here.

• C++ allows variable definitions to appear anywhere in functions. The scope is from the
definition statement to the end of the enclosing block
• In C89, all data declarations (except the ones for blocks) must appear at the beginning of the
function
• for statements in C++, Java and C# allow variable definitions in their initialization
expression. The scope is restricted to the for construct


Global Scope

Some languages, including C, C++, PHP, JavaScript, and Python, allow a program structure that
is a sequence of function definitions, in which variable definitions can appear outside the
functions. Definitions outside functions in a file create global variables, which potentially can be
visible to those functions.

C and C++ have both declarations and definitions of global data. Declarations specify types and
other attributes but do not cause allocation of storage. Definitions specify attributes and cause
storage allocation. A global variable that is defined after a function can be made visible in the
function by declaring it to be external, as in the following:

extern int sum;

Consider the following example: in PHP

$day = "Monday";
$month = "January";
function calendar() {
$day = "Tuesday";
global $month;
print "local day is $day <br />";
$gday = $GLOBALS['day'];
print "global day is $gday <br />";
print "global month is $month <br />";
}
calendar();

Interpretation of this code produces the following:

local day is Tuesday
global day is Monday
global month is January

The global variables of JavaScript are very similar to those of PHP, except that there is no way to
access a global variable in a function that has declared a local variable with the same name.

Dynamic scope

APL, SNOBOL4, and early dialects of LISP use dynamic scoping.

• Common Lisp and Perl also allow dynamic scoping, although their default scoping mechanism is static

• In dynamic scoping

– scope is based on the calling sequence of subprograms

– not on their spatial (textual) relationships

– scope is determined at run time

When the search of the local declarations fails, the declarations of the dynamic parent are searched,
then those of its dynamic parent, and so on

• The dynamic parent is the calling procedure

procedure Big is
    x : integer;
    procedure sub1 is
    begin – of sub1
        .... x ....
    end – of sub1
    procedure sub2 is
        x : integer;
    begin – of sub2
        ....
    end – of sub2
begin – of big
    ...

end – of big

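Suppose Big calls sub2, and sub2 calls sub1.

The dynamic parent of sub1 is sub2, and the dynamic parent of sub2 is Big. Under dynamic scoping, the reference to x in sub1 therefore refers to the x declared in sub2; under static scoping, it refers to the x declared in Big.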
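
The difference between the two lookup rules can be sketched in Python. This is only a model of the Big/sub1/sub2 example, not how any real language is implemented: the helper names lookup_dynamic and lookup_static, the dictionaries, and the values 1 and 2 given to the two x variables are all assumptions made for illustration.

```python
# Hypothetical model: each program unit maps to the variables it declares.
declarations = {"Big": {"x": 1}, "sub2": {"x": 2}, "sub1": {}}

# Static (textual) nesting: sub1 and sub2 are both declared inside Big.
static_parent = {"sub1": "Big", "sub2": "Big", "Big": None}

# Calling sequence (oldest activation first): Big calls sub2, sub2 calls sub1.
call_stack = ["Big", "sub2", "sub1"]


def lookup_dynamic(name, stack):
    """Search the newest activation first, then each dynamic parent in turn."""
    for unit in reversed(stack):
        if name in declarations[unit]:
            return unit, declarations[unit][name]
    raise NameError(name)


def lookup_static(name, unit):
    """Search the unit itself, then each textually enclosing unit."""
    while unit is not None:
        if name in declarations[unit]:
            return unit, declarations[unit][name]
        unit = static_parent[unit]
    raise NameError(name)


print(lookup_dynamic("x", call_stack))  # ('sub2', 2)
print(lookup_static("x", "sub1"))       # ('Big', 1)
```

The same reference to x in sub1 resolves to a different variable depending on which chain (callers or enclosing units) is followed.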

Scope and Lifetime:

Sometimes the scope and lifetime of a variable appear to be related. For example, consider a
variable that is declared in a Java method that contains no method calls. The scope of such a
variable is from its declaration to the end of the method. The lifetime of that variable is the
period of time beginning when the method is entered and ending when execution of the method
terminates. Even here the two are not the same, because scope is a textual (spatial) concept whereas
lifetime is a temporal one. Scope and lifetime are also unrelated when subprogram calls are involved.

Consider the following C++ functions:

void printheader() {

...

} /* end of printheader */

void compute() {

int sum;

...

printheader();

} /* end of compute */

The scope of the variable sum is completely contained within the compute function. It does not
extend to the body of the function printheader, although printheader executes in the midst of the
execution of compute. However, the lifetime of sum extends over the time during which
printheader executes.

Whatever storage location sum is bound to before the call to printheader, that binding will
continue during and after the execution of printheader.

Referencing environments

The referencing environment of a statement is the collection of all names that are visible
in the statement

• In a static-scoped language, it is the local variables plus all of the visible variables in all of the
enclosing scopes

• A subprogram is active if its execution has begun but has not yet terminated

• In a dynamic-scoped language, the referencing environment is the local variables plus all
visible variables in all active subprograms

Consider the following example program. Assume that the only function calls are the following:
main calls sub2, which calls sub1.

void sub1() {
int a, b;
... /* point 1 */
} /* end of sub1 */
void sub2() {
int b, c;
... /* point 2 */
sub1();
} /* end of sub2 */
void main() {
int c, d;
... /* point 3 */
sub2();
} /* end of main */

Assuming dynamic scoping, the referencing environments of the indicated program points are as follows:

Point   Referencing Environment

1       a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)

2       b and c of sub2, d of main (c of main is hidden)

3       c and d of main
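
The table can be checked with a small Python sketch that models a dynamic-scoped referencing environment. The function name referencing_environment and the list-of-pairs representation are assumptions for illustration only; the sketch simply walks the active subprograms from newest to oldest and lets the newest declaration of each name hide older ones.

```python
def referencing_environment(active):
    """active: (subprogram, local names) pairs, oldest activation first.

    Returns a dict mapping each visible name to the subprogram that declares it.
    """
    visible = {}
    for unit, names in reversed(active):   # newest activation first
        for name in names:
            # An older declaration of the same name stays hidden.
            visible.setdefault(name, unit)
    return visible


main = ("main", ["c", "d"])
sub2 = ("sub2", ["b", "c"])
sub1 = ("sub1", ["a", "b"])

point1 = referencing_environment([main, sub2, sub1])
point2 = referencing_environment([main, sub2])
point3 = referencing_environment([main])

print(point1)  # {'a': 'sub1', 'b': 'sub1', 'c': 'sub2', 'd': 'main'}
print(point2)  # {'b': 'sub2', 'c': 'sub2', 'd': 'main'}
print(point3)  # {'c': 'main', 'd': 'main'}
```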

2.6 PRIMITIVE DATATYPES

A data type defines a collection of data objects and a set of predefined operations on those
objects

• A descriptor is the collection of the attributes of a variable

• An object represents an instance of a user defined (abstract data) type

• One design issue for all data types: What operations are defined and how are they
specified?

Primitive Data Types

• Almost all programming languages provide a set of primitive data types

• Primitive data types: Those not defined in terms of other data types

• Some primitive data types are merely reflections of the hardware

• Others require only a little non-hardware support for their implementation

The Integer Data Type

Almost always an exact reflection of the hardware so the mapping is trivial

• There may be as many as eight different integer types in a language

• Java’s signed integer sizes: byte, short, int, long

The Floating Point Data Type

Model real numbers, but only as approximations

Languages for scientific use support at least two floating-point types (e.g., float and double;
sometimes more)

• Usually exactly like the hardware, but not always

• IEEE Floating-Point Standard 754
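
The "only approximations" point can be seen directly in Python, whose float is an IEEE 754 double on essentially all current platforms (an assumption of this sketch). The variable names are illustrative. The sketch also splits the 64-bit pattern of 0.1 into its sign, exponent, and fraction fields.

```python
import struct

total = 0.1 + 0.2
print(total == 0.3)              # False: neither side is stored exactly
print(abs(total - 0.3) < 1e-9)   # True: compare with a tolerance instead

# Reinterpret the 8 bytes of the double 0.1 as an unsigned 64-bit integer.
bits = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
sign = bits >> 63                     # 1 bit
exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
fraction = bits & ((1 << 52) - 1)     # 52 bits

print(hex(bits))                 # 0x3fb999999999999a
print(sign, exponent - 1023)     # 0 -4
```

The repeating 9s in the fraction show that 0.1 has no finite binary expansion, so the stored value is the nearest representable number rather than 0.1 itself.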

Complex Data Type

• Some languages support a complex type, e.g., C99, Fortran, and Python

• Each value consists of two floats, the real part and the imaginary part

The Decimal Data Type

Most larger computers that are designed to support business systems applications have
hardware support for decimal data types. Decimal data types store a fixed number of decimal
digits, with the decimal point at a fixed position in the value. These are the primary data types for
business data processing and are therefore essential to COBOL. C# and F# also have decimal
data types.

Decimal types are stored very much like character strings, using binary codes for the decimal
digits. These representations are called binary coded decimal (BCD).
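
As a rough illustration, Python's standard decimal module gives exact decimal arithmetic, and a few lines are enough to sketch packed BCD, where each decimal digit takes 4 bits. Note the assumptions: Python's Decimal is not stored as BCD internally, and to_packed_bcd is a hypothetical helper written for this example only.

```python
from decimal import Decimal

# Binary floating point cannot represent these amounts exactly...
print(0.10 + 0.20 == 0.30)                                    # False
# ...but a decimal type can.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))   # True


def to_packed_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte (4 bits per digit)."""
    if len(digits) % 2:
        digits = "0" + digits          # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_packed_bcd("1234").hex())  # 1234
print(to_packed_bcd("905").hex())   # 0905
```

Because each hexadecimal digit of the packed form is one decimal digit, the hex dump reads the same as the original number.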

The Boolean DataType

• Simplest of all
• Range of values: two elements, one for true and one for false
• Could be implemented as bits, but often as bytes
• In languages without a Boolean type, such as C89, numeric expressions are used as conditionals: all
operands with nonzero values are considered true, and zero is considered false
• Boolean types are often used to represent switches or flags in programs
• Advantage: readability

The Character Data Type

• Character data are stored as numeric codings

• Most commonly used coding: ASCII

• An alternative, 16-bit coding: Unicode (UCS-2)

– Includes characters from most natural languages

– Originally used in Java

– C# and JavaScript also support Unicode

• 32-bit Unicode (UCS-4)

– Supported by Fortran, starting with 2003

Character String Types

• A character string type is one in which the values consist of sequences of characters.
Character string constants are used to label output, and the input and output of all kinds of
data are often done in terms of strings. Character strings are also an essential
type for all programs that do character manipulation.

Design issues:

• Is it a primitive type or just a special kind of array?
• Should the length of strings be static or dynamic?

Character String Types Operations

Typical operations:

– Assignment and copying

– Comparison (=, >, etc.)

– Catenation

– Substring reference: a reference to a substring of a given string. Substring
references are discussed in the more general context of arrays, where they are
called slices.

– Pattern matching
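
The typical operations listed above can be sketched in Python, where strings are a built-in type. The variable names and sample strings are illustrative only; pattern matching uses the standard re module.

```python
import re

s = "apples"
t = s                          # assignment: t refers to the same immutable string
same = (s == t)                # comparison for equality
before = (s < "banana")        # comparison by order of character codes
u = s + " and pears"           # catenation
sub = u[0:6]                   # substring reference (a slice)
match = re.search(r"p+", u)    # pattern matching: first run of one or more p's

print(same, before)            # True True
print(u)                       # apples and pears
print(sub)                     # apples
print(match.group())           # pp
```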

If strings are not defined as a primitive type, string data is usually stored in arrays of
single characters and referenced as such in the language. This is the approach taken by C and
C++. C and C++ use char arrays to store character strings. These languages provide a collection
of string operations through standard libraries. Many uses of strings and many of the library
functions use the convention that character strings are terminated with a special character, null,
which is represented with zero.

The character string literals that are built by the compiler also have the null character. For
example, consider the following declaration:

char str[] = "apples";

In this example, str is an array of char elements, specifically apples0, where 0 is the null
character.

Some of the most commonly used library functions for character strings in C and C++ are strcpy,
which moves strings; strcat, which concatenates one given string onto another; strcmp, which
lexicographically compares (by the order of their character codes) two given strings; and strlen,
which returns the number of characters, not counting the null, in the given string. The parameters
and return values for most of the string manipulation functions are char pointers that point to
arrays of char. Parameters can also be string literals. The string manipulation functions of the C
standard library, which are also available in C++, are inherently unsafe and have led to numerous
programming errors.

1. C and C++

– Not primitive

– Use char arrays and a library of functions that provide operations

2. SNOBOL4 (a string manipulation language)

– Primitive

– Many operations, including elaborate pattern matching

3. Fortran and Python

Python includes strings as a primitive type and has operations for substring reference,
catenation, indexing to access individual characters, as well as methods for searching and
replacement. There is also an operation for character membership in a string. So, even though
Python’s strings are primitive types, for character and substring references, they act very much
like arrays of characters. However, Python strings are immutable, similar to the String class
objects of Java.

4. Java

In Java, strings are supported by the String class, whose values are constant strings, and the
StringBuffer class, whose values are changeable and are more like arrays of single characters.
These values are specified with methods of the StringBuffer class.

5. Perl, JavaScript, Ruby, and PHP

Perl, JavaScript, Ruby, and PHP include built-in pattern-matching operations. In these
languages, the pattern-matching expressions are somewhat loosely based on mathematical
regular expressions. In fact, they are often called regular expressions. They evolved from the
early UNIX line editor, ed, to become part of the UNIX shell languages.

String Length Options

There are several design choices regarding the length of string values. First, the length
can be static and set when the string is created. Such a string is called a static length string. This
is the choice for the strings of Python, the immutable objects of Java’s String class, as well as
similar classes in the C++ standard class library, Ruby’s built-in String class, and the .NET class
library available to C# and F#.

The second option is to allow strings to have varying length up to a declared and fixed
maximum set by the variable’s definition, as exemplified by the strings in C and the C-style
strings of C++. These are called limited dynamic length strings. Such string variables can store
any number of characters between zero and the maximum.

• Limited dynamic length: C and C++

– In these languages, a special character is used to indicate the end of a string’s
characters, rather than maintaining the length

The third option is to allow strings to have varying length with no maximum, as in
JavaScript, Perl, and the standard C++ library. These are called dynamic length strings.

• Dynamic length (no maximum): SNOBOL4, Perl, JavaScript

• Ada supports all three string length options

Character String Implementation

Character string types could be supported directly in hardware; but in most cases,
software is used to implement string storage, retrieval, and manipulation. When character string
types are represented as character arrays, the language often supplies few operations.

• Static length: compile-time descriptor

• Limited dynamic length: may need a run-time descriptor for length (but not in C and C++, where the
end of a string is marked with the null character)

• Dynamic length: needs a run-time descriptor; allocation/deallocation is the biggest
implementation problem

[Figure: compile-time descriptor for static strings and run-time descriptor for limited dynamic strings]

USER-DEFINED ORDINAL TYPES

• An ordinal type is one in which the range of possible values can be easily associated with the
set of positive integers

• Examples of primitive ordinal types in Java

– integer

– char

– boolean

Enumeration Types

An enumeration type is one in which all of the possible values, which are named
constants, are provided, or enumerated, in the definition. Enumeration types provide a way of
defining and grouping collections of named constants, which are called enumeration constants.

• C# example

enum days {mon, tue, wed, thu, fri, sat, sun};

• Design issues

– Is an enumeration constant allowed to appear in more than one type definition, and if so, how
is the type of an occurrence of that constant checked?

– Are enumeration values coerced to integer?

– Is any other type coerced to an enumeration type?

Designs:

In languages that do not have enumeration types, programmers usually simulate them with
integer values.

C and Pascal were the first widely used languages to include an enumeration data type. C++
includes C’s enumeration types. In C++, we could have the following:

enum colors {red, blue, green, yellow, black};

colors myColor = blue, yourColor = red;

The colors type uses the default internal values for the enumeration constants, 0, 1, . . . ,
although the constants could have been assigned any integer literal (or any constant-valued
expression). The enumeration values are coerced to int when they are put in integer context. This
allows their use in any numeric expression. For example, if the current value of myColor is blue,
then the expression

myColor++

would assign green to myColor. (C accepts this increment; standard C++ rejects ++ applied to an
enumeration unless the operator is overloaded, although myColor + 1 is still coerced to the int value 2.)

In ML, enumeration types are defined as new types with datatype declarations. For example, we
could have the following:

datatype weekdays = Monday | Tuesday | Wednesday |Thursday | Friday

The type of the elements of weekdays is integer.

F# has enumeration types that are similar to those of ML, except the reserved word type is used
instead of datatype and the first value is preceded by an OR operator (|).

Evaluation of Enumerated Type

Enumeration types can provide advantages in both readability and reliability. Readability
is enhanced very directly: named values are easily recognized, whereas coded values are not.

• Aid to reliability: the compiler can check

– operations (don’t allow colors to be added)

– No enumeration variable can be assigned a value outside its defined range

– Ada, C#, and Java 5.0 provide better support for enumeration than C++ because
enumeration type variables in these languages are not coerced into integer types
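
Python is not one of the languages discussed above, but its standard enum module behaves in the same non-coercing way, so it can serve as a sketch of the reliability argument. The class name Color and its members are assumptions chosen to mirror the colors example.

```python
from enum import Enum


class Color(Enum):
    RED = 0
    BLUE = 1
    GREEN = 2


my_color = Color.BLUE

# Arithmetic on an enumeration value is rejected rather than coerced to int.
try:
    my_color + 1
    addition_allowed = True
except TypeError:
    addition_allowed = False

# A value outside the defined range cannot become a Color.
try:
    Color(7)
    out_of_range_allowed = True
except ValueError:
    out_of_range_allowed = False

print(addition_allowed, out_of_range_allowed)   # False False
print(my_color == 1)                            # False: no implicit comparison with int
```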

SUBRANGE TYPES

A subrange type is a contiguous subsequence of an ordinal type. For example, 12..14 is a
subrange of integer type. Subrange types were introduced by Pascal and are included in Ada.
There are no design issues that are specific to subrange types.

• Ada’s design

type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;

In these examples, the restriction on the existing types is in the range of possible values. All of
the operations defined for the parent type are also defined for the subtype, except assignment of
values outside the specified range. For example, in

Day1: Days;

Day2: Weekdays;

Day2 := Day1;

the assignment is legal unless the value of Day1 is sat or sun.

Subrange Evaluation

• Aid to readability

– Makes it clear to readers that variables of a subrange type can store only a certain range of values

• Reliability

– Assigning a value to a subrange variable that is outside the specified range is detected as an
error

Implementation of User-Defined Ordinal Types

• Enumeration types are implemented as integers

• Subrange types are implemented like the parent types with code inserted (by the compiler) to
restrict assignments to subrange variables
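
The implementation idea (enumeration values as integers, plus a range test inserted before each assignment to a subrange variable) can be sketched in Python. The Subrange class and its check method are hypothetical stand-ins for the code a compiler would generate; they are not part of any library.

```python
class Subrange:
    """Hypothetical model of a subrange low..high of an integer-coded parent type."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def check(self, value):
        # The kind of test a compiler inserts before each assignment.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


# The enumeration is represented by integers 0..6.
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
WEEKDAYS = Subrange(DAYS.index("mon"), DAYS.index("fri"))

day2 = WEEKDAYS.check(DAYS.index("wed"))         # legal: wed is within mon..fri
try:
    day2 = WEEKDAYS.check(DAYS.index("sat"))     # rejected: sat is not a weekday
except ValueError as err:
    print("range error:", err)

print(DAYS[day2])   # wed
```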

2.7 ARRAY TYPES

An array is a homogeneous aggregate of data elements in which an individual element is
identified by its position in the aggregate, relative to the first element. The individual data
elements of an array are of the same type. References to individual array elements are specified
using subscript expressions. If any of the subscript expressions in a reference include variables,
then the reference will require an additional run-time calculation to determine the address of the
memory location being referenced.

In many languages, such as C, C++, Java, Ada, and C#, all of the elements of an array are
required to be of the same type. In these languages, pointers and references are restricted to point
to or reference a single type. So the objects or data values being pointed to or referenced are also
of a single type. In some other languages, such as JavaScript, Python, and Ruby, variables are
typeless references to objects or data values. In these cases, arrays still consist of elements of a
single type, but the elements can reference objects or data values of different types. Such arrays
are still homogeneous, because the array elements are of the same type.

C# and Java 5.0 provide generic arrays, that is, arrays whose elements are references to
objects, through their class libraries.

Array Design Issues

• What types are legal for subscripts?

• Are subscripting expressions in element references range checked?

• When are subscript ranges bound?

• When does allocation take place?

• Are ragged or rectangular multidimensional arrays allowed, or both?

• What is the maximum number of subscripts?

• Can array objects be initialized?

• Is any kind of slice supported?

Array Indexing

Indexing (or subscripting) is a mapping from indices to elements

array_name (index_value_list) → an element

• Index Syntax

– Fortran and Ada use parentheses

• Ada explicitly uses parentheses to show uniformity between array references and function calls
because both are mappings

– Most other languages use brackets

Arrays Index (Subscript) Types

• FORTRAN, C: integer only

• Ada: integer or enumeration (includes Boolean and char)

For example, in Ada one could have the following:

type Week_Day_Type is (Monday, Tuesday, Wednesday, Thursday, Friday);

type Sales is array (Week_Day_Type) of Float;

• Java: integer types only

Index range checking

• C, C++, Perl, and FORTRAN do not specify range checking

• Java, ML, C# specify range checking

• In Ada, the default is to require range checking, but it can be turned off

Subscript Binding and Array Categories

There are five categories of arrays, based on the binding to subscript ranges, the binding
to storage, and from where the storage is allocated. The category names indicate the design
choices of these three. In the first four of these categories, once the subscript ranges are bound
and the storage is allocated, they remain fixed for the lifetime of the variable

• Static array: subscript ranges are statically bound and storage allocation is static (before
runtime)

– Advantage: efficiency (no dynamic allocation)

• A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the
allocation is done at declaration elaboration time during execution.

– Advantage: space efficiency

• Stack-dynamic Array: subscript ranges are dynamically bound and the storage allocation is
dynamic (done at run-time)

– Advantage: flexibility (the size of an array need not be known until the array is to be
used)

• Fixed heap-dynamic array: similar to fixed stack dynamic: storage binding is dynamic but
fixed after allocation (i.e., binding is done when requested and storage is allocated from heap,
not stack)

• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can
change any number of times

– Advantage: flexibility (arrays can grow or shrink during program execution)

• C and C++ arrays that include the static modifier are static
• C and C++ arrays declared in functions without the static modifier are fixed stack-dynamic
• C and C++ provide fixed heap-dynamic arrays (allocated with malloc and free in C, new and delete in C++)
• C# includes a second array class, ArrayList, that provides heap-dynamic arrays
• Perl, JavaScript, Python, and Ruby support heap-dynamic arrays
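
A Python list shows the heap-dynamic behaviour directly: the number of elements changes during execution, with no size fixed at declaration. The variable name values is illustrative.

```python
values = [1, 2, 3]        # created at run time
values.append(4)          # grows by one element
values.extend([5, 6])     # grows by several elements
del values[0:2]           # shrinks: the first two elements are removed

print(values)             # [3, 4, 5, 6]
print(len(values))        # 4
```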

Ada arrays can be stack dynamic, as in the following:

Get(List_Len);
declare
List : array (1..List_Len) of Integer;
begin
...
end;
In this example, the user inputs the number of desired elements for the array List. The
elements are then dynamically allocated when execution reaches the declare block. When
execution reaches the end of the block, the List array is deallocated.

Array Initialization

• Some languages allow initialization at the time of storage allocation

– C, C++, Java, C# example

int list [] = {4, 5, 7, 83};

– Character strings in C and C++

char name [] = "freddie";

– Arrays of strings in C and C++

char *names [] = {"Bob", "Jake", "Joe"};

– Java initialization of String objects

String[] names = {"Bob", "Jake", "Joe"};

Heterogeneous Arrays

• A heterogeneous array is one in which the elements need not be of the same type

• Supported by Perl, Python, JavaScript, and Ruby

Array Initialization

• C-based languages
– int list [] = {1, 3, 5, 7};
– char *names [] = {"Mike", "Fred", "Mary Lou"};
• Ada
– List: array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
• Python
– List comprehensions
list = [x ** 2 for x in range(12) if x % 3 == 0]
puts [0, 9, 36, 81] in list

Arrays Operations

• APL provides the most powerful array processing operations for vectors and matrixes as well
as unary operators (for example, to reverse column elements)

• Ada allows array assignment but also catenation

• Python provides array assignment, but assignments only change references. Python also supports array
catenation and element membership operations

• Ruby also provides array catenation

• Fortran provides elemental operations because they are between pairs of array elements

– For example, the + operator between two arrays results in an array of the sums of the element pairs
of the two arrays
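
The Python array operations mentioned in this section can be sketched with lists. The last lines imitate Fortran's elemental + by hand, since plain Python lists have no elementwise addition; all names are illustrative.

```python
a = [1, 2, 3]
b = a                    # assignment copies the reference, not the elements
b[0] = 99
print(a)                 # [99, 2, 3]: a and b name the same list

c = a + [4, 5]           # catenation builds a new list
print(c)                 # [99, 2, 3, 4, 5]
print(3 in c)            # True: element membership

# Elemental addition, written out explicitly for two equal-length lists.
sums = [x + y for x, y in zip([1, 2, 3], [10, 20, 30])]
print(sums)              # [11, 22, 33]
```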

Rectangular and Jagged Arrays

• A rectangular array is a multi-dimensioned array in which all of the rows have the same
number of elements and all columns have the same number of elements

• A jagged matrix has rows with varying number of elements

– Possible when multi-dimensioned arrays actually appear as arrays of arrays

• C, C++, and Java support jagged arrays

• Fortran, Ada, and C# support rectangular arrays (C# also supports jagged arrays)

Slices

• A slice is some substructure of an array; nothing more than a referencing mechanism

• Slices are only useful in languages that have array operations

Slice Examples

• Python

vector = [2, 4, 6, 8, 10, 12, 14, 16]

mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

– vector[3:6] is a three-element array: [8, 10, 12]

– mat[0][0:2] is the first and second element of the first row of mat

• Ruby supports slices with the slice method

– list.slice(2, 2) returns the third and fourth elements of list

In Ruby

array1 = [1, 2, 3, 4, 5]
array2 = ["a", "b", "c", "d", "e"]
array3 = ["cat", "dog", "cow", "rat", "fox"]
array4 = [true, false, nil]
array5 = ["", "nil", "false", "true"]

# call `slice()` method and save returned sub-arrays


a = array1.slice(1) # 2nd element
b = array2.slice(2, 3) # from 3rd element, return 3
c = array3.slice(1, 1) # from 2nd element, return only 1
d = array4.slice(0, 5) # from 1st element, return all elements
e = array5.slice(2) # return 3rd element
In Python:

Lst = [50, 70, 30, 20, 90, 10, 50]
print(Lst[1:5])

Output:

[70, 30, 20, 90]

Implementation of Arrays

• Access function maps subscript expressions to an address in the array

• Access function for single-dimensioned arrays:

address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)

Accessing Multi-Dimensioned Arrays

• Two common ways:

– Row major order (by rows) – used in most languages

– Column major order (by columns) – used in Fortran

[Figure: a compile-time descriptor for a multidimensional array]

Locating an Element in a Multidimensioned Array

• General format

– Location (a[i,j]) = address of a[row_lb,col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size

where n is the number of elements in a row (row major order)
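
Both access functions can be written out as a short Python sketch. The function names, the base address 1000, and the 4-byte element size are assumptions chosen only to give a worked example; n is the number of elements per row.

```python
def address_1d(base, k, lower_bound, element_size):
    """Address of list[k], where base is the address of list[lower_bound]."""
    return base + (k - lower_bound) * element_size


def address_2d_row_major(base, i, j, row_lb, col_lb, n, element_size):
    """Address of a[i, j] in row major order.

    base is the address of a[row_lb, col_lb]; n is the number of elements per row.
    """
    return base + ((i - row_lb) * n + (j - col_lb)) * element_size


# list has lower bound 1 and 4-byte elements: list[5] is 4 elements past the start.
print(address_1d(1000, 5, 1, 4))                    # 1016

# a has lower bounds 1 and 4 columns: a[2, 3] skips one full row plus two elements.
print(address_2d_row_major(1000, 2, 3, 1, 1, 4, 4))  # 1024
```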

Compile-Time Descriptors

2.8 ASSOCIATIVE ARRAYS

• An associative array is an unordered collection of data elements that are indexed by an equal
number of values called keys

– User-defined keys must be stored

Design issues:

- What is the form of references to elements?

- Is the size static or dynamic?

• Built-in type in Perl, Python, Ruby, and Lua

– In Lua, they are supported by tables

Associative Arrays in Perl

• Names begin with %; literals are delimited by parentheses

%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);

• Subscripting is done using braces and keys

$hi_temps{"Wed"} = 83;

• Elements can be removed with delete

delete $hi_temps{"Tue"};
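
For comparison, the same operations on a Python dictionary, Python's built-in associative array. Only the three entries spelled out in the Perl literal are used here; the entries elided in the original are not filled in.

```python
hi_temps = {"Mon": 77, "Tue": 79, "Wed": 65}

hi_temps["Wed"] = 83        # subscripting with a key (brackets rather than braces)
del hi_temps["Tue"]         # remove an element

print(hi_temps)             # {'Mon': 77, 'Wed': 83}
print("Tue" in hi_temps)    # False
```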

2.9 RECORD TYPES

A record is an aggregate of data elements in which the individual elements are identified by
names and accessed through offsets from the beginning of the structure.

• A record is a possibly heterogeneous aggregate of data elements in which the individual
elements are identified by names

There is frequently a need in programs to model a collection of data in which the
individual elements are not of the same type or size. For example, information about a
college student might include name, student number, grade point average, and so forth. A
data type for such a collection might use a character string for the name, an integer for the
student number, a floating point for the grade point average, and so forth. Records are
designed for this kind of need.

• Design issues:

– What is the syntactic form of references to the field?

– Are elliptical references allowed?

Definition of Records in COBOL

The fundamental difference between a record and an array is that record elements, or
fields, are not referenced by indices. Instead, the fields are named with identifiers, and references
to the fields are made using these identifiers.

The COBOL form of a record declaration, which is part of the data division of a COBOL
program, is illustrated in the following example:

01 EMPLOYEE-RECORD.

02 EMPLOYEE-NAME.

05 FIRST PICTURE IS X(20).

05 MIDDLE PICTURE IS X(10).

05 LAST PICTURE IS X(20).

02 HOURLY-RATE PICTURE IS 99V99.

The EMPLOYEE-RECORD record consists of the EMPLOYEE-NAME record and the
HOURLY-RATE field. The numerals 01, 02, and 05 that begin the lines of the record declaration
are level numbers, which indicate by their relative values the hierarchical structure of the record.
Any line that is followed by a line with a higher-level number is itself a record. The PICTURE
clauses show the formats of the field storage locations, with X(20) specifying 20 alphanumeric
characters and 99V99 specifying four decimal digits with the decimal point in the middle.

Definition of Records in Ada

Ada uses a different syntax for records; rather than using the level numbers of COBOL,
record structures are indicated in an orthogonal way by simply nesting record declarations inside
record declarations. In Ada, records cannot be anonymous—they must be named types. Consider
the following Ada declaration:

type Employee_Name_Type is record
    First : String (1..20);
    Middle : String (1..10);
    Last : String (1..20);
end record;
type Employee_Record_Type is record
    Employee_Name: Employee_Name_Type;
    Hourly_Rate: Float;
end record;
Employee_Record: Employee_Record_Type;

In Java and C#, records can be defined as data classes, with nested records defined as
nested classes. Data members of such classes serve as the record fields. As stated previously,
Lua’s associative arrays can be conveniently used as records. For example, consider the
following declaration:

employee.name = "Freddie"

employee.hourlyRate = 13.20

These assignment statements create a table (record) named employee with two elements (fields)
named name and hourlyRate, both initialized.

References to Records

Record field references:

1. COBOL

field_name OF record_name_1 OF ... OF record_name_n

2. Others (dot notation)

record_name_1.record_name_2. ...record_name_n.field_name

A fully qualified reference to a record field is one in which all intermediate record
names, from the largest enclosing record to the specific field, are named in the reference. Both
the COBOL and the Ada example field references above are fully qualified.
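
Dot notation for a nested record can be sketched in Python with data classes, in the spirit of the data-class approach described for Java and C#. The class names mirror the employee example; the field values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmployeeName:
    first: str
    middle: str
    last: str


@dataclass
class EmployeeRecord:
    employee_name: EmployeeName   # a nested record
    hourly_rate: float


# Illustrative values only.
employee_record = EmployeeRecord(EmployeeName("Ada", "K", "Lovelace"), 13.20)

# Fully qualified reference: every enclosing record is named, outermost first.
print(employee_record.employee_name.first)   # Ada
print(employee_record.hourly_rate)           # 13.2
```

Python has no counterpart to COBOL's elliptical references: the intermediate name employee_name cannot be omitted.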

Elliptical References

As an alternative to fully qualified references, COBOL allows elliptical references to record
fields. In an elliptical reference, the field is
named, but any or all of the enclosing record names can be omitted, as long as the resulting
reference is unambiguous in the referencing environment. For example, FIRST, FIRST OF
EMPLOYEE-NAME, and FIRST OF EMPLOYEE-RECORD are elliptical references to the
employee’s first name in the COBOL record declared above. Although elliptical references are a
programmer convenience, they require a compiler to have elaborate data structures and
procedures in order to correctly identify the referenced field. They are also somewhat detrimental
to readability.

Operations on Records

• Assignment is very common if the types are identical

• Ada allows record comparison

• Ada records can be initialized with aggregate literals

• COBOL provides MOVE CORRESPONDING

– Copies each field of the source record to the field with the same name in the target record

Evaluation and Comparison to Arrays

• Records are used when the collection of data values is heterogeneous

• Access to array elements is much slower than access to record fields, because subscripts are
dynamic (field names are static)

• Dynamic subscripts could be used with record field access, but it would disallow type checking
and it would be much slower

Implementation of Record Type

The fields of records are stored in adjacent memory locations. But because the sizes of
the fields are not necessarily the same, the access method used for arrays is not used for records.
Instead, the offset address, relative to the beginning of the record, is associated with each field.
Field accesses are all handled using these offsets.
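
The offset scheme described above can be sketched in Python with the standard struct module. This is only an illustration: the record layout, the field names, and the helper functions build_offsets, set_field, and get_field are hypothetical, and the layout is assumed to be packed (no alignment padding), which real compilers often do not do.

```python
import struct

# Hypothetical record layout: (field name, struct format code).
# The "<" prefix used below means little-endian with no padding.
FIELDS = [("emp_id", "i"), ("hourly_rate", "d"), ("dept_code", "h")]

def build_offsets(fields):
    """Return ({name: (offset, code)}, total_size) for a packed record."""
    offsets, position = {}, 0
    for name, code in fields:
        offsets[name] = (position, code)          # offset from record start
        position += struct.calcsize("<" + code)   # advance by the field size
    return offsets, position

OFFSETS, RECORD_SIZE = build_offsets(FIELDS)

def set_field(record, name, value):
    offset, code = OFFSETS[name]
    struct.pack_into("<" + code, record, offset, value)

def get_field(record, name):
    offset, code = OFFSETS[name]
    return struct.unpack_from("<" + code, record, offset)[0]

record = bytearray(RECORD_SIZE)       # one contiguous block of storage
set_field(record, "emp_id", 42)
set_field(record, "hourly_rate", 13.2)
set_field(record, "dept_code", 7)
```

Because the fields have different sizes (4, 8, and 2 bytes here), no single multiplier can locate them; each access looks up the precomputed offset instead.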

Tuple Types

• A tuple is a data type that is similar to a record, except that the elements are not named

• Used in Python, ML, and F# to allow functions to return multiple values

Tuple Types in Python

• Closely related to its lists, but immutable

• Create with a tuple literal

myTuple = (3, 5.8, 'apple')

• Referenced with subscripts (beginning at 0)

• Concatenated with + and deleted with del
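
A short sketch of these tuple operations follows. The names my_tuple and min_max are chosen only for illustration.

```python
my_tuple = (3, 5.8, "apple")

first = my_tuple[0]                   # subscripts start at 0
longer = my_tuple + (7, "pear")       # + builds a new, longer tuple

try:
    my_tuple[1] = 6.0                 # tuples are immutable
    assignment_rejected = False
except TypeError:
    assignment_rejected = True

def min_max(values):
    """Return two results at once by packing them into a tuple."""
    return min(values), max(values)

low, high = min_max([4, 1, 9])        # tuple unpacking at the call site
```

The last two lines show the use mentioned earlier: a function returns several values as one tuple, and the caller unpacks it.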

Tuple Types in ML

Given

val myTuple = (3, 5.8, "apple");

Access as follows:

#1(myTuple) is the first element

A new tuple type can be defined

type intReal = int * real;

Tuple Types in F#

let tup = (3, 5, 7)

let a, b, c = tup

This assigns a tuple to a tuple pattern (a, b, c)

List Types

• Lists in LISP and Scheme are delimited by parentheses and use no commas

(A B C D) and (A (B C) D)

• Data and code have the same form

As data, (A B C) is literally what it is

As code, (A B C) is the function A applied to the parameters B and C

• The interpreter needs to know which kind a list is, so if it is data, we quote it with an apostrophe

'(A B C) is data

List Operations in Scheme

• CAR returns the first element of its list parameter

(CAR '(A B C)) returns A

• CDR returns the remainder of its list parameter after the first element has been removed

(CDR '(A B C)) returns (B C)

• CONS puts its first parameter into its second parameter, a list, to make a new list

(CONS 'A '(B C)) returns (A B C)

• LIST returns a new list of its parameters

(LIST 'A 'B '(C D)) returns (A B (C D))

List Operations in ML

• Lists are written in brackets and the elements are separated by commas

• List elements must be of the same type

• The Scheme CONS function is a binary operator in ML, ::

3 :: [5, 7, 9] evaluates to [3, 5, 7, 9]

• The Scheme CAR and CDR functions are named hd and tl, respectively

Lists in F# and Python

• F# Lists

– Like those of ML, except elements are separated by semicolons and hd and tl are
methods of the List class

• Python Lists

– The list data type also serves as Python’s arrays

– Unlike Scheme, Common LISP, ML, and F#, Python’s lists are mutable

– Elements can be of any type

– Create a list with an assignment

myList = [3, 5.8, "grape"]

Lists in Python

• List elements are referenced with subscripting, with indices beginning at zero

x = myList[1] sets x to 5.8

• List elements can be deleted with del

del myList[1]

• Python includes a powerful mechanism for creating arrays called list comprehensions. A list
comprehension is an idea derived from set notation. It first appeared in the functional
programming language Haskell.

The mechanics of a list comprehension is that a function is applied to each of the elements of a
given array and a new array is constructed from the results.

The syntax of a Python list comprehension is as follows:

[expression for iterate_var in array if condition]

For example:

[x * x for x in range(7) if x % 3 == 0]

range(7) creates [0, 1, 2, 3, 4, 5, 6]

Constructed list: [0, 9, 36]
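
As a runnable sketch, the comprehension below squares the multiples of 3 among the values 0 through 6, and the explicit loop after it builds the same list step by step. The variable names are illustrative.

```python
# Comprehension form: filter with the if clause, then apply x * x.
squares = [x * x for x in range(7) if x % 3 == 0]

# Equivalent explicit loop.
squares_loop = []
for x in range(7):
    if x % 3 == 0:
        squares_loop.append(x * x)
```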

List Comprehensions - Example

• Haskell’s List Comprehensions

– The original

[n * n | n <- [1..10]]

• F#’s List Comprehensions

let myArray = [|for i in 1 .. 5 -> (i * i) |]

• Both C# and Java support lists through their generic heap-dynamic collection classes, List and
ArrayList, respectively

2.10 UNION TYPES

• A union is a type whose variables are allowed to store different type values at different times
during execution

• Design issues

– Should type checking be required?

– Should unions be embedded in records?

Discriminated vs. Free Unions

• Fortran, C, and C++ provide union constructs in which there is no language support for type
checking; the union in these languages is called a free union

Consider the following C union:

union flexType {
    int intEl;
    float floatEl;
};
union flexType el1;
float x;
...
el1.intEl = 27;
x = el1.floatEl;

This last assignment is not type checked, because the system cannot determine the current
type of the current value of el1, so it assigns the bit string representation of 27 to the float
variable x, which of course is nonsense.
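
The same effect can be reproduced from Python with the standard ctypes module, which exposes C-style unions. This is a sketch rather than a translation of any particular compiler's behavior; the class name FlexType and its field names are illustrative, and a 32-bit IEEE single-precision float is assumed.

```python
import ctypes

class FlexType(ctypes.Union):
    # Both members share the same 4 bytes of storage.
    _fields_ = [("int_el", ctypes.c_int32), ("float_el", ctypes.c_float)]

el1 = FlexType()
el1.int_el = 27
x = el1.float_el      # reads the bit pattern of 27 as if it were a float
```

No conversion takes place: x ends up as a tiny positive number unrelated to 27, and nothing in the language flags the mismatch.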

• Type checking of unions requires that each union include a type indicator called a discriminant,
and a union with a discriminant is called a discriminated union

– Supported by Ada

Ada Union Types

type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form: Shape) is record
    Filled: Boolean;
    Color: Colors;
    case Form is
        when Circle => Diameter: Float;
        when Triangle =>
            Leftside, Rightside: Integer;
            Angle: Float;
        when Rectangle => Side_1, Side_2: Integer;
    end case;
end record;

The structure of this variant record is shown in Figure. The following two statements declare
variables of type Figure:

Figure_1 : Figure;
Figure_2 : Figure(Form => Triangle);

Figure_1 is declared to be an unconstrained variant record that has no initial value. Its
type can change by assignment of a whole record, including the discriminant, as in the following:

Figure_1 := (Filled => True,
             Color => Blue,
             Form => Rectangle,
             Side_1 => 12,
             Side_2 => 3);
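
To show what the discriminant buys, here is a hand-rolled discriminated union in Python. It is a sketch of the idea only, not of how Ada implements it: the class Figure, its get method, and the lower-case field names are hypothetical, and the check happens at run time on every variant access.

```python
class Figure:
    """A hand-rolled discriminated union; 'form' is the discriminant."""
    _VARIANT_FIELDS = {
        "circle": {"diameter"},
        "triangle": {"left_side", "right_side", "angle"},
        "rectangle": {"side_1", "side_2"},
    }

    def __init__(self, form, filled, color, **variant):
        # The variant fields supplied must match the discriminant.
        if set(variant) != self._VARIANT_FIELDS[form]:
            raise TypeError(f"wrong fields for form {form!r}")
        self.form, self.filled, self.color = form, filled, color
        self._variant = dict(variant)

    def get(self, field):
        # Check the discriminant before every variant access.
        if field not in self._VARIANT_FIELDS[self.form]:
            raise TypeError(f"{field!r} is not a field of a {self.form}")
        return self._variant[field]

figure_1 = Figure("rectangle", filled=True, color="blue", side_1=12, side_2=3)
area = figure_1.get("side_1") * figure_1.get("side_2")

try:
    figure_1.get("diameter")          # a rectangle has no diameter
    access_rejected = False
except TypeError:
    access_rejected = True
```

Unlike the free union, an access that does not match the current variant is caught instead of silently reinterpreting storage.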

Ada Union Type Illustrated

A discriminated union of three shape variables

Unions in F#

A union is declared in F# with a type statement using OR operators (|) to define the components.
For example, we could have the following:

type intReal =

| IntValue of int

| RealValue of float;;

In this example, intReal is the union type. IntValue and RealValue are constructors. Values of
type intReal can be created using the constructors as if they were functions, as in the following
examples:

let ir1 = IntValue 17;;

let ir2 = RealValue 3.4;;

Implementation of Unions

Unions are implemented by simply using the same address for every possible variant.
Sufficient storage for the largest variant is allocated. The tag of a discriminated union is stored
with the variant in a record-like structure.

At compile time, the complete description of each variant must be stored. This can be
done by associating a case table with the tag entry in the descriptor.

The case table has an entry for each variant, which points to a descriptor for that
particular variant. To illustrate this arrangement, consider the following
Ada example:

type Node (Tag: Boolean) is
    record
        case Tag is
            when True => Count : Integer;
            when False => Sum : Float;
        end case;
    end record;

The descriptor for this type could have the form shown in Figure

A compile-time descriptor for a discriminated union.

Evaluation of Unions

• Free unions are unsafe

– Do not allow type checking

• Java and C# do not support unions

– Reflective of growing concerns for safety in programming languages

• Ada’s discriminated unions are safe.

2.11 POINTER AND REFERENCE TYPES

A pointer type is one in which the variables have a range of values that consists of
memory addresses and a special value, nil. The value nil is not a valid address and is used to
indicate that a pointer cannot currently be used to reference a memory cell.

Pointers are designed for two distinct kinds of uses. First, pointers provide some of the
power of indirect addressing, which is frequently used in assembly language programming.
Second, pointers provide a way to manage dynamic storage. A pointer can be used to access a
location in an area where storage is dynamically allocated, called a heap.

Variables that are dynamically allocated from the heap are called heap-dynamic variables. They
often do not have identifiers associated with them and thus can be referenced only by pointer or
reference type variables. Variables without names are called anonymous variables.

Pointers, unlike arrays and records, are not structured types, although they are defined
using a type operator (* in C and C++ and access in Ada). Furthermore, they are also different
from scalar variables because they are used to reference some other variable, rather than being
used to store data.

These two categories of variables are called reference types and value types, respectively.

Design Issues of Pointers

• What are the scope and lifetime of a pointer variable?

• What is the lifetime of a heap-dynamic variable?

• Are pointers restricted as to the type of value to which they can point?

• Are pointers used for dynamic storage management, indirect addressing, or both?

• Should the language support pointer types, reference types, or both?

Pointer Operations

• Two fundamental operations: assignment and dereferencing

• Assignment is used to set a pointer variable’s value to some useful address

• Dereferencing yields the value stored at the location represented by the pointer’s value

– Dereferencing can be explicit or implicit

– C++ uses an explicit operation via *

j = *ptr

sets j to the value located at ptr

Pointer Assignment Illustrated

Problems with Pointers

• Dangling pointers
– A pointer points to a heap-dynamic variable that has been deallocated
• Lost heap-dynamic variable
– An allocated heap-dynamic variable that is no longer accessible to the user program
(often called garbage)

• Pointer p1 is set to point to a newly created heap-dynamic variable

• Pointer p1 is later set to point to another newly created heap-dynamic variable

• The process of losing heap-dynamic variables is called memory leakage

Pointers in Ada

• Some dangling pointers are disallowed because dynamic objects can be automatically
deallocated at the end of pointer's type scope

• The lost heap-dynamic variable problem is not eliminated by Ada (possible with
UNCHECKED_DEALLOCATION)

Pointers in C and C++

• Extremely flexible but must be used with care

• Pointers can point at any variable regardless of when or where it was allocated

• Used for dynamic storage management and addressing

• Pointer arithmetic is possible

• Explicit dereferencing and address-of operators

• Domain type need not be fixed (void *)

void * can point to any type and can be type checked (cannot be de-referenced)

Pointer Arithmetic in C and C++

float stuff[100];
float *p;
p = stuff;

*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
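
The address computation behind these equivalences can be observed from Python through the standard ctypes module. This is a sketch under the assumption of a 4-byte C float; the names FloatArray, base, and elem_size are illustrative.

```python
import ctypes

FloatArray = ctypes.c_float * 100
stuff = FloatArray(*[float(i) for i in range(100)])

# p = stuff: treat the array as a pointer to its first element.
p = ctypes.cast(stuff, ctypes.POINTER(ctypes.c_float))

base = ctypes.addressof(stuff)
elem_size = ctypes.sizeof(ctypes.c_float)

# *(p + 5): the address is base + 5 * sizeof(float).
via_address = ctypes.c_float.from_address(base + 5 * elem_size).value
via_pointer = p[5]
via_array = stuff[5]
```

Adding 5 to a float pointer therefore advances the address by 5 elements, not by 5 bytes.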

Reference Types

A reference type variable is similar to a pointer, with one important and fundamental
difference: A pointer refers to an address in memory, while a reference refers to an object or a
value in memory. As a result, although it is natural to perform arithmetic on addresses, it is not
sensible to do arithmetic on references.

• C++ includes a special kind of pointer type called a reference type that is used primarily for
formal parameters

– Advantages of both pass-by-reference and pass-by value

• Java extends C++’s reference variables and allows them to replace pointers entirely

– References are references to objects, rather than being addresses

• C# includes both the references of Java and the pointers of C++

Evaluation of Pointers

• Dangling pointers and dangling objects are problems as is heap management

• Pointers are like goto's--they widen the range of cells that can be accessed by a variable

• Pointers or references are necessary for dynamic data structures--so we can't design a
language without them

Representations of Pointers

• Large computers use single values

• Intel microprocessors use segment and offset

Dangling Pointer Problem

• There are several proposed solutions for dangling pointers:


– Tombstone
– Lock and Key

Tombstone

• Tombstone is an extra heap cell that is a pointer to the heap-dynamic variable

• The actual pointer variable points only at tombstones

• When a heap-dynamic variable is deallocated, its tombstone remains but is set to nil

• Costly in time and space
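
The tombstone idea can be simulated in a few lines of Python. This is a toy model, not a real allocator: the class Tombstone, the exception DanglingPointerError, and the functions allocate, deallocate, and dereference are all hypothetical names, and a one-element list stands in for the heap-dynamic variable.

```python
class DanglingPointerError(Exception):
    pass

class Tombstone:
    """Extra heap cell holding the only direct reference to the variable."""
    def __init__(self, target):
        self.target = target

def allocate(value):
    # The program receives the tombstone, never the variable itself.
    return Tombstone([value])

def deallocate(pointer):
    pointer.target = None             # the tombstone stays, set to nil

def dereference(pointer):
    if pointer.target is None:        # extra check on every access
        raise DanglingPointerError("variable was deallocated")
    return pointer.target[0]

p1 = allocate(17)
p2 = p1                               # alias: both point at the same tombstone
before = dereference(p2)
deallocate(p1)
try:
    dereference(p2)
    detected = False
except DanglingPointerError:
    detected = True
```

The cost mentioned above is visible here: every access goes through one more level of indirection, and the tombstone itself is never reclaimed.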


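Locks-and-keys

• Locks-and-keys use pointer values that are represented as (key, address) pairs

• Heap-dynamic variables are represented as the variable plus a cell for an integer lock value

• When a heap-dynamic variable is allocated, a lock value is created and placed in both the lock
cell and the key cell of the pointer

• Every access through the pointer compares the key with the lock; when the variable is
deallocated, its lock is cleared to an illegal value, so any remaining key no longer matches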
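
A toy Python model of locks-and-keys follows. All names (HeapCell, allocate, deallocate, dereference, DanglingPointerError) are hypothetical, and the choice of 0 as the illegal lock value is an assumption made for this sketch.

```python
import itertools

_next_lock = itertools.count(1)       # 0 is reserved as the illegal lock value

class DanglingPointerError(Exception):
    pass

class HeapCell:
    def __init__(self, value):
        self.lock = next(_next_lock)  # lock stored alongside the variable
        self.value = value

def allocate(value):
    cell = HeapCell(value)
    return (cell.lock, cell)          # pointer = (key, address)

def deallocate(pointer):
    _, cell = pointer
    cell.lock = 0                     # no existing key can match from now on

def dereference(pointer):
    key, cell = pointer
    if key != cell.lock:              # compare key and lock on every access
        raise DanglingPointerError("key does not match lock")
    return cell.value

p1 = allocate("data")
p2 = p1                               # copying a pointer copies the key too
value = dereference(p2)
deallocate(p1)
try:
    dereference(p2)
    detected = False
except DanglingPointerError:
    detected = True
```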

Heap Management

• A very complex run-time process

• Single-size cells vs. variable-size cells

• Two approaches to reclaim garbage

– Reference counters (eager approach): reclamation is gradual

– Mark-sweep (lazy approach): reclamation occurs when the list of available space
becomes empty

Reference Counter

• Reference counters: maintain a counter in every cell that stores the number of pointers currently
pointing at the cell

– Disadvantages: space required, execution time required, complications for cells
connected circularly

– Advantage: it is intrinsically incremental, so significant delays in the application
execution are avoided
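
The following Python sketch simulates reference counting on a toy heap, including the difficulty with circularly connected cells. The Cell class, the add_ref and drop_ref helpers, and free_list are hypothetical names; each cell is limited to one outgoing pointer to keep the example short.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.ref_count = 0
        self.next = None              # at most one outgoing pointer, for brevity

free_list = []                        # names of reclaimed cells

def add_ref(cell):
    cell.ref_count += 1

def drop_ref(cell):
    cell.ref_count -= 1
    if cell.ref_count == 0:           # eager: reclaim as soon as the count hits 0
        free_list.append(cell.name)
        if cell.next is not None:
            drop_ref(cell.next)

# A simple chain: program variable -> a -> b. Both cells are reclaimed.
a, b = Cell("a"), Cell("b")
add_ref(a)
a.next = b
add_ref(b)
drop_ref(a)

# A cycle: program variable -> c -> d -> c. Counting alone never reclaims it.
c, d = Cell("c"), Cell("d")
add_ref(c)
c.next = d
add_ref(d)
d.next = c
add_ref(c)
drop_ref(c)                           # the program variable goes away
```

After the last line, c and d are unreachable but each still has a count of 1, which is the circular-structure complication listed above.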

Mark-Sweep

The run-time system allocates storage cells as requested and disconnects pointers from
cells as necessary; mark-sweep then begins

– Every heap cell has an extra bit used by collection algorithm

– All cells initially set to garbage

– All pointers traced into heap, and reachable cells marked as not garbage

– All garbage cells returned to list of available cells
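
The steps above can be sketched on a toy heap in Python. The Cell class and the functions mark and mark_sweep are hypothetical names, and the marking here uses an explicit stack, which is only one of several ways to trace pointers.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.pointers = []            # outgoing pointers to other cells
        self.marked = False           # the extra bit used by the collector

def mark(cell):
    """Mark every cell reachable from 'cell', using an explicit stack."""
    stack = [cell]
    while stack:
        current = stack.pop()
        if not current.marked:
            current.marked = True
            stack.extend(current.pointers)

def mark_sweep(heap, roots):
    for cell in heap:                 # 1. initially treat every cell as garbage
        cell.marked = False
    for root in roots:                # 2. trace all program pointers into the heap
        mark(root)
    live = [c for c in heap if c.marked]
    garbage = [c for c in heap if not c.marked]   # 3. goes back to the free list
    return live, garbage

a, b, c, d = (Cell(n) for n in "abcd")
a.pointers.append(b)
c.pointers.append(d)
d.pointers.append(c)                  # c and d form an unreachable cycle
live, garbage = mark_sweep([a, b, c, d], roots=[a])
```

Unlike reference counting, the unreachable cycle is collected, because reachability is decided by tracing from the program's pointers.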

Disadvantages of Mark-Sweep

• In its original form, it was done too infrequently.

• When done, it caused significant delays in application execution.

• Contemporary mark-sweep algorithms avoid this by doing it more often; this is called incremental
mark-sweep

Marking Algorithm

Variable-Size Cells

• All the difficulties of single-size cells plus more

• Required by most programming languages

• If mark-sweep is used, additional problems occur

• The initial setting of the indicators of all cells in the heap is difficult

• The marking process is nontrivial

• Maintaining the list of available space is another source of overhead

Type Checking

• Type checking is the activity of ensuring that the operands of an operator are of compatible
types

• A compatible type is one that is either legal for the operator, or is allowed under language rules
to be implicitly converted, by compiler- generated code, to a legal type

• Generalize the concept of operands and operators to include subprograms and assignments

This automatic type conversion is called a coercion.

For example, if an int variable and a float variable are added in Java, the value of the int
variable is coerced to float and a floating-point add is done.

• A type error is the application of an operator to an operand of an inappropriate type. For
example, in the original version of C, if an int value was passed to a function that expected a
float value, a type error would occur.

• If all type bindings are static, nearly all type checking can be static

• If type bindings are dynamic, type checking must be dynamic. Dynamic type binding requires
type checking at run time, which is called dynamic type checking.

Some languages, such as JavaScript and PHP, because of their dynamic type binding,
allow only dynamic type checking. It is better to detect errors at compile time than at run time.

• A programming language is strongly typed if type errors are always detected

• Advantage of strong typing: allows the detection of the misuses of variables that result in type
errors
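
Python, which also uses dynamic type binding, gives a small illustration of both coercion and dynamic type checking. The function add and the variable names are illustrative.

```python
total = 3 + 2.5                       # the int operand is coerced to float
coerced_type = type(total).__name__

def add(x, y):
    return x + y                      # operand types are known only at run time

try:
    add("3", 4)                       # str + int is not a compatible pairing
    type_error_detected = False
except TypeError:
    type_error_detected = True
```

The error is detected, but only when the offending call actually executes, which is the drawback of relying on run-time checks.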

Strong Typing – Language Examples

A programming language is strongly typed if type errors are always detected. This
requires that the types of all operands can be determined, either at compile time or at run time.

• C and C++ are not strongly typed languages because both include union types, which are
not type checked.

• ML is strongly typed, even though the types of some function parameters may not be known at
compile time. F# is strongly typed.

• Java and C#, although they are based on C++, are strongly typed in the same sense as
Ada

Name Type Equivalence

• Name type equivalence means the two variables have equivalent types if they are in either the
same declaration or in declarations that use the same type name

• Easy to implement but highly restrictive:

• Subranges of integer types are not equivalent with integer types

• Formal parameters must be the same type as their corresponding actual parameters

There are two approaches to defining type equivalence: name type equivalence and structure type
equivalence. Name type equivalence means that two variables have equivalent types if they are
defined either in the same declaration or in declarations that use the same type name. Structure
type equivalence means that two variables have equivalent types if their types have identical
structures. There are some variations of these two approaches, and many languages use
combinations of them.

Structure Type Equivalence

• Structure type equivalence means that two variables have equivalent types if their types have
identical structures

• More flexible, but harder to implement.

Another difficulty with structure type equivalence is that it disallows differentiating
between types with the same structure. For example, consider the following Ada-like
declarations:

type Celsius = Float;
     Fahrenheit = Float;

The types of variables of these two types are considered equivalent under structure type
equivalence, allowing them to be mixed in expressions, which is surely undesirable in this case,
considering the difference indicated by the type’s names.

A derived type is a new type that is based on some previously defined type with which it is not
equivalent, although it may have identical structure. Derived types inherit all the properties of
their parent types. Consider the following example:

type Celsius is new Float;

type Fahrenheit is new Float;

The types of variables of these two derived types are not equivalent, although their structures are
identical.
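
The difference between the two notions of equivalence can be mimicked in Python. This is only an analogy: the classes Celsius and Fahrenheit and the two checking functions are hypothetical, and Python itself does not perform these checks automatically.

```python
from dataclasses import dataclass, fields

@dataclass
class Celsius:
    degrees: float

@dataclass
class Fahrenheit:
    degrees: float

def shape(obj):
    """Describe a value's structure as a list of (field name, field type)."""
    return [(f.name, f.type) for f in fields(obj)]

def name_equivalent(a, b):
    return type(a) is type(b)         # same declared type name

def structure_equivalent(a, b):
    return shape(a) == shape(b)       # same layout, names of types ignored

boiling_c = Celsius(100.0)
boiling_f = Fahrenheit(212.0)
same_name = name_equivalent(boiling_c, boiling_f)
same_structure = structure_equivalent(boiling_c, boiling_f)
```

Under the structural test the two temperatures would be interchangeable, which is exactly the mixing that derived types are meant to prevent.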

• Consider the problem of two structured types:

– Are two enumeration types equivalent if their components are spelled differently?

– With structural type equivalence, you cannot differentiate between types of the same
structure (e.g. different units of speed, both float)

An Ada subtype is a possibly range-constrained version of an existing type. A subtype is type
equivalent with its parent type. For example, consider the following declaration:

subtype Small_type is Integer range 0..99;

The type Small_type is equivalent to the type Integer.

Note that Ada’s derived types are very different from Ada’s subrange types. For example,
consider the following type declarations:

type Derived_Small_Int is new Integer range 1..100;

subtype Subrange_Small_Int is Integer range 1..100;

Variables of both types, Derived_Small_Int and Subrange_Small_Int, have the same range of
legal values and both inherit the operations of Integer. However, variables of type
Derived_Small_Int are not compatible with any Integer type, whereas variables of type
Subrange_Small_Int are compatible with variables and constants of type Integer and of any
subtype of Integer.

For variables of an Ada unconstrained array type, structure type equivalence is used. For
example, consider the following type declaration and two object declarations:

type Vector is array (Integer range <>) of Integer;

Vector_1: Vector (1..10);

Vector_2: Vector (11..20);

The types of these two objects are equivalent, even though they have different names and
different subscript ranges, because for objects of unconstrained array types, structure type
equivalence rather than name type equivalence is used.

Theory and Data Types

• Type theory is a broad area of study in mathematics, logic, computer science, and philosophy

• In computer science there are two branches of type theory: practical and abstract. The practical
branch is concerned with data types in commercial programming languages; the abstract branch
primarily focuses on typed lambda calculus, an area of extensive research by theoretical
computer scientists over the past half century

• A type system is a set of types and the rules that govern their use in programs

• Formal model of a type system is a set of types and a collection of functions that define the
type rules

– Either an attribute grammar or a type map could be used for the functions

– Finite mappings – model arrays and functions

– Cartesian products – model tuples and records

– Set unions – model union types

– Subsets – model subtypes

2.12 EXPRESSIONS AND ASSIGNMENT STATEMENTS

• Expressions are the fundamental means of specifying computations in a programming
language

• To understand expression evaluation, one needs to be familiar with the orders of operator and
operand evaluation

• The essence of imperative languages is the dominant role of assignment statements

Arithmetic Expressions

• Arithmetic evaluation was one of the motivations for the development of the first
programming languages

• Arithmetic expressions consist of operators, operands, parentheses, and function calls

In most programming languages, binary operators are infix, which means they appear
between their operands. One exception is Perl, which has some operators that are prefix,
which means they precede their operands.

The purpose of an arithmetic expression is to specify an arithmetic computation. An
implementation of such a computation must cause two actions: fetching the operands, usually
from memory, and executing arithmetic operations on those operands.

• Design issues for arithmetic expressions

• operator precedence rules

• operator associativity rules

• order of operand evaluation

• operand evaluation side effects

• operator overloading

• mode mixing expressions

Arithmetic Expressions: Operators

• A unary operator has one operand

• A binary operator has two operands

• A ternary operator has three operands

Arithmetic Expressions: Operator Precedence Rules

• The operator precedence rules for expression evaluation define the order in which
“adjacent” operators of different precedence levels are evaluated

• Typical precedence levels

– parentheses

– unary operators

– ** (if the language supports it)

– *, /

– +, -

Arithmetic Expressions: Operator Associativity Rule

The operator associativity rules for expression evaluation define the order in which
adjacent operators with the same precedence level are evaluated

Typical associativity rules

– Left to right, except **, which is right to left

– Sometimes unary operators associate right to left (e.g., in FORTRAN)

APL is different; all operators have equal precedence and all operators associate right to left

Precedence and associativity rules can be overridden with parentheses

Arithmetic Expressions: Conditional Expressions

• Conditional Expressions

– C-based languages (e.g., C, C++)

– An example:

average = (count == 0)? 0 : sum / count

– Evaluates as if written like

if (count == 0) average = 0

else average = sum /count

Operand Evaluation Order

• Operand evaluation order

1. Variables: fetch the value from memory

2. Constants: sometimes a fetch from memory; sometimes the constant is in the
machine language instruction

3. Parenthesized expressions: evaluate all operands and operators first

Potentials for Side Effects

Functional side effects: when a function changes a two-way parameter or a non-local variable

Problem with functional side effects:

- When a function referenced in an expression alters another operand of the expression;
e.g., for a parameter change:

a = 10;
/* assume that fun changes its parameter */
b = a + fun(a);

In all of the common imperative languages, the unary minus operator can appear in an expression
either at the beginning or anywhere inside the expression, as long as it is parenthesized to
prevent it from being next to another operator. For example,

a + (- b) * c

is legal, but

a+-b*c

usually is not.

Next, consider the following expressions:

-a/b
-a*b
- a ** b

In the first two cases, the relative precedence of the unary minus operator and the binary operator
is irrelevant: the order of evaluation of the two operators has no effect on the value of the
expression. In the last case, however, it does matter.

Of the common programming languages, only Fortran, Ruby, Visual Basic, and Ada have
the exponentiation operator. In all four, exponentiation has higher precedence than unary minus,
so

- A ** B

is equivalent to

- (A ** B)

The precedences of the arithmetic operators of Ruby and the C-based languages are as follows:

Associativity:

When an expression contains two adjacent occurrences of operators with the same level
of precedence, the question of which operator is evaluated first is answered by the associativity
rules of the language. An operator can have either left or right associativity, meaning that when
there are two adjacent operators with the same precedence, the left operator is evaluated first or
the right operator is evaluated first, respectively.

Associativity in common languages is left to right, except that the exponentiation
operator (when provided) sometimes associates right to left. In the Java expression

a-b+c

the left operator is evaluated first.

Exponentiation in Fortran and Ruby is right associative, so in the expression

A ** B ** C

the right operator is evaluated first.

In Ada, exponentiation is nonassociative, which means that the expression

A ** B ** C

is illegal. Such an expression must be parenthesized to show the desired order, as in either

(A ** B) ** C

or

A ** (B ** C)

Parentheses:

A parenthesized part of an expression has precedence over its adjacent unparenthesized
parts. For example, although multiplication has precedence over addition, in the expression

(A + B) * C

the addition will be evaluated first.
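
Python follows the typical rules described in this section, so a few one-line expressions are enough to observe precedence, associativity, and the effect of parentheses. The variable names are illustrative.

```python
left_assoc = 10 - 4 + 2               # grouped as (10 - 4) + 2
right_assoc = 2 ** 3 ** 2             # grouped as 2 ** (3 ** 2)
unary_minus = -2 ** 2                 # grouped as -(2 ** 2)
precedence = 2 + 3 * 4                # grouped as 2 + (3 * 4)
parenthesized = (2 + 3) * 4           # parentheses override precedence
```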

Expressions in LISP:

As is the case with Ruby, all arithmetic and logic operations in LISP are performed by
subprograms. But in LISP, the subprograms must be explicitly called. For example, to specify
the C expression a + b * c in LISP, one must write the following expression:

(+ a (* b c))

In this expression, + and * are the names of functions.

Conditional Expressions

if-then-else statements can be used to perform a conditional expression assignment. For example,
consider

if (count == 0)
    average = 0;
else
    average = sum / count;

In the C-based languages, this code can be specified more conveniently in an assignment
statement using a conditional expression, which has the form

expression_1 ? expression_2 : expression_3

where expression_1 is interpreted as a Boolean expression. If expression_1 evaluates to true, the
value of the whole expression is the value of expression_2; otherwise, it is the value of
expression_3. For example, the effect of the example if-then-else can be achieved with the
following assignment statement, using a conditional expression:

average = (count == 0) ? 0 : sum / count;

In effect, the question mark denotes the beginning of the then clause, and the colon marks the
beginning of the else clause. Both clauses are mandatory.
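
Python has a conditional expression as well, although with a different word order than the C-based form. The function name average and its parameters are illustrative.

```python
def average(total, count):
    # Form: value_if_true if condition else value_if_false
    return 0 if count == 0 else total / count

normal_case = average(10, 4)
empty_case = average(10, 0)           # the division is never evaluated
```

As in the C-based languages, only the selected branch is evaluated, so the division by zero is avoided.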

Functional Side Effects

A side effect of a function, naturally called a functional side effect, occurs when the
function changes either one of its parameters or a global variable. (A global variable is declared
outside the function but is accessible in the function.)

The following C program illustrates the same problem when a function changes a global variable
that appears in an expression:

int a = 5;
int fun1() {
    a = 17;
    return 3;
} /* end of fun1 */
int main() {
    a = a + fun1();
    return 0;
} /* end of main */

The value computed for a in main depends on the order of evaluation of the operands in
the expression a + fun1(). The value of a will be either 8 (if a is evaluated first) or 20 (if the
function call is evaluated first).
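
Python evaluates operands strictly left to right, so both outcomes can be reproduced by swapping the operand order. In this sketch the dictionary state stands in for the global variable a; that substitution and the names left_first and call_first are assumptions made for illustration.

```python
state = {"a": 5}                      # stands in for the global variable a

def fun1():
    state["a"] = 17                   # side effect on the shared variable
    return 3

state["a"] = 5
left_first = state["a"] + fun1()      # a is fetched (5) before fun1 runs

state["a"] = 5
call_first = fun1() + state["a"]      # fun1 runs first, so a is already 17
```

A language that leaves the operand evaluation order unspecified may produce either result for the same source expression.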

2.13 OVERLOADED OPERATORS

• Use of an operator for more than one purpose is called operator overloading

• Some are common (e.g., + for int and float)

• Some are potential trouble (e.g., * in C and C++)

– Loss of compiler error detection (omission of an operand should be a detectable
error)

– Some loss of readability

– Can be avoided by introduction of new symbols (e.g., Pascal’s div for integer
division)

• C++ and Ada allow user-defined overloaded operators

• Potential problems:

– Users can define nonsense operations

– Readability may suffer, even when the operators make sense
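A minimal sketch of a user-defined overloaded operator, written in Python rather than C++ or Ada. The class name Vec2 is hypothetical; Python maps the + operator to the __add__ method.

```python
class Vec2:
    """A tiny 2-D vector used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for the expression  self + other
        return Vec2(self.x + other.x, self.y + other.y)


v = Vec2(1, 2) + Vec2(3, 4)
print(v.x, v.y)        # 4 6

# The built-in + is already overloaded for several types:
print(1 + 2)           # 3      (integer addition)
print(1.5 + 2.5)       # 4.0    (floating-point addition)
print("ab" + "cd")     # abcd   (string concatenation)
```

Nothing stops a programmer from defining __add__ to do something unrelated to addition, which is the "nonsense operations" risk listed above.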

2.14 TYPE CONVERSIONS

• A narrowing conversion is one that converts an object to a type that cannot include all of
the values of the original type


• For example, converting a double to a float in Java is a narrowing conversion, because the
range of double is much larger than that of float.

• A widening conversion is one in which an object is converted to a type that can include at
least approximations to all of the values of the original type.

For example, converting an int to a float in Java is a widening conversion.

Type conversions can be either explicit or implicit.
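The difference between the two kinds of conversion can be sketched with explicit conversions in Python, assuming its usual arbitrary-precision int and 64-bit float. The variable names are illustrative only.

```python
# Widening: int -> float keeps the magnitude, but possibly only approximately.
big = 2**53 + 1                 # not exactly representable as a 64-bit float
widened = float(big)            # explicit conversion
precision_lost = int(widened) != big

# Narrowing: float -> int cannot keep the fractional part.
narrowed = int(3.99)            # truncates toward zero

print(precision_lost)  # True
print(narrowed)        # 3
```

This matches the wording above: a widening conversion preserves "at least approximations" of every value, while a narrowing conversion can discard information outright.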

Type Conversions: Mixed Mode

• A mixed-mode expression is one that has operands of different types

• Automatic type conversion is called coercion. A coercion is an implicit type conversion

• Disadvantage of coercions:

– They decrease the type error detection ability of the compiler

As a simple illustration of the problem, consider the following Java code:

int a;
float b, c, d;
...
d = b * a;
Assume that the second operand of the multiplication operator was supposed to be c, but
because of a keying error it was typed as a. Because mixed-mode expressions are legal in Java,
the compiler would not detect this as an error. It would simply insert code to coerce the value of
the int operand, a, to float.

If mixed-mode expressions were not legal in Java, this keying error would have been detected by
the compiler as a type error.

• In most languages, all numeric types are coerced in expressions, using widening
conversions
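Python behaves in a comparable way for numeric operands, although the check happens at run time rather than in a compiler. The sketch below shows a mixed-mode expression that is silently widened and one that is rejected; the variable names are assumptions.

```python
a = 3        # int
b = 2.5      # float
d = b * a    # the int operand is widened to float before multiplying

try:
    "7" + 7                  # str and int are never coerced to each other
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

print(d, type(d).__name__)   # 7.5 float
print(mixed_allowed)         # False
```

As in the Java example, writing a where c was intended would go unnoticed in the numeric case, because the mixed-mode product is legal.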


In Ada, there are virtually no coercions in expressions. If the Java code example were written
in Ada, as in

A : Integer;
B, C, D : Float;
...
C := B * A;
then the Ada compiler would find the expression erroneous, because Float and Integer
operands cannot be mixed for the * operator.

ML and F# do not coerce operands in expressions. Any necessary conversions must be explicit.
This results in the same high level of reliability in expressions that is provided by Ada.

The C-based languages have integer types that are smaller than the int type. In Java, they
are byte and short. Operands of all of these types are coerced to int whenever virtually any
operator is applied to them. So, while data can be stored in variables of these types, it cannot be
manipulated before conversion to a larger type. For example, consider the following Java code:

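byte a, b, c;
...
a = (byte) (b + c);
The values of b and c are coerced to int and an int addition is performed. The int sum is then
converted back to byte by the explicit cast and put in a. Without the cast, Java rejects the
assignment, because int to byte is a narrowing conversion.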

Explicit Type Conversions

• Most languages provide some capability for doing explicit conversions, both widening
and narrowing
• Explicit type conversion is called casting in the C-based languages

• Examples

– C: (int) angle

– Ada: Float (sum)

Note that Ada’s syntax is similar to that of function calls.


Type Conversions: Errors in Expressions

• Causes

– Inherent limitations of arithmetic, e.g., division by zero

– Limitations of computer arithmetic, e.g. overflow

• Often ignored by the run-time system
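Both causes can be observed in Python, which serves here only as an example of one run-time system's choices. The helper name safe_divide is hypothetical.

```python
import math


def safe_divide(x, y):
    # Division by zero is an inherent limitation of arithmetic;
    # Python reports it as an exception instead of ignoring it.
    try:
        return x / y
    except ZeroDivisionError:
        return None


# Floating-point overflow is a limitation of computer arithmetic;
# for multiplication Python silently produces infinity.
overflowed = 1e308 * 10

print(safe_divide(1, 0))        # None
print(math.isinf(overflowed))   # True
```

The second case shows an error being "ignored" in the sense used above: the program continues with a value that is no longer meaningful as an ordinary number.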

2.15 RELATIONAL AND BOOLEAN EXPRESSIONS

• Relational Expressions

– A relational operator is an operator that compares the values of its two operands.
A relational expression has two operands and one relational operator. The value
of a relational expression is Boolean, except when Boolean is not a type included
in the language. Operator symbols used vary somewhat among languages (!=, /=,
.NE., <>, #)

The syntax of the relational operators for equality and inequality differs among some
programming languages. For example, for inequality,

the C-based languages use !=,

Ada uses /=,

Lua uses ~=,

Fortran 95+ uses .NE. or /=, and

ML and F# use <>.

JavaScript and PHP have two additional relational operators, === and !==. These are similar to
their relatives, == and !=, but prevent their operands from being coerced. For example, the
expression

"7" == 7


is true in JavaScript, because when a string and a number are the operands of a relational
operator, the string is coerced to a number. However,

"7" === 7

is false, because no coercion is done on the operands of this operator.

Ruby uses == for the equality relational operator that uses coercions, and eql? for equality with
no coercions. Ruby uses === only in the when clause of its case statement.

The relational operators always have lower precedence than the arithmetic operators, so that in
expressions such as

a + 1 > 2 * b

the arithmetic expressions are evaluated first.

Boolean Expressions

Boolean expressions consist of Boolean variables, Boolean constants, relational expressions, and
Boolean operators. The operators usually include those for the AND, OR, and NOT operations, and
sometimes for exclusive OR and equivalence. Boolean operators usually take only Boolean operands
(Boolean variables, Boolean literals, or relational expressions) and produce Boolean values.

FORTRAN 77    FORTRAN 90    C     Ada

.AND.         and           &&    and
.OR.          or            ||    or
.NOT.         not           !     not
                                  xor

No Boolean Type in C

• C89 has no Boolean type; it uses the int type, with 0 for false and nonzero for true

• One odd characteristic of C’s expressions: a < b < c is a legal expression, but the
result is not what you might expect:


– The leftmost operator is evaluated first, producing 0 or 1

– The evaluation result is then compared with the third operand (i.e., c)
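The C reading of a < b < c can be imitated in Python by converting the first comparison to an integer by hand. Python itself chains comparisons, so the two forms differ; the sample values below are arbitrary.

```python
a, b, c = 5, 3, 2

# C-style reading: (a < b) yields 0 or 1, and that number is compared with c.
c_style = int(a < b) < c      # int(False) is 0, and 0 < 2

# Python's own reading: a < b and b < c
chained = a < b < c

print(c_style)  # True
print(chained)  # False
```

With these values the C-style result is true even though 5 < 3 < 2 is false mathematically, which is the surprise the bullet above warns about.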

Relational and Boolean Expressions: Operator Precedence

• Precedence of C-based operators

postfix ++, --
unary +, -, prefix ++, --, !
*,/,%
binary +, -
<, >, <=, >=
==, !=
&&
||

Short Circuit Evaluation

• An expression in which the result is determined without evaluating all of the operands
and/or operators

• Example: (13 * a) * (b / 13 - 1)

If a is zero, there is no need to evaluate (b / 13 - 1)

• Problem with non-short-circuit evaluation

index = 1;
while ((index < length) && (LIST[index] != value))
    index++;

– When index = length, LIST[index] will cause an indexing problem (assuming LIST has
length - 1 elements)


• C, C++, and Java: use short-circuit evaluation for the usual Boolean operators (&& and
||), but also provide bitwise Boolean operators that are not short circuit (& and |)

• Ada: programmer can specify either (short-circuit is specified with and then and or else)

• Short-circuit evaluation exposes the potential problem of side effects in expressions,
e.g. (a > b) || (b++ / 3), in which b is incremented only when a > b is false
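The table-search problem can be sketched in Python, where and is short-circuit and & evaluates both operands. The function names find and find_eager are hypothetical.

```python
def find(values, target):
    index = 0
    # `and` short-circuits: values[index] is evaluated only if index is in range.
    while index < len(values) and values[index] != target:
        index += 1
    return index  # equals len(values) when target is absent


def find_eager(values, target):
    index = 0
    # `&` evaluates both operands, so values[index] runs even when
    # index == len(values), which raises IndexError for an absent target.
    while (index < len(values)) & (values[index] != target):
        index += 1
    return index


print(find([4, 7, 9], 9))  # 2
print(find([4, 7, 9], 5))  # 3
```

Calling find_eager([4, 7, 9], 5) fails with an IndexError, which is the indexing problem described for non-short-circuit evaluation.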

2.16 ASSIGNMENT STATEMENTS

• The general syntax

<target_var> <assign_operator> <expression>

• The assignment operator

= FORTRAN, BASIC, PL/I, C, C++, Java

:= ALGOLs, Pascal, Ada

= can be bad when it is overloaded for the relational operator for equality

Assignment Statements: Conditional Targets

• Conditional targets (Perl, and C++ when both alternatives are variables of the same type)

($flag ? $total : $subtotal) = 0

Which is equivalent to

if ($flag) {
    $total = 0
} else {
    $subtotal = 0
}
Compound Assignment Operators

• A shorthand method of specifying a commonly needed form of assignment

• Introduced in ALGOL; adopted by C

• Example


a=a+b

is written as

a += b

For example,

sum += value;
is equivalent to
sum = sum + value;

Unary Assignment Operators

• Unary assignment operators in C-based languages combine increment and decrement operations
with assignment

• Examples

count++ (count incremented)

--count (count decremented)

In the assignment statement

sum = ++ count;
the value of count is incremented by 1 and then assigned to sum. This operation could also be
stated as
count = count + 1;
sum = count;
If the same operator is used as a postfix operator, as in

sum = count ++;

the assignment of the value of count to sum occurs first; then count is incremented. The effect is
the same as that of the two statements

sum = count;
count = count + 1;


An example of the use of the unary increment operator to form a complete assignment statement
is

count ++;

which simply increments count. It does not look like an assignment, but it certainly is one. It is
equivalent to the statement

count = count + 1;

When two unary operators apply to the same operand, the association is right to left. For
example, in

- count ++

count is first incremented and then negated. So, it is equivalent to

- (count ++)

Assignment as an Expression

• In C, C++, and Java, the assignment statement produces a result and can be used as an
operand

• An example:

while ((ch = getchar()) != EOF) { … }

ch = getchar() is carried out; the result (assigned to ch) is used as a conditional value for
the while statement

Note that the treatment of the assignment operator as any other binary operator allows the effect
of multiple-target assignments, such as

sum = count = 0;

in which count is first assigned zero, and then count’s value is assigned to sum. This form of
multiple-target assignment is also legal in Python.
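A hedged Python counterpart of the getchar loop, assuming Python 3.8 or later for the := assignment expression. The in-memory stream is a stand-in for standard input, and the empty string plays the role of EOF.

```python
import io

stream = io.StringIO("abc")   # hypothetical input source

chars = []
# The assignment inside the parentheses produces a value that is then compared.
while (ch := stream.read(1)) != "":
    chars.append(ch)

# Multiple-target assignment of a single value
total = count = 0

print(chars)         # ['a', 'b', 'c']
print(total, count)  # 0 0
```

In Python the ordinary = form is a statement, so only the := form can appear inside a larger expression such as the loop condition.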

Multiple Assignments


Several recent programming languages, including Perl, Ruby, and Lua, provide multiple-target,
multiple-source assignment statements. For example, in Perl one can write

($first, $second, $third) = (20, 40, 60);

The semantics is that 20 is assigned to $first, 40 is assigned to $second, and 60 is assigned
to $third. If the values of two variables must be interchanged, this can be done with a single
assignment, as with

($first, $second) = ($second, $first);

This correctly interchanges the values of $first and $second, without the use of a
temporary variable (at least one created and managed by the programmer).

The syntax of the simplest form of Ruby’s multiple assignment is similar to that of Perl,
except the left and right sides are not parenthesized.
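Python, which is not among the languages named above, supports the same multiple-target, multiple-source form through tuple assignment. A minimal sketch:

```python
first, second, third = 20, 40, 60

# The right side is evaluated completely before any target is assigned,
# so the swap needs no programmer-managed temporary variable.
first, second = second, first

print(first, second, third)  # 40 20 60
```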

2.17 Mixed-Mode Assignment

Assignment statements can also be mixed-mode, for example

int a, b;
float c;
c = a / b;
• In Pascal, integer variables can be assigned to real variables, but real variables cannot be
assigned to integers

• In Java, only widening assignment coercions are done

• In Ada, there is no assignment coercion

2.18 CONTROL STATEMENTS

Control structures are a way to specify the flow of control in programs. An algorithm or
program is clearer and easier to understand if it is built from self-contained modules called
logic or control structures. A control structure determines the direction in which a program
flows, based on certain parameters or conditions. There are three basic types of logic, or flow
of control, known as:
1. Sequence logic, or sequential flow
2. Selection logic, or conditional flow
3. Iteration logic, or repetitive flow
A control structure is a control statement and the collection of statements whose execution it
controls.

Control Statements: Evolution

• FORTRAN I control statements were based directly on IBM 704 hardware

• Much research and argument in the 1960s about the issue

– One important result: It was proven that all algorithms represented by flowcharts can be
coded with only two-way selection and pretest logical loops


Selection Statements

• A selection statement provides the means of choosing between two or more paths of execution

• Two general categories:

– Two-way selectors

– Multiple-way selectors

Two-Way Selection Statements

• General form:

if control_expression
then clause
else clause
• Design Issues:

– What is the form and type of the control expression?


– How are the then and else clauses specified?

– How should the meaning of nested selectors be specified?

The Control Expression

• If the then reserved word or some other syntactic marker is not used to introduce the then
clause, the control expression is placed in parentheses

• In C89, C99, Python, and C++, the control expression can be arithmetic

• In most other languages, the control expression must be Boolean

Clause Form

• In many contemporary languages, the then and else clauses can be single statements or
compound statements

• In Perl, all clauses must be delimited by braces (they must be compound)

• In Fortran 95, Ada, Python, and Ruby, clauses are statement sequences

• Python uses indentation to define clauses

if x > y:
    x = y
    print("x was greater than y")

Nesting Selectors

Java example

if (sum == 0)
    if (count == 0)
        result = 0;
else
    result = 1;
• Which if gets the else?


• Java's static semantics rule: else matches with the nearest previous if

• To force an alternative semantics, compound statements may be used:

if (sum == 0) {
    if (count == 0)
        result = 0;
}
else
    result = 1;
• The above solution is used in C, C++, and C#

• Statement sequences as clauses:

Ruby

if sum == 0 then
  if count == 0 then
    result = 0
  else
    result = 1
  end
end
Python

if sum == 0 :
    if count == 0 :
        result = 0
    else :
        result = 1
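Because indentation decides which if owns the else in Python, both readings of the nested selector can be written side by side. The function names below are hypothetical, and None marks the case in which no assignment to result happens.

```python
def else_belongs_to_inner_if(total, count):
    result = None
    if total == 0:
        if count == 0:
            result = 0
        else:            # paired with "if count == 0"
            result = 1
    return result


def else_belongs_to_outer_if(total, count):
    result = None
    if total == 0:
        if count == 0:
            result = 0
    else:                # paired with "if total == 0"
        result = 1
    return result


print(else_belongs_to_inner_if(0, 5))  # 1
print(else_belongs_to_outer_if(0, 5))  # None
print(else_belongs_to_outer_if(3, 0))  # 1
```

The first function corresponds to the Java rule (else matches the nearest previous if); the second corresponds to the version that uses braces to force the alternative meaning.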
Selector Expressions

In the functional languages ML, F#, and LISP, the selector is not a statement; it is an expression
that results in a value. Therefore, it can appear anywhere any other expression can appear.
Consider the following example selector written in F#:


let y =
    if x > 0 then x
    else 2 * x;;

This creates the name y and sets it to either x or 2 * x, depending on whether x is greater than
zero. If the if expression returns a value, there must be an else clause (without one, the
expression could only produce output, rather than a value).

Multiple-Way Selection Statements

The multiple-selection statement allows the selection of one of any number of statements or
statement groups

• Design Issues:

1. What is the form and type of the control expression?

2. How are the selectable segments specified?

3. Is execution flow through the structure restricted to include just a single selectable segment?

4. How are case values specified?

5. What is done about unrepresented expression values?

Multiple-Way Selection: Examples

• In C, C++, Java, and JavaScript, the general form of the switch statement is

switch (expression) {
    case const_expr1: stmt1;
    ...
    case const_exprn: stmtn;
    [default: stmtn+1]
}

Multiple-Way Selection: Examples

• Design choices for C’s switch statement


1. Control expression can be only an integer type

2. Selectable segments can be statement sequences, blocks, or compound statements

3. Any number of segments can be executed in one execution of the construct (there is no
implicit branch at the end of selectable segments)

4. default clause is for unrepresented values (if there is no default, the whole statement does
nothing)

Consider the following example:

switch (index) {
    case 1:
    case 3: odd += 1;
            sumodd += index;
    case 2:
    case 4: even += 1;
            sumeven += index;
    default: printf("Error in switch, index = %d\n", index);
}
This code prints the error message on every execution. Likewise, the code for the 2 and 4
constants is executed every time the code at the 1 or 3 constants is executed. To separate these
segments logically, an explicit branch must be included. The break statement, which is actually a
restricted goto, is normally used for exiting switch statements.

The following switch statement uses break to restrict each execution to a single selectable
segment:

switch (index) {
    case 1:
    case 3: odd += 1;
            sumodd += index;
            break;
    case 2:
    case 4: even += 1;
            sumeven += index;
            break;
    default: printf("Error in switch, index = %d\n", index);
}
Occasionally, it is convenient to allow control to flow from one selectable code segment
to another. For instance, in the example above, the segments for the case values 1 and 2 are
empty, allowing control to flow to the segments for 3 and 4, respectively.
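The effect of omitting break can be simulated in Python. This is only a model of the behavior described above, not a description of how a C compiler works; the function c_switch and its segment names are hypothetical.

```python
def c_switch(index, with_break):
    """Model of the two C switch examples: case labels are entry points
    into a sequence of segments, and execution continues downward
    unless a break ends it."""
    segments = [
        ((1, 3), "odd"),     # case 1: case 3:
        ((2, 4), "even"),    # case 2: case 4:
        (None, "error"),     # default:
    ]
    executed = []
    entered = False
    for labels, name in segments:
        if not entered and (labels is None or index in labels):
            entered = True           # jump to the matching label
        if entered:
            executed.append(name)
            if with_break:
                break                # explicit branch out of the switch
    return executed


print(c_switch(1, with_break=False))  # ['odd', 'even', 'error']
print(c_switch(1, with_break=True))   # ['odd']
print(c_switch(4, with_break=False))  # ['even', 'error']
```

The first call reproduces the version without break, in which the error segment runs on every execution.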

C#

– Differs from C in that it has a static semantics rule that disallows the implicit execution of
more than one segment

–For example,

switch (value) {
    case -1:
        Negatives++;
        break;
    case 0:
        Zeros++;
        goto case 1;
    case 1:
        Positives++;
        break;
    default:
        Console.WriteLine("Error in switch \n");
        break;
}

Note that Console.WriteLine is the method for displaying strings in C#. Each selectable segment
must end with an unconditional branch (goto or break).

– Also, in C# the control expression and the case constants can be strings


• Ruby has two forms of multiple-selection constructs, both of which are called case expressions
and both of which yield the value of the last expression evaluated. The only version of Ruby’s
case expressions that is described here is semantically similar to a list of nested if statements:

case
when Boolean_expression then expression
...
when Boolean_expression then expression
[else expression]
end
The semantics of this case expression is that the Boolean expressions are evaluated one at a time,
top to bottom. The value of the case expression is the value of the first then expression whose
Boolean expression is true. The else represents true in this statement, and the else clause is
optional. For example,

leap = case
when year % 400 == 0 then true
when year % 100 == 0 then false
else year % 4 == 0
end
This case expression evaluates to true if year is a leap year.
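The same top-to-bottom evaluation can be written in Python with an if/elif chain. The function name is_leap is an assumption made for this sketch.

```python
def is_leap(year):
    # Conditions are tested in order; the first true one decides the result,
    # mirroring the when clauses of the Ruby case expression.
    if year % 400 == 0:
        return True
    elif year % 100 == 0:
        return False
    else:
        return year % 4 == 0


print(is_leap(2000))  # True
print(is_leap(1900))  # False
print(is_leap(2024))  # True
```

The order of the tests matters: 1900 is divisible by 4, but the earlier test for divisibility by 100 decides first.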

Implementing Multiple Selectors

• Approaches:

– Multiple conditional branches

– Store case values in a table and use a linear search of the table

– When there are more than ten cases, a hash table of case values can be used

– If the number of cases is small and more than half of the whole range of case values are
represented, an array whose indices are the case values and whose values are the case labels can
be used
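The last two approaches can be sketched in Python, with functions standing in for the code segments. All names here (on_odd, select_hashed, jump_table, and so on) are hypothetical, and a real compiler would emit branch instructions rather than function calls.

```python
def on_odd():
    return "odd"


def on_even():
    return "even"


def on_default():
    return "error"


# Hash-table approach: case values are keys, segments are values.
hash_table = {1: on_odd, 3: on_odd, 2: on_even, 4: on_even}


def select_hashed(value):
    return hash_table.get(value, on_default)()


# Array approach: the case value is the index (dense range 0..4).
# Index 0 has no case, so it holds the default segment.
jump_table = [on_default, on_odd, on_even, on_odd, on_even]


def select_indexed(value):
    if 0 <= value < len(jump_table):
        return jump_table[value]()
    return on_default()


print(select_hashed(3))   # odd
print(select_indexed(4))  # even
print(select_indexed(9))  # error
```

The array version needs a range check but no hashing, which is why it pays off only when most of the value range is actually used.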


The general form of the C switch statement, with breaks, is:

switch (expression) {
    case constant_expression1: statement1;
        break;
    ...
    case constant_expressionn: statementn;
        break;
    [default: statementn+1]
}
One simple translation of this statement follows:

Code to evaluate expression into t
goto branches
label1: code for statement1
goto out
...
labeln: code for statementn
goto out
default: code for statementn+1
goto out
branches: if t = constant_expression1 goto label1
...
if t = constant_expressionn goto labeln
goto default
out:
The code for the selectable segments precedes the branches so that the targets of the branches are
all known when the branches are generated.

Multiple-Way Selection Using if

• Multiple selectors can appear as direct extensions to two-way selectors, using else-if clauses,
for example in Python (where the clause is spelled elif):


if count < 10 :
    bag1 = True
elif count < 100 :
    bag2 = True
elif count < 1000 :
    bag3 = True

• The Python example can be written as a Ruby case

case
when count < 10 then bag1 = true
when count < 100 then bag2 = true
when count < 1000 then bag3 = true
end

Scheme’s Multiple Selector

The Scheme multiple selector, which is based on mathematical conditional expressions, is a
special form function named COND. COND is a slightly generalized version of the mathematical
conditional expression; it allows more than one predicate to be true at the same time.

The general form of COND is

(COND
  (predicate1 expression1)
  (predicate2 expression2)
  ...
  (predicaten expressionn)
  [(ELSE expressionn+1)]
)
• The else clause is optional; else is a synonym for true

• Each predicate-expression pair is a parameter


• Semantics: The value of the evaluation of cond is the value of the expression associated with
the first predicate expression that is true

Consider the following example call to COND:

(COND
((> x y) "x is greater than y")
((< x y) "y is greater than x")
(ELSE "x and y are equal")
)
Note that string literals evaluate to themselves, so that when this call to COND is evaluated, it
produces a string result.
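The semantics of COND can be approximated in Python with a list of predicate and expression pairs, each wrapped in a lambda so that nothing is evaluated until it is needed. The names cond and compare are hypothetical, and returning None when no predicate holds is a choice made for this sketch.

```python
def cond(*clauses):
    # Each clause is (predicate, expression); both are zero-argument functions.
    for predicate, expression in clauses:
        if predicate():
            return expression()   # value of the first true predicate's expression
    return None                   # no predicate was true and there was no else


def compare(x, y):
    return cond(
        (lambda: x > y, lambda: "x is greater than y"),
        (lambda: x < y, lambda: "y is greater than x"),
        (lambda: True,  lambda: "x and y are equal"),   # plays the role of ELSE
    )


print(compare(3, 1))  # x is greater than y
print(compare(1, 3))  # y is greater than x
print(compare(2, 2))  # x and y are equal
```

More than one predicate may be true at once; only the first one in order determines the result.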

ITERATIVE STATEMENTS

• The repeated execution of a statement or compound statement is accomplished either by
iteration or recursion. An iterative statement is one that causes a statement or collection of
statements to be executed zero, one, or more times. An iterative statement is often called a loop.

• General design issues for iteration control statements:

1. How is iteration controlled?

2. Where is the control mechanism in the loop?

Counter-Controlled Loops

A counting iterative control statement has a variable, called the loop variable, in which the count
value is maintained. It also includes some means of specifying the initial and terminal values of
the loop variable, and the difference between sequential loop variable values, often called the
stepsize. The initial, terminal, and stepsize specifications of a loop are called the loop
parameters.

• Design Issues:

1. What are the type and scope of the loop variable?


2. Should it be legal for the loop variable or loop parameters to be changed in the loop body, and
if so, does the change affect loop control?

3. Should the loop parameters be evaluated only once, or once for every iteration?

Counter-Controlled Loops: Examples

The Ada for Statement

The Ada for statement has the following form:

for variable in [reverse] discrete_range loop

...

end loop;

A discrete range is a subrange of an integer or enumeration type, such as 1..10 or
Monday..Friday. The reverse reserved word, when present, indicates that the values of the
discrete range are assigned to the loop variable in reverse order.

The most interesting new feature of the Ada for statement is the scope of the loop variable,
which is the range of the loop. The variable is implicitly declared at the for statement and
implicitly undeclared after loop termination.

For example, in

Count : Float := 1.35;
for Count in 1..10 loop
    Sum := Sum + Count;
end loop;
the Float variable Count is unaffected by the for loop. Upon loop termination, the
variable Count is still Float type with the value of 1.35. Also, the Float-type variable Count is
hidden from the code in the body of the loop, being masked by the loop counter Count, which is
implicitly declared to be the type of the discrete range, Integer.


Counter-Controlled Loops: Examples

C-based languages

for (expression_1; expression_2; expression_3)
    loop body

The loop body can be a single statement, a compound statement, or a null statement.

Because assignment statements in C produce results and thus can be considered expressions, the
expressions in a for statement are often assignment statements. The first expression is for
initialization and is evaluated only once, when the for statement execution begins. The second
expression is the loop control and is evaluated before each execution of the loop body. As is
usual in C, a zero value means false and all nonzero values mean true. Therefore, if the value of
the second expression is zero, the for is terminated; otherwise, the loop body statements are
executed. The third expression is evaluated after each execution of the loop body.

Following is an operational semantics description of the C for statement. Because C expressions
can be used as statements, expression evaluations are shown as statements.

expression_1
loop:
if expression_2 = 0 goto out
[loop body]
expression_3
goto loop
out: . . .
Following is an example of a skeletal C for statement:
for (count = 1; count <= 10; count++) {
    ...
}
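The order in which the three expressions of a C for statement are evaluated can be mirrored in Python with a while loop. The helper run_c_for and its parameter names are hypothetical; each parameter is a function standing in for one part of the statement.

```python
def run_c_for(init, test, step, body):
    state = init()            # expression_1: evaluated once
    while test(state):        # expression_2: evaluated before each iteration
        body(state)           # loop body
        state = step(state)   # expression_3: evaluated after the body
    return state


visited = []
final = run_c_for(
    init=lambda: 1,               # count = 1
    test=lambda c: c <= 10,       # count <= 10
    step=lambda c: c + 1,         # count++
    body=visited.append,
)

print(visited)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(final)    # 11
```

The final value 11 shows that the step runs once more after the last body execution, and only then does the test fail.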
Example:

Consider the following for statement:


for (count1 = 0, count2 = 1.0;
     count1 <= 10 && count2 <= 100.0;
     sum = ++count1 + count2, count2 *= 2.5);
The operational semantics description of this is
count1 = 0
count2 = 1.0
loop:
if count1 > 10 goto out
if count2 > 100.0 goto out
count1 = count1 + 1
sum = count1 + count2
count2 = count2 * 2.5
goto loop
out: …
The first expression can include variable definitions. For example,

for (int count = 0; count < len; count++) { . . . }

The scope of a variable defined in the for statement is from its definition to the end of the loop
body.

• Java and C#

– Differ from C++ in that the control expression must be Boolean

The for Statement of Python

The general form of Python’s for is

for loop_variable in object:
    - loop body
[else:
    - else clause]
The loop variable is assigned the value in the object, which is often a range, one for each
execution of the loop body. The else clause, when present, is executed if the loop terminates
normally.

Consider the following example:

for count in [2, 4, 6]:
    print(count)

produces
2
4
6
For most simple counting loops in Python, the range function is used. range takes one, two, or
three parameters. In Python 3, range produces its values lazily, so the following examples wrap
it in list to show them:

list(range(5)) returns [0, 1, 2, 3, 4]

list(range(2, 7)) returns [2, 3, 4, 5, 6]

list(range(0, 8, 2)) returns [0, 2, 4, 6]

Note that range never returns the highest value in a given parameter range.
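The else clause of the Python for statement is easiest to see next to a break. In this sketch the function first_negative is hypothetical; the else clause runs only when the loop ends without executing break.

```python
def first_negative(values):
    for value in values:
        if value < 0:
            found = value
            break              # abnormal exit: the else clause is skipped
    else:
        found = None           # normal termination: every value was examined
    return found


evens = list(range(0, 8, 2))

print(first_negative([3, -1, 4]))  # -1
print(first_negative([1, 2]))      # None
print(evens)                       # [0, 2, 4, 6]
```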

Counter-Controlled Loops: Example in F#

• Because counters require variables, and functional languages do not have variables, counter-
controlled loops must be simulated with recursive functions

let rec forLoop loopBody reps =
    if reps <= 0 then
        ()
    else
        loopBody()
        forLoop loopBody (reps - 1)
• This defines the recursive function forLoop with the parameters loopBody (a function that
defines the loop’s body) and the number of repetitions

• () means do nothing and return nothing
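For comparison, the same recursive simulation can be written in Python. The names for_loop and calls are hypothetical. Python does not optimize tail calls, so this form is limited by the interpreter's recursion limit and is shown only to make the idea concrete.

```python
def for_loop(loop_body, reps):
    if reps <= 0:
        return None                   # corresponds to () in the F# version
    loop_body()                       # one execution of the body
    return for_loop(loop_body, reps - 1)


calls = []
for_loop(lambda: calls.append("tick"), 3)

print(calls)  # ['tick', 'tick', 'tick']
```

The repetition count takes the place of a mutable counter: each recursive call receives a smaller value instead of changing a variable.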


Logically-Controlled Loops

• Repetition control is based on a Boolean expression

• Design issues:

– Pretest or posttest?

– Should the logically controlled loop be a special case of the counting loop statement or a
separate statement?

Logically-Controlled Loops: Examples

• C and C++ have both pretest and posttest forms, in which the control expression can be
arithmetic:

while (control_expression)
    loop body
and
do
    loop body
while (control_expression);
These two statement forms are exemplified by the following C# code segments:

sum = 0;
indat = Int32.Parse(Console.ReadLine());
while (indat >= 0) {
    sum += indat;
    indat = Int32.Parse(Console.ReadLine());
}
value = Int32.Parse(Console.ReadLine());
digits = 0;
do {
    value /= 10;
    digits++;
} while (value > 0);


• Note that all variables in these examples are integer type. The ReadLine method of the
Console object gets a line of text from the keyboard.
• Int32.Parse finds the number in its string parameter, converts it to int type, and returns it.

• In both C and C++ it is legal to branch into the body of a logically-controlled loop

• Java is like C and C++, except the control expression must be Boolean (and the body can only
be entered at the beginning, because Java has no goto)
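Python has a pretest while but no posttest loop, so the do-while form is usually emulated with while True and a break at the bottom. The sketch below follows the two C# fragments; the function names are hypothetical, and the first function assumes the data ends with a negative sentinel value.

```python
def sum_until_negative(values):
    source = iter(values)          # stands in for keyboard input
    total = 0
    indat = next(source)
    while indat >= 0:              # pretest: may execute zero times
        total += indat
        indat = next(source)
    return total


def count_digits(value):
    digits = 0
    while True:                    # posttest emulation: body runs at least once
        value //= 10
        digits += 1
        if not value > 0:
            break
    return digits


print(sum_until_negative([3, 4, -1]))  # 7
print(count_digits(12345))             # 5
print(count_digits(0))                 # 1
```

The last call shows why a posttest loop suits digit counting: the number 0 still has one digit, so the body must run once before the test.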

Logically-Controlled Loops: Examples in F#

• As with counter-controlled loops, logically-controlled loops can be simulated with recursive
functions

let rec whileLoop test body =
    if test() then
        body()
        whileLoop test body
    else ()
• This defines the recursive function whileLoop with parameters test and body, both functions.
test defines the control expression

User-Located Loop Control Mechanisms

• Sometimes it is convenient for the programmers to decide a location for loop control (other
than top or bottom of the loop)

• Simple design for single loops (e.g., break)

• Design issues for nested loops

– Should the conditional be part of the exit?

– Should control be transferable out of more than one loop?

• C, C++, Python, Ruby, and C# have unconditional unlabeled exits (break)

• Java and Perl have unconditional labeled exits (break in Java, last in Perl)


• C, C++, and Python have an unlabeled control statement, continue, that skips the remainder of
the current iteration, but does not exit the loop

• Java and Perl have labeled versions of continue

Following is an example of nested loops in Java, in which there is a break out of the outer loop
from the nested loop:

outerLoop:
for (row = 0; row < numRows; row++)
    for (col = 0; col < numCols; col++) {
        sum += mat[row][col];
        if (sum > 1000.0)
            break outerLoop;
    }

C, C++, and Python include an unlabeled control statement, continue, that transfers control to the
control mechanism of the smallest enclosing loop.

For example, consider the following:

while (sum < 1000) {
    getnext(value);
    if (value < 0) continue;
    sum += value;
}

A negative value causes the assignment statement to be skipped, and control is transferred
instead to the conditional at the top of the loop. On the other hand, in

while (sum < 1000) {
    getnext(value);
    if (value < 0) break;
    sum += value;
}

a negative value terminates the loop.
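The contrast between the two fragments can be run directly in Python. A list takes the place of getnext, and both function names are hypothetical.

```python
def sum_with_continue(values, limit=1000):
    total = 0
    for value in values:
        if total >= limit:
            break
        if value < 0:
            continue           # skip this value, keep looping
        total += value
    return total


def sum_with_break(values, limit=1000):
    total = 0
    for value in values:
        if total >= limit:
            break
        if value < 0:
            break              # a negative value ends the loop
        total += value
    return total


print(sum_with_continue([5, -2, 7]))  # 12
print(sum_with_break([5, -2, 7]))     # 5
```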

Iteration Based on Data Structures

A Do statement in Fortran uses a simple iterator over integer values. For example, consider the
following statement:

Do Count = 1, 9, 2

In this statement, 1 is the initial value of Count, 9 is the last value, and the step size between
values is 2. An internal function, the iterator, must be called for each iteration to compute the
next value of Count (by adding 2 to the last value of Count, in this example) and test whether the
iteration should continue.

In Python, this same loop can be written as follows:

for count in range(1, 10, 2):

In this case, the iterator is named range. While these looping statements are usually used to
iterate over arrays, there is no connection between the iterator and the array.

Ada allows the range of a loop iterator and the subscript range of an array to be connected with
subranges. For example, a subrange can be defined, such as in the following declaration:

subtype MyRange is Integer range 0..99;

MyArray: array (MyRange) of Integer;

for Index in MyRange loop

...

end loop;

The subtype MyRange is used both to declare the array and to iterate through the array. An index
range overflow is not possible when a subrange is used this way.

• PHP has predefined functions that iterate over an array through an internal pointer:

– current points at one element of the array

– next moves current to the next element

– reset moves current to the first element

• Java 5.0 (uses for, although it is called foreach)

• For arrays and any other class that implements the Iterable interface, e.g., ArrayList

for (String myElement : myList) { …

• C# and F# (and the other .NET languages) have generic library classes, like Java 5.0 (for
arrays, lists, stacks, and queues). These can be iterated over with the foreach statement. User-defined
collections can implement the IEnumerable interface and then also be used with foreach.

For example, consider the following C# code:

List<String> names = new List<String>();
names.Add("Bob");
names.Add("Carol");
names.Add("Ted");
foreach (String name in names)
    Console.WriteLine("Name: {0}", name);
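The idea that a user-defined collection becomes usable in a foreach-style loop once it supplies an iterator can be sketched in Python, used here only for illustration. The class NameList and its add method are hypothetical; the only requirement Python imposes is the __iter__ method.

```python
class NameList:
    """Hypothetical user-defined collection that supports iteration."""

    def __init__(self):
        self._names = []

    def add(self, name):
        self._names.append(name)

    def __iter__(self):
        # Supplying __iter__ is what lets a for statement traverse the object.
        for name in self._names:
            yield name


names = NameList()
names.add("Bob")
names.add("Carol")
names.add("Ted")

lines = [f"Name: {name}" for name in names]
for line in lines:
    print(line)
```

The loop never touches the internal list directly, so the representation of the collection can change without affecting the code that iterates over it.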

Iteration Based on Data Structures in Ruby

In Ruby, a block is a sequence of code, delimited by either braces or the do and end reserved
words. Blocks can be used with specially written methods to create many useful constructs,
including iterators for data structures.

The following example, which uses a block parameter, illustrates the use of each:

>> list = [2, 4, 6, 8]


=> [2, 4, 6, 8]
>> list.each {|value| puts value}
2
4
6
8
=> [2, 4, 6, 8]
In this example, the block is called for each element of the array to which the each method is
sent. The block produces the output, which is a list of the array’s elements. The return value of
each is the array to which it is sent.

Instead of a counting loop, Ruby has the upto method. For example, we could have the
following:

1.upto(5) {|x| print x, " "}

This produces the following output:

1 2 3 4 5

Syntax that resembles a for loop in other languages could also be used, as in the following:

for x in 1..5
print x, " "
end

Ruby's for is not a primitive counting loop: constructs like the above are evaluated by calling
the each method of the range 1..5.

Unconditional Branching

• Transfers execution control to a specified place in the program

• Was the subject of one of the most heated debates of the 1960s and 1970s

• Major concern: readability

• Some languages do not support a goto statement (e.g., Java)

• C# offers a goto statement (it can be used in switch statements)

• Loop exit statements are restricted and somewhat camouflaged gotos

Guarded Commands

• Designed by Dijkstra

• Purpose: to support a new programming methodology that supported verification (correctness) during development

• Basis for two linguistic mechanisms for concurrent programming (in CSP and Ada)

• Basic Idea: if the order of evaluation is not important, the program should not specify one

Selection Guarded Command

• Form

if <Boolean expr> -> <statement>
[] <Boolean expr> -> <statement>
...
[] <Boolean expr> -> <statement>
fi

• Semantics: when the construct is reached,

– Evaluate all Boolean expressions

– If more than one are true, choose one non-deterministically

– If none are true, it is a runtime error

Consider the following example:

if i = 0 -> sum := sum + i

[] i > j -> sum := sum + j

[] j > i -> sum := sum + i

fi

If i = 0 and j > i, this statement chooses nondeterministically between the first and third
assignment statements. If i is equal to j and is not zero, a runtime error occurs because none of
the conditions is true.
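The semantics of the selection guarded command can be simulated in Python. This is only a sketch under stated assumptions: guarded_if and example are names invented here, random.choice stands in for the nondeterministic choice (any selection rule would be acceptable), and the runtime error is modeled as a RuntimeError exception.

```python
import random


def guarded_if(*guarded_commands):
    """Each argument is a (guard_value, action) pair.

    All guards have already been evaluated when this function is called;
    one action whose guard is true is chosen and executed.
    """
    enabled = [action for guard, action in guarded_commands if guard]
    if not enabled:
        raise RuntimeError("no guard is true")
    return random.choice(enabled)()   # stand-in for nondeterministic choice


def example(i, j, total):
    """The three-guard example from the text, returning the new sum."""
    return guarded_if(
        (i == 0, lambda: total + i),
        (i > j, lambda: total + j),
        (j > i, lambda: total + i),
    )


print(example(0, 5, 10))   # first and third guards are both true
print(example(7, 2, 10))   # only the second guard is true
```

Calling example(3, 3, 10) raises RuntimeError, which corresponds to the case where i equals j and is not zero.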

This statement can be an elegant way of allowing the programmer to state that the order
of execution, in some cases, is irrelevant. For example, to find the largest of two numbers, we
can use

if x >= y -> max := x

[] y >= x -> max := y

fi

This computes the desired result without overspecifying the solution. In particular, if x
and y are equal, it does not matter which we assign to max. This is a form of abstraction
provided by the nondeterministic semantics of the statement.

Now, consider this same process coded in a traditional programming language selector:

if (x >= y)
    max = x;
else
    max = y;

This could also be coded as follows:

if (x > y)
    max = x;
else
    max = y;

There is no practical difference between these two statements. The first assigns x to max
when x and y are equal; the second assigns y to max in the same circumstance. This choice
between the two statements complicates the formal analysis of the code and the correctness proof
of it. This is one of the reasons why guarded commands were developed by Dijkstra.

Loop Guarded Command

The loop structure proposed by Dijkstra has the form

do <Boolean expression> -> <statement>
[] <Boolean expression> -> <statement>
[] . . .
[] <Boolean expression> -> <statement>
od

The semantics of this statement is that all Boolean expressions are evaluated on each
iteration. If more than one is true, one of the associated statements is nondeterministically
(perhaps randomly) chosen for execution, after which the expressions are again evaluated. When
all expressions are simultaneously false, the loop terminates.
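The loop semantics can also be simulated in Python. The sketch below is an illustration, not a definitive implementation: guarded_do, out_of_order, and swap are names invented here, and random.choice stands in for the nondeterministic choice. The example sorts four values by repeatedly swapping any adjacent pair that is out of order, whichever pair happens to be chosen.

```python
import random


def guarded_do(state, guarded_commands):
    """Repeat until every guard is false.

    guarded_commands is a list of (guard, action) pairs of functions
    that each take the shared state.
    """
    while True:
        # Evaluate all guards on every iteration.
        enabled = [action for guard, action in guarded_commands if guard(state)]
        if not enabled:
            return state                  # all guards false: the loop terminates
        random.choice(enabled)(state)     # stand-in for nondeterministic choice


def out_of_order(a, b):
    return lambda q: q[a] > q[b]


def swap(a, b):
    def action(q):
        q[a], q[b] = q[b], q[a]
    return action


# One guarded command per adjacent pair: q[k] > q[k+1] -> swap them.
commands = [(out_of_order(k, k + 1), swap(k, k + 1)) for k in range(3)]

q = [4, 2, 3, 1]
guarded_do(q, commands)
print(q)
```

Every swap removes one inversion, so the loop terminates with the values in ascending order regardless of which enabled command is chosen on each iteration.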
