Unit 2
2.1 NAMES
Introduction
In programming, a variety of items are given descriptive names to make the code
more meaningful to human readers. These names are called “identifier names”. Constants,
variables, type definitions, functions, etc. are identified by a name when they are declared or defined.
These names must follow a set of rules imposed by the programming language.
Imperative programming languages are abstractions of the underlying von Neumann computer
architecture. Some abstractions are far removed from the hardware,
e.g. a three-dimensional array, which requires a software mapping function to support the
abstraction.
A variable can be characterized by six attributes:
Name
Address
Value
Type
Scope
Lifetime
Variables, subprograms, labels, user defined types, formal parameters have names.
CCS358-PRINCIPLES OF PROGRAMMING LANGUAGES
Design Issues:
Name forms:
• Today “camel” notation is more popular for C-based languages (e.g. myStack)
• In early versions of Fortran – embedded spaces were ignored. e.g. following two names are
equivalent
Sum Of Salaries
SumOfSalaries
Case sensitivity:
In many languages (e.g. C-based languages) uppercase and lowercase letters in names are
distinct
• Problem for readability – names that look very similar denote different entities
• Also bad for writability since programmer has to remember the correct cases
e.g. Java method parseInt for converting a string into integer, not ParseInt or parseint
• In C the problem can be avoided by exclusive use of lowercase letters for names
• In Java and C#, many of the predefined names include both uppercase and lowercase letters, so
the problem cannot be escaped
• In Fortran 90, lowercase letters are allowed, and they are simply translated to uppercase letters
Special words:
A keyword is a word of a programming language that is special only in certain contexts. Fortran
is the only remaining widely used language whose special words are keywords. In Fortran, the
word Integer, when found at the beginning of a statement and followed by a name, is considered
a keyword that indicates the statement is a declarative statement. However, if the word Integer is
followed by the assignment operator, it is considered a variable name. These two uses are
illustrated in the following:
Integer Apple
Integer = 4
Because keywords are not reserved, Fortran also accepts declarations such as
INTEGER REAL
REAL INTEGER
which declare a variable named REAL of type INTEGER and a variable named INTEGER of type REAL.
Predefined names (have predefined meanings, but can be redefined by the user):
For example, built-in data type names in Pascal, such as INTEGER, normal input/output
subprogram names, such as readln, writeln, are predefined.
2.2 VARIABLES
In object-oriented programming, each object contains the data variables of the class it is
an instance of. The object's methods are designed to handle the actual values that are supplied to
the object when the object is being used. In general, a variable is an abstraction of a computer
memory cell or collection of cells. Its attributes include:
1. Name
2. Address
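– The same name can be associated with different addresses:
1. in different places in the program: a program can have two subprograms, sub1 and sub2,
each of which defines a local variable that uses the same name, e.g. sum
2. at different times during execution.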
Aliases
Multiple identifiers can reference the same address, i.e. more than one variable name is used to
access the same memory location.
Such identifier names are called aliases.
Aliases are not good for readability because the value of a variable can be changed by an
assignment to one of its other names.
Aliases can be created explicitly:
• by the EQUIVALENCE statement in FORTRAN
• by union types in C and C++
• by variant records in Pascal
• by subprogram parameters
• by pointer variables
3. Type
• Determines the range of values the variable can take, and the set of operators that are defined
for values of this type.
4. Value
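The aliasing described under the Address attribute can be observed in a few lines of Python. This is only a minimal sketch: the names scores, totals and independent are made up for illustration, and Python reference variables stand in for the pointer variables mentioned in the notes.

```python
# Two names bound to the same list object behave as aliases.
scores = [70, 80, 90]
totals = scores              # no copy is made: both names refer to one object

totals[0] = 0                # an assignment through one name...
print(scores)                # ...is visible through the other name

independent = list(scores)   # an explicit copy is not an alias
independent[1] = -1
print(scores is totals, scores is independent)
```

The change made through totals shows up in scores, which is exactly the readability problem the notes attribute to aliases.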
2.3 BINDING
A source file has many names whose properties need to be determined. The meaning of
these properties might be determined at different phases of the life cycle of a program. Examples
of such properties include the set of values associated with a type; the type of a variable; the
memory location of the compiled function; the value stored in a variable, and so forth. Binding
is the act of associating properties with names. Binding time is the moment in the
program's life cycle when this association occurs.
Many properties of a programming language are defined during its creation. For instance,
the meaning of key words such as while or for in C, or the size of the integer data type in Java,
are properties defined at language design time. Another important binding phase is the language
implementation time. The size of integers in C, contrary to Java, was not defined when C was
designed. This information is determined by the implementation of the compiler. Therefore, we
say that the size of integers in C is determined at the language implementation time.
If a program uses external libraries, then the addresses of the external functions will be
known only at link time. It is at this moment that the linker finds where the printf function
that a C program calls is located, for instance. However, the absolute addresses used in
the program will only be known at load time. At that moment we will have an image of the
executable program in memory, and all the dependencies will already have been resolved by the
loader.
Finally, there are properties which we will only know once the program executes. The
actual values stored in the variables are perhaps the most important of these properties. In
dynamically typed languages we will only know the types of variables during the execution of
the program. Languages that provide some form of late binding will only let us know the target
of a function call at runtime, for instance.
In general, a binding is the association of an attribute with an entity, or of an operation with a symbol.
Binding times
1. Language design time
2. Language implementation time
3. Compile time
4. Link time
5. Load time
6. Run time
Example:
count = count + 5
• The meaning of the operator symbol + is bound at compile time, when the types of its operands
have been determined
1. Static: if binding occurs before runtime and remains unchanged throughout the program
execution.
2. Dynamic: if binding occurs during runtime or can change in the course of program execution
Type bindings
Variable declarations
• Explicit declaration: a statement in a program that lists variable names and specifies that they are of a
particular type
• Implicit declaration: a means of associating variables with types through default conventions, rather than
declaration statements. The first appearance of a variable name in a program constitutes its implicit
declaration
Both kinds of declaration create static bindings to types. Most current programming languages require explicit
declarations of all variables. Exceptions are Perl, JavaScript, and the ML languages.
Early languages (Fortran, BASIC) have implicit declarations
e.g. In Fortran, if not explicitly declared, an identifier starting with I, J, K, L, M, or N is
implicitly declared to be of integer type, otherwise of real type
Implicit declarations are not good for reliability because misspelled
identifier names cannot be detected by the compiler
e.g. In Fortran, variables that are accidentally left undeclared are given default types, which
leads to errors that are difficult to diagnose
Some problems of implicit declarations can be avoided by requiring names of specific
types to begin with particular special characters
• e.g. In Perl
$apple : scalar
@apple: array
%apple: hash
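In C#, a var declaration takes the type of the variable from its initial value, e.g.
var sum = 0;
With similar declarations for total and name, the types of sum, total, and name are int, float, and string, respectively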
Dynamic type binding: the type of a variable is not specified by a declaration statement, nor can it be determined by the
spelling of its name. Instead, the variable is bound to a type when it is assigned a value
• e.g. In JavaScript
List = 73
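Disadvantages:
1. Reliability: the error detection capability of the compiler is reduced.
Example: suppose I and X are integer variables and Y is a float variable, and the assignment
I := X
is intended, but
I := Y
is typed. In a dynamic type binding language, this error cannot be detected by the
compiler. I is changed to float during execution. The value of I becomes erroneous.
2. Cost: type checking must be done at run time.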
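The reliability problem of dynamic type binding can be reproduced in Python, which also binds types at assignment. The following is a minimal sketch: the variable names mirror the I, X, Y example only loosely, and the helper double and the list value are assumptions made for illustration.

```python
def double(value):
    # Works for any type that supports "*", so no error is raised here.
    return value * 2

i = 10
x = 25
y = [1, 2]            # y happens to hold a list, not a number

i = x                 # the assignment that was intended
i = y                 # the keying error: accepted silently, i is now a list

result = double(i)
print(type(i).__name__, result)
```

No error is reported at any point. The program simply continues with a value of the wrong type, which is why such mistakes are hard to diagnose.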
Type Inference
In ML, the type of an expression and a variable can be determined by the type of a constant in
the expression without requiring the programmer to specify the types of the variables
Examples
Allocation: process of taking the memory cell to which a variable is bound from a pool of
available memory
Deallocation: process of placing the memory cell that has been unbound from a variable back
into the pool of available memory
Lifetime of a variable: the time during which the variable is bound to a specific memory location
According to their lifetimes, variables fall into four categories:
static,
stack-dynamic,
explicit heap-dynamic,
implicit heap-dynamic.
Static Variables
Static variables are bound to memory cells before execution begins, and remain bound to
the same memory cells until execution terminates.
Applications: globally accessible variables, and variables of subprograms that must
retain values between separate executions of the subprogram
Such variables are history sensitive.
Advantage: Efficiency. Direct addressing (no run-time overhead for allocation and
deallocation).
Stack-Dynamic Variables
foo ()
static int x; …}
Explicit Heap-Dynamic Variables
Nameless variables
Storage is allocated and deallocated by explicit run-time instructions
They can be referenced only through pointer or reference variables
Types can be determined at run time
Storage is allocated when the variable is created explicitly
Advantages: Required for dynamic structures (e.g., linked lists, trees)
Disadvantages: Difficult to use correctly, costly to refer, allocate, deallocate.
...
In this example, an explicit heap-dynamic variable of int type is created by the new operator.
This variable can then be referenced through the pointer, intnode. Later, the variable is
deallocated by the delete operator. C++ requires the explicit deallocation operator delete, because
it does not use implicit storage reclamation, such as garbage collection.
In Java, all data except the primitive scalars are objects. Java objects are explicitly heap
dynamic and are accessed through reference variables. Java has no way of explicitly
destroying a heap-dynamic variable; rather, implicit garbage collection is used.
C# has both explicit heap-dynamic and stack-dynamic objects, all of which are implicitly
deallocated. In addition, C# supports C++-style pointers. Such pointers are used to
reference heap, stack, and even static variables and objects.
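Implicit Heap-Dynamic Variables
• Storage and type bindings are done when the variables are assigned values.
Regardless of whether the variable named highs was previously used in the program or what it
was used for, it is now an array of five numeric values.
• Advantages: Highest degree of flexibility
• Disadvantages: run-time overhead of maintaining the dynamic attributes, and loss of some error detection by the compiler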
2.4 TYPE CHECKING
Type checking is the activity of ensuring that the operands of an operator are of
compatible types. A compatible type is one that either is legal for the operator or is allowed
under language rules to be implicitly converted by compiler-generated code (or the interpreter)
to a legal type. This automatic conversion is called a coercion. For example, if an int variable
and a float variable are added in Java, the value of the int variable is coerced to float and a
floating-point add is done.
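The difference between a compatible operand pair (one that is coerced) and an incompatible pair (a type error) can be sketched in Python, where the check happens at run time. The variable names below are illustrative assumptions.

```python
count = 3          # an int
ratio = 0.5        # a float

# Compatible types: the int operand is converted and a float add is done.
total = count + ratio
print(type(total).__name__, total)

# Incompatible types: no coercion is defined, so type checking reports an error.
try:
    bad = "3" + count
except TypeError as err:
    print("type error:", err)
```

Here the checking is dynamic. In a statically typed language such as Java the second expression would instead be rejected, or handled by a different rule, at compile time.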
If type binding is static then all type checking can be done statically by the compiler.
Dynamic type binding requires dynamic type checking at run time, e.g. JavaScript and
PHP
It is better to detect errors at compile time than at run time because the earlier correction
is usually less costly
However, static checking reduces flexibility
If a memory cell stores values of different types (Ada variant records, Fortran
EQUIVALENCE, C and C++ unions) then type checking, if done at all, must be done dynamically at run
time.
So, even though all variables are statically bound to types in languages such as C++, not
all type errors can be detected by static type checking.
Strong typing
A programming language is strongly typed if – each name has a single type, and – the type is
known at compile time.
A better definition:
A programming language is strongly typed if type errors are always detected (at compile time or
run time).
Examples:
• Fortran 77 is not strongly typed:
– The relationship between actual and formal parameters is not type checked.
– EQUIVALENCE can be declared between names of different types.
• Pascal is nearly strongly typed,
– except for variant records, because they allow omission of the tag field
Coercion weakens strong typing.
Example:
In Java the value of an integer operand is coerced to floating point and a floating-point
operation takes place
• Assume that a and b are int variables. The user intended to type a + b but mistakenly typed a + d,
where d is a float value. Then the error would not be detected since a would be coerced to
float.
Type compatibility
The most important result of two variables being compatible types is that either one can have its
value assigned to the other
– Name type compatibility: two variables have compatible types only if they are in either the same declaration or in
declarations that use the same type name.
Under a strict interpretation a variable whose type is a subrange of the integers would not be
compatible with an integer type variable
Example:
index: indexType;
• The variables count and index are not name type compatible, and cannot be assigned to each
other
• Another problem arises when a structured type is passed among subprograms through
parameters
• A subprogram cannot state the type of such formal parameters in local terms (e.g. In Pascal)
• Structure type compatibility: two variables have compatible types if their types have identical structure.
• The variables count and index in the previous example are structure type compatible.
• Under name type compatibility only the two type names must be compared
• Under structure compatibility entire structures of the two types must be compared
• For structures that refer to their own type (e.g. linked lists) this comparison is difficult
fahrenheit = float;
• Types declared this way are compatible under structure type compatibility, so their values may be mixed even when that is not intended
– Subtypes
– Derived types
• Derived types : a new type based on some previously defined type with which it is
incompatible. They inherit all the properties of the parent type
• These two types are incompatible, although their structures are identical
• They are also incompatible with any other floating-point type
• Subtype: possibly range constrained version of an existing type. A subtype is compatible with
parent type
• Vector 1: vector(1..10)
• Vector 2:vector(11..20)
• These two objects are compatible even though they have different names and different
subscript ranges
• Both types are of type integer, and they both have ten elements, therefore they are compatible
A: array(1..10) of integer;
C,D:list_10;
Type compatibility in C
• C uses structure type compatibility for all types except structures and unions
• Every struct and union declaration creates a new type which is not compatible with any other
type
• Note that typedef does not introduce any new type but it defines a new name
2.5 SCOPE
Scope of a variable is the range of statements in which the variable is visible. A variable
is visible in a statement if it can be referenced in that statement.
• The scope rules of a language determine how references to names are associated with variables
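Static Scope:
– The scope of a variable can be determined statically, i.e. prior to execution
• To connect a name reference to a variable, you (or the compiler) must find the declaration
Search process:
– search declarations,
• first locally,
• then in increasingly larger enclosing scopes, until one is found for the given name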
In all static-scoped languages (except C), procedures are nested inside the main program.
• In this case all procedures and the main unit create their scopes.
Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called its static parent.
procedure Big is
  x : integer;
  procedure sub1 is
  begin -- of sub1
    .... x ....
  end -- of sub1
  procedure sub2 is
    x : integer;
  begin -- of sub2
    ....
  end -- of sub2
begin -- of Big
  ...
end -- of Big
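The static search process can be tried out with nested functions in Python, which is statically scoped. This is a rough analogue of the Big, sub1, sub2 skeleton, under two assumptions that are not in the notes: sub2 calls sub1, and the string values exist only to show which x was found.

```python
def big():
    x = "x of big"

    def sub1():
        # x is not local to sub1, so the search moves to the static parent, big.
        return x

    def sub2():
        x = "x of sub2"      # hides big's x inside sub2 only
        return sub1()        # the caller's x plays no role under static scoping

    return sub2()

print(big())
```

Even though sub1 is called from sub2, the reference to x in sub1 resolves to the x of big, because the search follows the textual nesting rather than the call chain.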
In some languages that use static scoping, regardless of whether nested subprograms are
allowed, some variable declarations can be hidden from some other code segments
e.g. In C++
void sub1() {
  int count;
  ...
  while (...) {
    int count;
    ...
  }
  ...
}
• The reference to count in the while loop is to that loop's local count
• The count of sub1 is hidden from the code inside the while loop
Variables can be hidden from a unit by having a "closer" variable with the same name
Some languages allow access to such hidden variables:
– In Ada: unit.name
– In C++: class_name::name
Blocks
• A block allows a section of code to have its own local variables whose scope is minimized.
• The variables are typically stack dynamic so they have their storage allocated when the section
is entered and deallocated when the section is exited
In Ada,
...
declare TEMP: integer;
begin
  TEMP := FIRST;
  FIRST := SECOND;
  SECOND := TEMP;
end;
...
C and C++ allow blocks.
C++ allows variable definitions to appear anywhere in functions. The scope is from the
definition statement to the end of the function
In C, all data declarations (except the ones for blocks) must appear at the beginning of the
function
The for statements of C++, Java, and C# allow variable definitions in their initialization
expressions. The scope is restricted to the for construct
Global Scope
Some languages, including C, C++, PHP, JavaScript, and Python, allow a program structure that
is a sequence of function definitions, in which variable definitions can appear outside the
functions. Definitions outside functions in a file create global variables, which potentially can be
visible to those functions.
C and C++ have both declarations and definitions of global data. Declarations specify types and
other attributes but do not cause allocation of storage.
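A global variable that is defined after a function can be made visible in the function by
declaring it to be external (with the extern qualifier).
In PHP, global variables are not implicitly visible inside functions. They can be accessed there
through the $GLOBALS array or after a global declaration, as in the following: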
$day = "Monday";
$month = "January";
function calendar() {
  $day = "Tuesday";
  global $month;
  print "local day is $day <br />";
  $gday = $GLOBALS['day'];
  print "global day is $gday <br />";
  print "global month is $month <br />";
}
calendar();
Dynamic scope
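• Common Lisp and Perl also allow dynamic scoping, but they also use static scoping
• In dynamic scoping, references to variables are connected to declarations by searching back
through the chain of subprogram calls that forced execution to this point
When the search of local declarations fails, the declarations of the dynamic parent (the calling subprogram) are searched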
procedure Big is
  x : integer;
  procedure sub1 is
  begin -- of sub1
    .... x ....
  end -- of sub1
  procedure sub2 is
    x : integer;
  begin -- of sub2
    ....
  end -- of sub2
begin -- of Big
  ...
end -- of Big
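Python itself is statically scoped, so dynamic scoping can only be simulated. The sketch below models the search through the chain of callers with an explicit stack of dictionaries. Everything here is hypothetical scaffolding (call_stack, lookup, the string values), and it assumes that Big calls sub2, which calls sub1, and that Big also calls sub1 directly.

```python
call_stack = []   # one dict of local variables per active subprogram

def lookup(name):
    # Search the most recent activation first, then its dynamic parent, and so on.
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def sub1():
    call_stack.append({})                    # sub1 declares no x of its own
    try:
        return lookup("x")
    finally:
        call_stack.pop()

def sub2():
    call_stack.append({"x": "x of sub2"})
    try:
        return sub1()
    finally:
        call_stack.pop()

def big():
    call_stack.append({"x": "x of big"})
    try:
        return sub2(), sub1()                # via sub2, then called directly
    finally:
        call_stack.pop()

print(big())
```

The same reference to x in sub1 finds a different variable depending on who called sub1, which is the defining property of dynamic scoping.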
Sometimes the scope and lifetime of a variable appear to be related. For example, consider a
variable that is declared in a Java method that contains no method calls. The scope of such a
variable is from its declaration to the end of the method. The lifetime of that variable is the
period of time beginning when the method is entered and ending when execution of the method
terminates. Scope and lifetime are also unrelated when subprogram calls are involved.
void printheader() {
  ...
} /* end of printheader */
void compute() {
  int sum;
  ...
  printheader();
} /* end of compute */
The scope of the variable sum is completely contained within the compute function. It does not
extend to the body of the function printheader, although printheader executes in the midst of the
execution of compute. However, the lifetime of sum extends over the time during which
printheader executes.
Whatever storage location sum is bound to before the call to printheader, that binding will
continue during and after the execution of printheader.
Referencing environments
The referencing environment of a statement is the collection of all names that are visible
in the statement
• In a static-scoped language, it is the local variables plus all of the visible variables in all of the
enclosing scopes
• A subprogram is active if its execution has begun but has not yet terminated
• In a dynamic-scoped language, the referencing environment is the local variables plus all
visible variables in all active subprograms
Consider the following example program. Assume that the only function
calls are the following: main calls sub2, which calls sub1.
void sub1() {
  int a, b;
  ... /* point 1 */
} /* end of sub1 */
void sub2() {
  int b, c;
  ... /* point 2 */
  sub1();
} /* end of sub2 */
void main() {
  int c, d;
  ... /* point 3 */
  sub2();
} /* end of main */
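The referencing environments of the indicated program points, assuming dynamic scoping, are as follows:
Point  Referencing environment
1      a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
2      b and c of sub2, d of main (c of main is hidden)
3      c and d of main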
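For a dynamic-scoped language, the referencing environment at a point can be computed mechanically from the list of active subprograms. The following Python sketch does this for the example above. The function referencing_environment and the frame tuples are hypothetical helpers, and the call sequence main, sub2, sub1 is the one assumed in the text.

```python
def referencing_environment(active_frames):
    """active_frames lists (unit, local names) from the oldest call to the newest."""
    visible = {}
    for unit, names in active_frames:
        for name in names:
            visible[name] = unit     # a more recent declaration hides an older one
    return visible

main_frame = ("main", ["c", "d"])
sub2_frame = ("sub2", ["b", "c"])
sub1_frame = ("sub1", ["a", "b"])

point1 = referencing_environment([main_frame, sub2_frame, sub1_frame])
point2 = referencing_environment([main_frame, sub2_frame])
point3 = referencing_environment([main_frame])
print(point1)
print(point2)
print(point3)
```

Each result maps a visible name to the subprogram whose declaration it refers to, so hidden variables such as c of main at point 2 simply do not appear.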
A data type defines a collection of data objects and a set of predefined operations on those
objects
• One design issue for all data types: What operations are defined and how are they
specified?
• Primitive data types: Those not defined in terms of other data types
Languages for scientific use support at least two floating-point types (e.g., float and double;
sometimes more)
• Some languages support a complex type, e.g., C99, Fortran, and Python
• Each value consists of two floats, the real part and the imaginary part
Most larger computers that are designed to support business systems applications have
hardware support for decimal data types. Decimal data types store a fixed number of decimal
digits, with the decimal point at a fixed position in the value. These are the primary data types for
business data processing and are therefore essential to COBOL. C# and F# also have decimal
data types
Decimal types are stored very much like character strings, using binary codes for the decimal
digits. These representations are called binary coded decimal (BCD).
Boolean types are the simplest of all types
Range of values: two elements, one for true and one for false
Could be implemented as bits, but often as bytes. In languages such as C that allow numeric
expressions as conditions, all operands with
nonzero values are considered true, and zero is considered false
Boolean types are often used to represent switches or flags in programs
Advantage: readability
A character string type is one in which the values consist of sequences of characters.
Character string constants are used to label output, and the input and output of all kinds of
data are often done in terms of strings. Of course, character strings also are an essential
type for all programs that do character manipulation.
Design issues:
Typical operations:
– Catenation
– Pattern matching
If strings are not defined as a primitive type, string data is usually stored in arrays of
single characters and referenced as such in the language. This is the approach taken by C and
C++. C and C++ use char arrays to store character strings. These languages provide a collection
of string operations through standard libraries. Many uses of strings and many of the library
functions use the convention that character strings are terminated with a special character, null,
which is represented with zero
The character string literals that are built by the compiler also have the null character. For
example, consider the following declaration:
char str[] = "apples";
In this example, str is an array of char elements, specifically apples0, where 0 is the null
character.
Some of the most commonly used library functions for character strings in C and C++ are strcpy,
which moves strings; strcat, which concatenates one given string onto another; strcmp, which
lexicographically compares (by the order of their character codes) two given strings; and strlen,
which returns the number of characters, not counting the null, in the given string. The parameters
and return values for most of the string manipulation functions are char pointers that point to
arrays of char. Parameters can also be string literals. The string manipulation functions of the C
standard library, which are also available in C++, are inherently unsafe and have led to numerous
programming errors.
1. C and C++
– Not primitive
– Primitive
Python includes strings as a primitive type and has operations for substring reference,
catenation, indexing to access individual characters, as well as methods for searching and
replacement. There is also an operation for character membership in a string. So, even though
Python’s strings are primitive types, for character and substring references, they act very much
like arrays of characters. However, Python strings are immutable, similar to the String class
objects of Java.
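The Python string operations listed above can be tried directly. This is a small sketch: the variable fruit and the sample text are chosen only for illustration.

```python
fruit = "apples"

print(fruit[0])                    # indexing a single character
print(fruit[1:4])                  # substring reference
print(fruit + " and pears")        # catenation
print("p" in fruit)                # character membership
print(fruit.find("les"))           # searching
print(fruit.replace("a", "A"))     # replacement builds a new string

# Strings are immutable: assigning to a character position is rejected.
try:
    fruit[0] = "A"
except TypeError as err:
    print("immutable:", err)
```

Note that replace returns a new string and leaves fruit unchanged, which is consistent with the immutability described in the text.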
4. Java
In Java, strings are supported by the String class, whose values are constant strings, and the
StringBuffer class, whose values are changeable and are more like arrays of single characters.
These values are specified with methods of the StringBuffer class.
Perl, JavaScript, Ruby, and PHP include built-in pattern-matching operations. In these
languages, the pattern-matching expressions are somewhat loosely based on mathematical
regular expressions. In fact, they are often called regular expressions. They evolved from the
early UNIX line editor, ed, to become part of the UNIX shell languages
There are several design choices regarding the length of string values. First, the length
can be static and set when the string is created. Such a string is called a static length string. This
is the choice for the strings of Python, the immutable objects of Java’s String class, as well as
similar classes in the C++ standard class library, Ruby’s built-in String class, and the .NET class
library available to C# and F#.
The second option is to allow strings to have varying length up to a declared and fixed
maximum set by the variable’s definition, as exemplified by the strings in C and the C-style
strings of C++. These are called limited dynamic length strings. Such string variables can store
any number of characters between zero and the maximum
The third option is to allow strings to have varying length with no maximum, as in
JavaScript, Perl, and the standard C++ library. These are called dynamic length strings.
Character string types could be supported directly in hardware; but in most cases,
software is used to implement string storage, retrieval, and manipulation. When character string
types are represented as character arrays, the language often supplies few operations.
• Static length: compile-time descriptor
• Limited dynamic length: may need a runtime descriptor for length (but not in C and C++)
• An ordinal type is one in which the range of possible values can be easily associated with the
set of positive integers
– integer
– char
– Boolean
Enumeration Types
An enumeration type is one in which all of the possible values, which are named
constants, are provided, or enumerated, in the definition. Enumeration types provide a way of
defining and grouping collections of named constants, which are called enumeration constants.
• C# example
• Design issues
– Is an enumeration constant allowed to appear in more than one type definition, and if so, how
is the type of an occurrence of that constant checked?
Designs:
In languages that do not have enumeration types, programmers usually simulate them with
integer values.
C and Pascal were the first widely used languages to include an enumeration data type. C++
includes C’s enumeration types. In C++, we could have the following:
The colors type uses the default internal values for the enumeration constants, 0, 1, . . . ,
although the constants could have been assigned any integer literal (or any constant-valued
expression). The enumeration values are coerced to int when they are put in integer context. This
allows their use in any numeric expression. For example, if the current value of myColor is blue,
then the expression
myColor++
In ML, enumeration types are defined as new types with datatype declarations. For example, we
could have the following:
F# has enumeration types that are similar to those of ML, except the reserved word type is used
instead of datatype and the first value is preceded by an OR operator (|).
Enumeration types can provide advantages in both readability and reliability. Readability
is enhanced very directly: Named values are easily recognized, whereas coded values are not.
Reliability is aided because the compiler can check
– operations (don’t allow colors to be added)
– Ada, C#, and Java 5.0 provide better support for enumeration than C++ because
enumeration type variables in these languages are not coerced into integer types
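The reliability benefit of not coercing enumeration values to integers can be illustrated with Python's standard enum module, which behaves in this respect like the Ada, C#, and Java designs. The Color type and its members are hypothetical names used only for this sketch.

```python
from enum import Enum

class Color(Enum):
    RED = 0
    BLUE = 1
    GREEN = 2

my_color = Color.BLUE
print(my_color.name, my_color.value)   # named value and its internal code

# Arithmetic on an enumeration value is rejected instead of being coerced to int.
try:
    my_color + 1
except TypeError as err:
    print("rejected:", err)

# Only values defined by the type can be converted into it.
try:
    Color(7)
except ValueError as err:
    print("rejected:", err)
```

In Python these checks happen at run time, whereas the compiled languages named above can perform them at compile time.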
SUBRANGE TYPES
• Ada’s design
In these examples, the restriction on the existing types is in the range of possible values. All of
the operations defined for the parent type are also defined for the subtype, except assignment of
values outside the specified range. For example, in
Day1: Days;
Day2: Weekday;
Day2 := Day1;
the assignment is legal only if the current value of Day1 lies within the range of Weekday.
Subrange Evaluation
• Aid to readability
– Make it clear to the readers that variables of subrange can store only certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an
error
• Subrange types are implemented like the parent types with code inserted (by the compiler) to
restrict assignments to subrange variables
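The idea of compiler-inserted range checks can be imitated by hand in Python, which has no subrange types. This is only a sketch: make_subrange, index_type and the range 1..100 are assumptions made for illustration, and the check is written out explicitly where a compiler would insert it.

```python
def make_subrange(low, high):
    """Return a checker that accepts only values in low..high."""
    def check(value):
        if not (low <= value <= high):
            raise ValueError(f"{value} is outside {low}..{high}")
        return value
    return check

index_type = make_subrange(1, 100)    # hypothetical subrange 1..100

index = index_type(42)                # in range: the assignment goes through
try:
    index = index_type(250)           # out of range: detected as an error
except ValueError as err:
    print("range error:", err)
print(index)
```

The variable keeps its last legal value because the failed check prevents the assignment from taking place.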
In many languages, such as C, C++, Java, Ada, and C#, all of the elements of an array are
required to be of the same type. In these languages, pointers and references are restricted to point
to or reference a single type. So the objects or data values being pointed to or referenced are also
of a single type. In some other languages, such as JavaScript, Python, and Ruby, variables are
typeless references to objects or data values. In these cases, arrays still consist of elements of a
single type, but the elements can reference objects or data values of different types. Such arrays
are still homogeneous, because the array elements are of the same type.
C# and Java 5.0 provide generic arrays, that is, arrays whose elements are references to
objects, through their class libraries
• Index Syntax
• Ada explicitly uses parentheses to show uniformity between array references and function calls
because both are mappings
• In Ada, the default is to require range checking, but it can be turned off
There are five categories of arrays, based on the binding to subscript ranges, the binding
to storage, and from where the storage is allocated. The category names indicate the design
choices of these three. In the first four of these categories, once the subscript ranges are bound
and the storage is allocated, they remain fixed for the lifetime of the variable
• Static array: subscript ranges are statically bound and storage allocation is static (before
runtime)
• A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the
allocation is done at declaration elaboration time during execution.
• Stack-dynamic Array: subscript ranges are dynamically bound and the storage allocation is
dynamic (done at run-time)
– Advantage: flexibility (the size of an array need not be known until the array is to be
used)
• Fixed heap-dynamic array: similar to fixed stack dynamic: storage binding is dynamic but
fixed after allocation (i.e., binding is done when requested and storage is allocated from heap,
not stack)
• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can
change any number of times
Get(List_Len);
declare
List : array (1..List_Len) of Integer;
begin
...
end;
In this example, the user inputs the number of desired elements for the array List. The
elements are then dynamically allocated when execution reaches the declare block. When
execution reaches the end of the block, the List array is deallocated.
• A heterogeneous array is one in which the elements need not be of the same type
Array Initialization
• C-based languages
– int list [] = {1, 3, 5, 7};
– char *names [] = {"Mike", "Fred", "Mary Lou"};
• Ada
– List: array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
• Python
– List comprehensions
list = [x ** 2 for x in range(12) if x % 3 == 0]
puts [0, 9, 36, 81] in list
Array Operations
• APL provides the most powerful array processing operations for vectors and matrices as well
as unary operators (for example, to reverse column elements)
• Python provides array assignments, but they are only reference changes. Python also supports array
catenation and element membership operations
• Fortran provides elemental operations, so called because they are operations between pairs of array elements
– For example, the + operator between two arrays results in an array of the sums of the element pairs
of the two arrays
Evaluation and Comparison to Arrays
• Access to array elements is much slower than access to record fields, because subscripts
are dynamic (field names are static)
• Dynamic subscripts could be used with record field access, but it would disallow type
checking and it would be much slower
• A rectangular array is a multi-dimensioned array in which all of the rows have the same
number of elements and all columns have the same number of elements
• Fortran, Ada, and C# support rectangular arrays (C# also supports jagged arrays)
Slices
• Slices are only useful in languages that have array operations
Slice Examples
• Python
– mat[0][0:2] is the first and second element of the first row of mat
In Ruby
array1 = [1, 2, 3, 4, 5]
array2 = ["a", "b", "c", "d", "e"]
array3 = ["cat", "dog", "cow", "rat", "fox"]
array4 = [true, false, nil]
array5 = ["", "nil", "false", "true"]
In Python
Lst = [50, 70, 30, 20, 90, 10, 50]
print(Lst[1:5])
Output:
[70, 30, 20, 90]
Implementation of Arrays
• Access function for single-dimensioned arrays:
address(list[k]) = address(list[lower_bound])
+ ((k-lower_bound) * element_size)
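The access function can be checked with a few numbers. The following is a minimal sketch in Python; the function name element_address and the sample values (base address 1000, 4-byte elements, lower bound 1) are assumptions chosen for illustration only.

```python
def element_address(base_address, k, lower_bound, element_size):
    """Address of list[k], given the address of list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size

# Hypothetical layout: first element at address 1000, 4-byte elements,
# subscripts starting at 1.
print(element_address(1000, 5, 1, 4))  # 1000 + (5 - 1) * 4 = 1016
```

Because the lower bound and element size are normally known at compile time, only the multiplication by k and one addition remain to be done at run time.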
• General format
Compile-Time Descriptors
• An associative array is an unordered collection of data elements that are indexed by an equal
number of values called keys
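As a rough illustration, Python's built-in dictionary behaves like an associative array: each value is stored together with its key and is looked up by key rather than by position. The names and rates below are made-up sample data.

```python
# Each element is a (key, value) pair; lookup is by key, not by position.
hourly_rate = {"Freddie": 13.20, "Mary": 15.75}  # hypothetical data

hourly_rate["Lou"] = 12.00       # add a new element
del hourly_rate["Mary"]          # remove an element by key
print("Freddie" in hourly_rate)  # membership test on keys
print(hourly_rate["Freddie"])    # retrieve a value by key
```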
Design issues:
A record is an aggregate of data elements in which the individual elements are identified by
names and accessed through offsets from the beginning of the structure.
• Design issues:
The fundamental difference between a record and an array is that record elements, or
fields, are not referenced by indices. Instead, the fields are named with identifiers, and references
to the fields are made using these identifiers.
The COBOL form of a record declaration, which is part of the data division of a COBOL
program, is illustrated in the following example:
01 EMPLOYEE-RECORD.
02 EMPLOYEE-NAME.
Ada uses a different syntax for records; rather than using the level numbers of COBOL,
record structures are indicated in an orthogonal way by simply nesting record declarations inside
record declarations. In Ada, records cannot be anonymous—they must be named types. Consider
the following Ada declaration:
In Java and C#, records can be defined as data classes, with nested records defined as
nested classes. Data members of such classes serve as the record fields. As stated previously,
Lua’s associative arrays can be conveniently used as records. For example, consider the
following declaration:
employee.name = "Freddie"
employee.hourlyRate = 13.20
These assignment statements create a table (record) named employee with two elements (fields)
named name and hourlyRate, both initialized.
References to Records
1. COBOL
field_name OF record_name_1 OF ... OF record_name_n
2. Others (dot notation)
record_name_1.record_name_2. ... record_name_n.field_name
Elliptical References.
A fully qualified reference to a record field is one in which all intermediate record names,
from the largest enclosing record to the specific field, are named in the reference. Both the
COBOL and the Ada example field references above are fully qualified. As an alternative to
fully qualified references,
COBOL allows elliptical references to record fields. In an elliptical reference, the field is
named, but any or all of the enclosing record names can be omitted, as long as the resulting
reference is unambiguous in the referencing environment. For example, FIRST, FIRST OF
EMPLOYEE-NAME, and FIRST OF EMPLOYEE-RECORD are elliptical references to the
employee’s first name in the COBOL record declared above. Although elliptical references are a
programmer convenience, they require a compiler to have elaborate data structures and
procedures in order to correctly identify the referenced field. They are also somewhat detrimental
to readability.
Operations on Records
– Copies a field of the source record to the corresponding field in the target record
The fields of records are stored in adjacent memory locations. But because the sizes of
the fields are not necessarily the same, the access method used for arrays is not used for records.
Instead, the offset address, relative to the beginning of the record, is associated with each field.
Tuple Types
• A tuple is a data type that is similar to a record, except that the elements are not named
– Python
Referenced with subscripts (which begin at 0), catenated with +, and deleted with del
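The tuple operations just listed can be tried directly. This is a small sketch with made-up values; the variable names are illustrative only.

```python
my_tuple = (3, 5.8, "apple")

first = my_tuple[0]              # subscripts begin at 0
longer = my_tuple + (7, "pear")  # + builds a new, longer tuple
length = len(longer)

try:
    my_tuple[0] = 4              # tuples are immutable
except TypeError as err:
    error_message = str(err)

del longer                       # del removes the whole tuple, not one element
```

A tuple can be converted to a list when its contents must change, and converted back afterwards.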
Tuple Types in ML
Given
Access as follows:
Tuple Types in F#
let a, b, c = tup
List Types
• Lists in LISP and Scheme are delimited by parentheses and use no commas
(A B C D) and (A (B C) D)
• The interpreter needs to know whether a list is data or code, so if it is data, we quote it with an apostrophe
'(A B C) is data
• CDR returns the remainder of its list parameter after the first element has been removed
• CONS puts its first parameter into its second parameter, a list, to make a new list
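The behavior of these list functions can be imitated in a few lines. The sketch below models LISP lists as Python lists; the functions car, cdr, and cons are illustrative stand-ins, not how a LISP system implements them.

```python
def car(lst):
    """First element of the list."""
    return lst[0]

def cdr(lst):
    """Remainder of the list after the first element is removed."""
    return lst[1:]

def cons(item, lst):
    """New list with item placed in front of lst."""
    return [item] + lst

data = ["A", ["B", "C"], "D"]  # models the quoted list '(A (B C) D)
print(car(data))               # A
print(cdr(data))               # [['B', 'C'], 'D']
print(cons("X", data))         # ['X', 'A', ['B', 'C'], 'D']
```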
List Operations in ML
• Lists are written in brackets and the elements are separated by commas
• The Scheme CAR and CDR functions are named hd and tl, respectively
Lists in F# and ML
• F# Lists
– Like those of ML, except elements are separated by semicolons and hd and tl are
methods of the List class
• Python Lists
– Unlike Scheme, Common LISP, ML, and F#, Python’s lists are mutable
Lists in Python
• List elements are referenced with subscripting, with indices beginning at zero
• Python includes a powerful mechanism for creating arrays called list comprehensions. A list
comprehension is an idea derived from set notation. It first appeared in the functional
programming language Haskell.
The mechanics of a list comprehension are that a function is applied to each of the elements of a
given array and a new array is constructed from the results.
The syntax of a Python list comprehension is as follows:
[expression for iterate_var in array if condition]
List Comprehensions – derived from set notation
– The original, in Haskell:
[n * n | n <- [1..10]]
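For comparison, the Haskell comprehension above can be written in Python roughly as follows. The explicit loop is included only to show what the comprehension does; the variable names are illustrative.

```python
# Python counterpart of the Haskell comprehension [n * n | n <- [1..10]]
squares = [n * n for n in range(1, 11)]

# The same result built with an explicit loop
squares_loop = []
for n in range(1, 11):
    squares_loop.append(n * n)

print(squares)
```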
• Both C# and Java support lists through their generic heap-dynamic collection classes, List and
ArrayList, respectively
• A union is a type whose variables are allowed to store different type values at different times
during execution
• Design issues
• Fortran, C, and C++ provide union constructs in which there is no language support for type
checking; the union in these languages is called a free union
union flexType {
int intEl;
float floatEl;
};
union flexType el1;
float x;
...
el1.intEl = 27;
x = el1.floatEl;
This last assignment is not type checked, because the system cannot determine the current
type of the current value of el1, so it assigns the bit string representation of 27 to the float
variable x, which of course is nonsense.
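The effect of reading the wrong variant can be reproduced outside C. The sketch below uses Python's struct module to reinterpret the four bytes of the integer 27 as a 32-bit float; it assumes 4-byte little-endian layouts and only imitates what the free union does.

```python
import struct

raw = struct.pack("<i", 27)              # the 4 bytes that hold the int 27
(as_float,) = struct.unpack("<f", raw)   # the same 4 bytes read as a float

print(as_float)  # a tiny denormalized number, nothing like 27.0
```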
• Type checking of unions requires that each union include a type indicator called a discriminant,
and a union with a discriminant is called a discriminated union
– Supported by Ada
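The idea of a discriminant can be sketched in Python, where the check has to be written by hand. The class Node and the helper get_count are hypothetical names; in a language with discriminated unions the equivalent check is generated by the language system.

```python
from dataclasses import dataclass

@dataclass
class Node:
    tag: bool      # discriminant: True means value is a count, False means a sum
    value: object  # storage shared by both variants

def get_count(node):
    """Return the count variant, refusing access when the tag says otherwise."""
    if not node.tag:
        raise TypeError("node currently holds a sum, not a count")
    return node.value

print(get_count(Node(True, 3)))  # 3
```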
Unions in F#
A union is declared in F# with a type statement using OR operators (|) to define the components.
For example, we could have the following:
type intReal =
| IntValue of int
| RealValue of float;;
In this example, intReal is the union type. IntValue and RealValue are constructors. Values of
type intReal can be created using the constructors as if they were a function, as in the following
examples:
Implementation of Unions
Unions are implemented by simply using the same address for every possible variant.
Sufficient storage for the largest variant is allocated. The tag of a discriminated union is stored
with the variant in a recordlike structure.
At compile time, the complete description of each variant must be stored. This can be
done by associating a case table with the tag entry in the descriptor.
The case table has an entry for each variant, which points to a descriptor for that
particular variant. To illustrate this arrangement, consider the following
Ada example:
type Node (Tag: Boolean) is
record
case Tag is
when True => Count : Integer;
when False => Sum : Float;
end case;
end record;
The descriptor for this type could have the form shown in the figure.
[Figure: A compile-time descriptor for a discriminated union.]
Evaluation of Unions
A pointer type is one in which the variables have a range of values that consists of
memory addresses and a special value, nil. The value nil is not a valid address and is used to
indicate that a pointer cannot currently be used to reference a memory cell.
Pointers are designed for two distinct kinds of uses. First, pointers provide some of the
power of indirect addressing, which is frequently used in assembly language programming.
Second, pointers provide a way to manage dynamic storage. A pointer can be used to access a
location in an area where storage is dynamically allocated, called a heap.
Variables that are dynamically allocated from the heap are called heap-dynamic variables. They
often do not have identifiers associated with them and thus can be referenced only by pointer or
reference type variables. Variables without names are called anonymous variables.
Pointers, unlike arrays and records, are not structured types, although they are defined
using a type operator (* in C and C++ and access in Ada). Furthermore, they are also different
from scalar variables because they are used to reference some other variable, rather than being
used to store data.
These two categories of variables are called reference types and value types, respectively.
• A pointer can be used to access a location in the area where storage is dynamically created
(usually called a heap)
• Are pointers restricted as to the type of value to which they can point?
• Are pointers used for dynamic storage management, indirect addressing, or both?
Pointer Operations
• Dangling pointers
– A pointer points to a heap-dynamic variable that has been deallocated
• Lost heap-dynamic variable
– An allocated heap-dynamic variable that is no longer accessible to the user program
(often called garbage)
• Some dangling pointers are disallowed because dynamic objects can be automatically
deallocated at the end of pointer's type scope
• The lost heap-dynamic variable problem is not eliminated by Ada (possible with
UNCHECKED_DEALLOCATION)
• Pointers can point at any variable regardless of when or where it was allocated
void * can point to any type and can be type checked (cannot be de-referenced)
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
Reference Types
A reference type variable is similar to a pointer, with one important and fundamental
difference: A pointer refers to an address in memory, while a reference refers to an object or a
value in memory.
• C++ includes a special kind of pointer type called a reference type that is used primarily for
formal parameters
• Java extends C++’s reference variables and allows them to replace pointers entirely
Evaluation of Pointers
• Pointers are like goto's--they widen the range of cells that can be accessed by a variable
• Pointers or references are necessary for dynamic data structures--so we can't design a
language without them
Representations of Pointers
Tombstone
• Locks-and-keys use pointer values that are represented as (key, address) pairs
• Heap-dynamic variables are represented as variable plus cell for integer lock value
• When a heap-dynamic variable is allocated, a lock value is created and placed in the lock cell and in the key
cell of the pointer
Heap Management
• Reference counters: maintain a counter in every cell that stores the number of pointers currently
pointing at the cell
– Disadvantages: space required, execution time required, complications for cells
connected circularly
– Advantage: it is intrinsically incremental, so significant delays in the application
execution are avoided
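The reference counter approach, including its weakness with circular structures, can be sketched as a toy model. The Cell class and the add_ref and drop_ref helpers are assumptions made for this illustration; they do not describe any particular run-time system.

```python
class Cell:
    """A toy heap cell that carries its own reference counter."""
    def __init__(self, name):
        self.name = name
        self.ref_count = 0
        self.freed = False

def add_ref(cell):
    cell.ref_count += 1

def drop_ref(cell):
    cell.ref_count -= 1
    if cell.ref_count == 0:
        cell.freed = True  # reclaimed at once, so the work is incremental

# Simple case: two pointers to a, both later removed.
a = Cell("a")
add_ref(a)
add_ref(a)
drop_ref(a)
drop_ref(a)

# Circular case: b and c point at each other, and the program points at b.
b, c = Cell("b"), Cell("c")
add_ref(b)   # pointer from the program to b
add_ref(c)   # pointer from b to c
add_ref(b)   # pointer from c back to b
drop_ref(b)  # the program drops its pointer; b and c are now unreachable

print(a.freed, b.freed, c.freed)
```

The cells b and c keep each other's counters at 1, so neither is reclaimed even though the program can no longer reach them.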
Mark-Sweep
The run-time system allocates storage cells as requested and disconnects pointers from
cells as necessary; mark-sweep then begins
– All pointers traced into heap, and reachable cells marked as not garbage
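The marking and sweeping phases can be sketched on a toy heap. In the sketch below the heap is modeled as a dictionary from cell names to the cells they point at; the function name mark_sweep and the sample heap are assumptions for illustration.

```python
def mark_sweep(heap, roots):
    """Return (marked, garbage) for a heap given as {cell: [cells it points to]}."""
    marked = set()
    pending = list(roots)        # pointers held by the program
    while pending:               # mark phase: trace every reachable cell
        cell = pending.pop()
        if cell not in marked:
            marked.add(cell)
            pending.extend(heap[cell])
    garbage = set(heap) - marked  # sweep phase: unmarked cells are garbage
    return marked, garbage

# Hypothetical heap: a -> b is reachable; c and d only point at each other.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
marked, garbage = mark_sweep(heap, ["a"])
print(sorted(marked), sorted(garbage))
```

Unlike reference counters, this approach reclaims the circular pair c and d, because reachability is decided by tracing from the program's pointers.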
Disadvantages of Mark-Sweep
Marking Algorithm
Variable-Size Cells
• The initial setting of the indicators of all cells in the heap is difficult
Type Checking
• Type checking is the activity of ensuring that the operands of an operator are of compatible
types
• A compatible type is one that is either legal for the operator, or is allowed under language rules
to be implicitly converted, by compiler-generated code, to a legal type
• Generalize the concept of operands and operators to include subprograms and assignments
For example, if an int variable and a float variable are added in Java, the value of the int
variable is coerced to float and a floating-point add is done.
• If all type bindings are static, nearly all type checking can be static
• If type bindings are dynamic, type checking must be dynamic. Dynamic type binding requires
type checking at run time, which is called dynamic type checking.
Some languages, such as JavaScript and PHP, because of their dynamic type binding,
allow only dynamic type checking. It is better to detect errors at compile time than at run time
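Python is another language with dynamic type binding, so it can be used to show what dynamic type checking looks like in practice. In this small sketch the function add is illustrative; the type error is detected only when the offending call actually executes.

```python
def add(x, y):
    return x + y  # operand types are checked when this line runs

mixed = add(2, 3.5)  # legal: the int operand is coerced to float

try:
    add(2, "3")      # int + str is a type error, found only at run time
except TypeError as err:
    caught = type(err).__name__

print(mixed, caught)
```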
• Advantage of strong typing: allows the detection of the misuses of variables that result in type
errors
A programming language is strongly typed if type errors are always detected. This
requires that the types of all operands can be determined, either at compile time or at run time
C and C++ are not strongly typed languages because both include union types, which are
not type checked. ML is strongly typed, even though the types of some function
parameters may not be known at compile time. F# is strongly typed.
Java and C#, although they are based on C++, are strongly typed in the same sense as
Ada
• Name type equivalence means the two variables have equivalent types if they are in either the
same declaration or in declarations that use the same type name
• Formal parameters must be the same type as their corresponding actual parameters
There are two approaches to defining type equivalence: name type equivalence and structure type
equivalence. Name type equivalence means that two variables have equivalent types if they are
defined either in the same declaration or in declarations that use the same type name. Structure
type equivalence means that two variables have equivalent types if their types have identical
structures. There are some variations of these two approaches, and many languages use
combinations of them.
• Structure type equivalence means that two variables have equivalent types if their types have
identical structures
type Celsius = Float;
Fahrenheit = Float;
The types of variables of these two types are considered equivalent under structure type
equivalence, allowing them to be mixed in expressions, which is surely undesirable in this case,
considering the difference indicated by the types' names.
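The two notions of equivalence can be contrasted with a small sketch. The Python classes Celsius and Fahrenheit below have identical structure but different names; the helper functions name_equivalent and structure_equivalent are hypothetical and only imitate the checks a compiler would perform.

```python
from dataclasses import dataclass, fields

@dataclass
class Celsius:
    degrees: float

@dataclass
class Fahrenheit:
    degrees: float

def name_equivalent(a, b):
    """Equivalent only when both values were declared with the same type name."""
    return type(a) is type(b)

def structure_equivalent(a, b):
    """Equivalent when the field names and field types match, whatever the type name."""
    def shape(x):
        return [(f.name, f.type) for f in fields(x)]
    return shape(a) == shape(b)

c, f = Celsius(20.0), Fahrenheit(68.0)
print(name_equivalent(c, f), structure_equivalent(c, f))
```

Under the structural check the two temperatures could be mixed freely, which is exactly the hazard described above.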
A derived type is a new type that is based on some previously defined type with which it is not
equivalent, although it may have identical structure. Derived types inherit all the properties of
their parent types. Consider the following example:
The types of variables of these two derived types are not equivalent, although their structures are
identical
– Are two enumeration types equivalent if their components are spelled differently?
– With structural type equivalence, you cannot differentiate between types of the same
structure (e.g. different units of speed, both float)
Note that Ada’s derived types are very different from Ada’s subrange types. For example,
consider the following type declarations:
Variables of both types, Derived_Small_Int and Subrange_Small_Int, have the same range of
legal values and both inherit the operations of Integer.
For variables of an Ada unconstrained array type, structure type equivalence is used. For
example, consider the following type declaration and two object declarations:
The types of these two objects are equivalent, even though they have different names and
different subscript ranges, because for objects of unconstrained array types, structure type
equivalence rather than name type equivalence is used
• Type theory is a broad area of study in mathematics, logic, computer science, and philosophy
• In computer science there are two branches of type theory: practical and abstract. The practical
branch is concerned with data types in commercial programming languages; the abstract branch
primarily focuses on typed lambda calculus, an area of extensive research by theoretical
computer scientists over the past half century
• A type system is a set of types and the rules that govern their use in programs
• Formal model of a type system is a set of types and a collection of functions that define the
type rules
– Either an attribute grammar or a type map could be used for the functions
• To understand expression evaluation, one needs to be familiar with the orders of operator and
operand evaluation
Arithmetic Expressions
• Arithmetic evaluation was one of the motivations for the development of the first
programming languages
In most programming languages, binary operators are infix, which means they appear
between their operands. One exception is Perl, which has some operators that are prefix,
which means they precede their operands.
• An implementation of an arithmetic computation must cause two actions: fetching the operands,
usually from memory, and executing arithmetic operations on those operands
• operator overloading
• The operator precedence rules for expression evaluation define the order in which
“adjacent” operators of different precedence levels are evaluated
– parentheses
– unary operators
– *, /
– +, -
The operator associativity rules for expression evaluation define the order in which
adjacent operators with the same precedence level are evaluated
APL is different; all operators have equal precedence and all operators associate right to left
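To see how much the APL rule changes results, the sketch below evaluates a flat list of operands and operators with equal precedence and right-to-left association. The function name eval_right_to_left and the use of ordinary +, -, *, / symbols (rather than APL's own character set) are assumptions for illustration.

```python
import operator

APL_STYLE_OPS = {"+": operator.add, "-": operator.sub,
                 "*": operator.mul, "/": operator.truediv}

def eval_right_to_left(tokens):
    """Evaluate [operand, op, operand, ...] with equal precedence, right to left."""
    result = tokens[-1]
    # Walk leftward, applying each operator to its left operand and the
    # value of everything to its right.
    for i in range(len(tokens) - 2, 0, -2):
        result = APL_STYLE_OPS[tokens[i]](tokens[i - 1], result)
    return result

print(eval_right_to_left([2, "*", 3, "+", 4]))  # 2 * (3 + 4) = 14
print(2 * 3 + 4)                                # usual precedence rules: 10
```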
• Conditional Expressions
– An example:
if (count == 0) average = 0
Functional side effects: when a function changes a two-way parameter or a non-local variable
a = 10;
b = a + fun(a);
In all of the common imperative languages, the unary minus operator can appear in an expression
either at the beginning or anywhere inside the expression, as long as it is parenthesized to
prevent it from being next to another operator. For example,
a + (- b) * c
is legal, but
a+-b*c
usually is not.
-a/b
-a*b
- a ** b
In the first two cases, the relative precedence of the unary minus operator and the binary operator
is irrelevant—the order of evaluation of the two operators has no effect on the value of the
expression
Of the common programming languages, only Fortran, Ruby, Visual Basic, and Ada have
the exponentiation operator. In all four, exponentiation has higher precedence than unary minus,
so
- A ** B
is equivalent to
- (A ** B)
The precedences of the arithmetic operators of Ruby and the C-based languages are as follows:
Associativity:
When an expression contains two adjacent occurrences of operators with the same level
of precedence, the question of which operator is evaluated first is answered by the associativity
rules of the language. An operator can have either left or right associativity, meaning that when
there are two adjacent operators with the same precedence, the left operator is evaluated first or
the right operator is evaluated first, respectively
For example, in the Java expression
a-b+c
the left operator is evaluated first.
Exponentiation in Fortran and Ruby is right associative, so in the expression
A ** B ** C
the right operator is evaluated first.
In Ada, exponentiation is non associative, which means that the expression
A ** B ** C
is illegal. Such an expression must be parenthesized to show the desired order, as in either
(A ** B) ** C
or
A ** (B ** C)
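Python also provides a ** operator and follows the same rules as Fortran and Ruby here, so the precedence and associativity rules above can be observed directly. The variable names in this short sketch are illustrative.

```python
neg_power = -2 ** 2        # exponentiation binds tighter: -(2 ** 2)
right_assoc = 2 ** 3 ** 2  # right associative: 2 ** (3 ** 2)
left_forced = (2 ** 3) ** 2
subtraction = 10 - 4 + 3   # left associative: (10 - 4) + 3

print(neg_power, right_assoc, left_forced, subtraction)
```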
Parentheses:
(A + B) * C
Expressions in LISP:
As is the case with Ruby, all arithmetic and logic operations in LISP are performed by
subprograms. But in LISP, the subprograms must be explicitly called. For example, to specify
the C expression a + b * c in LISP, one must write the following expression:
(+ a (* b c))
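Because every operation is an explicit call in prefix form, precedence and associativity rules are not needed to evaluate such an expression. The sketch below evaluates nested Python tuples that imitate the LISP form; the function evaluate and the operator table are assumptions for illustration.

```python
import operator

PREFIX_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a prefix expression given as nested tuples, e.g. ("+", a, ("*", b, c))."""
    if not isinstance(expr, tuple):
        return expr                        # a plain operand
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = PREFIX_OPS[op](result, value)
    return result

a, b, c = 2, 3, 4
print(evaluate(("+", a, ("*", b, c))))     # same value as a + b * c
```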
Conditional Expressions
if-then-else statements can be used to perform a conditional expression assignment. For example,
consider
if (count == 0)
average = 0;
else
average = sum / count;
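In the C-based languages, this code can be specified more conveniently in an assignment
statement using a conditional expression, which has the form
expression_1 ? expression_2 : expression_3
where expression_1 is interpreted as a Boolean expression. If expression_1 evaluates to true, the
value of the whole expression is the value of expression_2; otherwise, it is the value of
expression_3. For example, the effect of the example if-then-else can be achieved with the
following assignment statement, using a conditional expression:
average = (count == 0) ? 0 : sum / count;
In effect, the question mark denotes the beginning of the then clause, and the colon marks the
beginning of the else clause. Both clauses are mandatory.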
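Python offers the same facility with a different word order: the then value comes first, followed by the condition and the else value. A minimal sketch, with an illustrative function name:

```python
def average(total, count):
    # value_if_true if condition else value_if_false
    return 0 if count == 0 else total / count

print(average(10, 4))
print(average(10, 0))
```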
A side effect of a function, naturally called a functional side effect, occurs when the
function changes either one of its parameters or a global variable. (A global variable is declared
outside the function but is accessible in the function.)
The following C program illustrates the same problem when a function changes a global variable
that appears in an expression:
int a = 5;
int fun1() {
a = 17;
return 3;
} /* end of fun1 */
void main() {
a = a + fun1();
} /* end of main */
The value computed for a in main depends on the order of evaluation of the operands in
the expression a + fun1(). The value of a will be either 8 (if a is evaluated first) or 20 (if the
function call is evaluated first).
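Both outcomes can be reproduced in a language that fixes the operand evaluation order. Python evaluates operands left to right, so writing the operands in the two possible orders yields the two values discussed above. This sketch mirrors the C program; it is an illustration, not a translation of how a C compiler behaves.

```python
a = 5

def fun1():
    global a
    a = 17       # functional side effect on a global variable
    return 3

a = 5
left_first = a + fun1()   # a is fetched before the call: 5 + 3

a = 5
right_first = fun1() + a  # the call runs before a is fetched: 3 + 17

print(left_first, right_first)
```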
• Use of an operator for more than one purpose is called operator overloading
– Can be avoided by introduction of new symbols (e.g., Pascal’s div for integer
division)
• Potential problems:
• A narrowing conversion is one that converts an object to a type that cannot include all of
the values of the original type
• A widening conversion is one in which an object is converted to a type that can include at
least approximations to all of the values of the original type.
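The phrase "at least approximations" matters in practice. In the sketch below, Python's int and float stand in for an integer type and a 64-bit floating-point type: the widening conversion loses the last digit of a large integer, while the narrowing conversion discards the fraction entirely. The variable names are illustrative.

```python
exact_int = 2 ** 53 + 1

widened = float(exact_int)  # widening: int to float, value only approximated
round_trip = int(widened)   # converting back does not recover the original

narrowed = int(3.99)        # narrowing: float to int, fraction is discarded

print(exact_int, round_trip, narrowed)
```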
• Disadvantage of coercions:
int a;
float b, c, d;
...
d = b * a;
Assume that the second operand of the multiplication operator was supposed to be c, but
because of a keying error it was typed as a. Because mixed-mode expressions are legal in Java,
the compiler would not detect this as an error. It would simply insert code to coerce the value of
the int operand, a, to float.
If mixed-mode expressions were not legal in Java, this keying error would have been detected by
the compiler as a type error.
• In most languages, all numeric types are coerced in expressions, using widening
conversions
In Ada, there are virtually no coercions in expressions. If the Java code example were written
in Ada, as in
A : Integer;
B, C, D : Float;
...
C := B * A;
then the Ada compiler would find the expression erroneous, because Float and Integer
operands cannot be mixed for the * operator
The C-based languages have integer types that are smaller than the int type. In Java, they
are byte and short. Operands of all of these types are coerced to int whenever virtually any
operator is applied to them. So, while data can be stored in variables of these types, it cannot be
manipulated before conversion to a larger type. For example, consider the following Java code:
byte a, b, c;
...
a = b + c;
The values of b and c are coerced to int and an int addition is performed. Then, the sum is
converted to byte and put in a
• Most languages provide some capability for doing explicit conversions, both widening
and narrowing
• Explicit type conversion is called casting in the C-based languages
• Examples
– C: (int) angle
• Causes
• Relational Expressions
– A relational operator is an operator that compares the values of its two operands.
A relational expression has two operands and one relational operator. The value
of a relational expression is Boolean, except when Boolean is not a type included
in the language. Operator symbols used vary somewhat among languages (!=, /=,
.NE., <>, #)
The syntax of the relational operators for equality and inequality differs among some
programming languages. For example, for inequality, the C-based languages use != and Ada uses
/=. JavaScript and PHP have two additional relational operators, === and !==. These are similar
to their relatives, == and !=, but prevent their operands from being coerced. For example, the
expression
"7" == 7
is true in JavaScript, because when a string and a number are the operands of a relational
operator, the string is coerced to a number. However,
"7" === 7
is false, because no coercion is done on the operands of this operator.
Ruby uses == for the equality relational operator that uses coercions, and eql? for equality with
no coercions. Ruby uses === only in the when clause of its case statement.
The relational operators always have lower precedence than the arithmetic operators, so that in
expressions such as
a+1>2*b
the arithmetic expressions are evaluated first.
Boolean Expressions
• OR operators: .OR. (Fortran 77), or (Fortran 90), || (C), or (Ada); Ada also provides xor
No Boolean Type in C
• C has no Boolean type--it uses int type with 0 for false and nonzero for true
• One odd characteristic of C's expressions: a < b < c is a legal expression, but the
result is not what you might expect:
– The left operator is evaluated, producing 0 or 1
– The evaluation result is then compared with the third operand (i.e., c)
postfix ++, --
unary +, -, prefix ++, --, !
*,/,%
binary +, -
<, >, <=, >=
==, !=
&&
||
• An expression in which the result is determined without evaluating all of the operands
and/or operators
index = 1;
index++;
• C, C++, and Java: use short-circuit evaluation for the usual Boolean operators (&& and
||), but also provide bitwise Boolean operators that are not short circuit (& and |)
• Ada: programmer can specify either (short-circuit is specified with and then and or else)
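Short-circuit evaluation is easiest to appreciate in a search loop, where the second operand would be illegal if it were always evaluated. The sketch below uses Python, whose and operator is short circuit and whose & operator on Boolean values is not; the helper names find and check are illustrative.

```python
def find(values, target):
    index = 0
    # values[index] is evaluated only when index < len(values) is true,
    # so the subscript never goes out of range.
    while index < len(values) and values[index] != target:
        index += 1
    return index

calls = []

def check(name, result):
    calls.append(name)  # record that this operand was evaluated
    return result

short = check("a", False) and check("b", True)  # stops after the first operand
calls_short = list(calls)

calls.clear()
full = check("a", False) & check("b", True)     # evaluates both operands
calls_full = list(calls)

print(find([4, 8, 15], 99), calls_short, calls_full)
```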
= can be bad when it is overloaded for the relational operator for equality
Which is equivalent to
if (flag)
total = 0
else
subtotal = 0
Compound Assignment Operators
• Example
a=a+b
is written as
a += b
For example,
sum += value;
is equivalent to
sum = sum + value;
• Examples
In
sum = ++ count;
the value of count is incremented by 1 and then assigned to sum. This operation could also be
stated as
count = count + 1;
sum = count;
If the same operator is used as a postfix operator, as in
sum = count ++;
the assignment of the value of count to sum occurs first; then count is incremented. The effect is
the same as that of the two statements
sum = count;
count = count + 1;
An example of the use of the unary increment operator to form a complete assignment statement
is
count ++;
which simply increments count. It does not look like an assignment, but it certainly is one. It is
equivalent to the statement
count = count + 1;
When two unary operators apply to the same operand, the association is right to left. For
example, in
- count ++
count is first incremented and then negated. So, it is equivalent to
- (count ++)
Assignment as an Expression
• In C, C++, and Java, the assignment statement produces a result and can be used as an
operand
• An example:
while ((ch = getchar()) != EOF) { ... }
ch = getchar() is carried out; the result (assigned to ch) is used as a conditional value for
the while statement
Note that the treatment of the assignment operator as any other binary operator allows the effect
of multiple-target assignments, such as
sum = count = 0;
in which count is first assigned zero, and then count's value is assigned to sum. This form of
multiple-target assignment is also legal in Python.
Multiple Assignments
Several recent programming languages, including Perl, Ruby, and Lua, provide multiple-target,
multiple-source assignment statements. For example, in Perl one can write
($first, $second) = ($second, $first);
This correctly interchanges the values of $first and $second, without the use of a
temporary variable (at least one created and managed by the programmer).
The syntax of the simplest form of Ruby’s multiple assignment is similar to that of Perl,
except the left and right sides are not parenthesized
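Python supports the same kind of multiple-target, multiple-source assignment, also without parentheses. A short sketch with illustrative names and values:

```python
first, second, third = 20, 40, 60  # three targets, three sources
first, second = second, first      # interchange without a programmer-visible temporary

print(first, second, third)
```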
int a, b;
float c;
c = a / b;
• In Pascal, integer variables can be assigned to real variables, but real variables cannot be
assigned to integers
Control structures are a way to specify the flow of control in programs. An algorithm or
program is clearer and easier to understand if it is built from self-contained modules called logic
or control structures. A control structure determines the direction in which a program flows based
on certain parameters or conditions. There are three basic types of logic, or flow of control,
known as:
1. Sequence logic, or sequential flow
2. Selection logic, or conditional flow
3. Iteration logic, or repetitive flow
A control structure is a control statement and the collection of statements whose execution it
controls.
– One important result: It was proven that all algorithms represented by flowcharts can be
coded with only two-way selection and pretest logical loops
Selection Statements
• A selection statement provides the means of choosing between two or more paths of execution
– Two-way selectors
– Multiple-way selectors
• General form:
if control_expression
then clause
else clause
• Design Issues:
• If the then reserved word or some other syntactic marker is not used to introduce the then
clause, the control expression is placed in parentheses
• In C89, C99, Python, and C++, the control expression can be arithmetic
Clause Form
• In many contemporary languages, the then and else clauses can be single statements or
compound statements
• In Fortran 95, Ada, Python, and Ruby, clauses are statement sequences
if x > y :
    x = y
    print("x was greater than y")
Nesting Selectors
Java example
if (sum == 0)
if (count == 0)
result = 0;
else result = 1;
• Which if gets the else?
• Java's static semantics rule: else matches with the nearest previous if
if (sum == 0) {
if (count == 0)
result = 0;
}
else result = 1;
• The above solution is used in C, C++, and C#
Ruby
if sum == 0 then
if count == 0 then
result = 0
else
result = 1
end
end
Python
if sum == 0 :
    if count == 0 :
        result = 0
    else :
        result = 1
Selector Expressions
In the functional languages ML, F#, and LISP, the selector is not a statement; it is an expression
that results in a value. Therefore, it can appear anywhere any other expression can appear.
Consider the following example selector written in F#:
let y =
if x > 0 then x
else 2 * x;;
This creates the name y and sets it to either x or 2 * x, depending on whether x is greater than
zero.
– If the if expression returns a value, there must be an else clause (the expression could
produce output, rather than a value)
The multiple-selection statement allows the selection of one of any number of statements or
statement groups
• Design Issues:
3. Is execution flow through the structure restricted to include just a single selectable segment?
switch (expression) {
case const_expr1: stmt1;
…
case const_exprn: stmtn;
[default: stmtn+1]
}
3. Any number of segments can be executed in one execution of the construct (there is no
implicit branch at the end of selectable segments)
4. default clause is for unrepresented values (if there is no default, the whole statement does
nothing)
switch (index) {
case 1:
case 3: odd += 1;
sumodd += index;
case 2:
case 4: even += 1;
sumeven += index;
default: printf("Error in switch, index = %d\n", index);
}
This code prints the error message on every execution. Likewise, the code for the 2 and 4
constants is executed every time the code at the 1 or 3 constants is executed. To separate these
segments logically, an explicit branch must be included. The break statement, which is actually a
restricted goto, is normally used for exiting switch statements.
The following switch statement uses break to restrict each execution to a single selectable
segment:
switch (index) {
case 1:
case 3: odd += 1;
sumodd += index;
break;
case 2:
case 4: even += 1;
sumeven += index;
break;
default: printf("Error in switch, index = %d\n", index);
}
Occasionally, it is convenient to allow control to flow from one selectable code segment
to another. For example, in the example above, the segments for the case values 1 and 2 are
empty, allowing control to flow to the segments for 3 and 4, respectively
C#
– Differs from C in that it has a static semantics rule that disallows the implicit execution of
more than one segment
–For example,
switch (value) {
case -1:
Negatives++;
break;
case 0:
Zeros++;
goto case 1;
case 1:
Positives++;
break;
default:
Console.WriteLine("Error in switch \n");
break;
}
Note that Console.WriteLine is the method for displaying strings in C#.
– Each selectable segment must end with an unconditional branch (goto or break)
– Also, in C# the control expression and the case constants can be strings
• Ruby has two forms of multiple-selection constructs, both of which are called case expressions
and both of which yield the value of the last expression evaluated. The only version of Ruby’s
case expressions that is described here is semantically similar to a list of nested if statements:
case
when Boolean_expression then expression
...
when Boolean_expression then expression
[else expression]
end
The semantics of this case expression is that the Boolean expressions are evaluated one at a time,
top to bottom. The value of the case expression is the value of the first then expression whose
Boolean expression is true. The else represents true in this statement, and the else clause is
optional. For example,
leap = case
  when year % 400 == 0 then true
  when year % 100 == 0 then false
  else year % 4 == 0
end
This case expression evaluates to true if year is a leap year.
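The first-true-wins semantics can be sketched in Python as an if/elif chain. This is only an illustration of the same logic, not Ruby's implementation; the function name is_leap is an assumption introduced here. The order of the tests matters: checking year % 4 first would wrongly classify 1900 as a leap year.

```python
def is_leap(year):
    """Mirror the Ruby case expression: the first true condition decides."""
    if year % 400 == 0:
        return True
    elif year % 100 == 0:
        return False
    else:
        # Plays the role of the else clause, which is always selected last.
        return year % 4 == 0


for y in (1900, 2000, 2023, 2024):
    print(y, is_leap(y))
```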
• Approaches:
– Store case values in a table and use a linear search of the table
– When there are more than ten cases, a hash table of case values can be used
– If the number of cases is small and more than half of the whole range of case values are
represented, an array whose indices are the case values and whose values are the case labels can
be used
switch (expression) {
case constant_expression1: statement1;
break;
...
case constant_expressionn: statementn;
break;
[default: statementn+1]
}
One simple translation of this statement follows:
Code to evaluate expression into t
goto branches
label1: code for statement1
goto out
...
labeln: code for statementn
goto out
default: code for statementn+1
goto out
branches: if t = constant_expression1 goto label1
...
if t = constant_expressionn goto labeln
goto default
out:
The code for the selectable segments precedes the branches so that the targets of the branches are
all known when the branches are generated.
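The first two approaches listed above (linear search of a table and a hash table of case values) can be sketched in Python. This is a simulation under stated assumptions, not compiler output: the names switch_linear, switch_hashed, branches, and fallback are hypothetical, and each selectable segment is modeled as a function that returns a string.

```python
def switch_linear(t, branch_table, default):
    """Linear search of (constant, handler) pairs, like the 'branches' section."""
    for constant, handler in branch_table:
        if t == constant:
            return handler()
    return default()  # corresponds to 'goto default'


def switch_hashed(t, jump_table, default):
    """Hash-table lookup of the case value: one probe instead of a scan."""
    return jump_table.get(t, default)()


# Hypothetical segments; each returns a string instead of executing statements.
branches = [(1, lambda: "statement1"), (2, lambda: "statement2")]
fallback = lambda: "default"

print(switch_linear(2, branches, fallback))
print(switch_hashed(7, dict(branches), fallback))
```

The linear version costs time proportional to the number of cases, which is why it is reasonable only for short case lists.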
• Multiple selectors can appear as direct extensions to two-way selectors, using else-if clauses,
for example in Python:
if count < 10 :
    bag1 = True
elif count < 100 :
    bag2 = True
elif count < 1000 :
    bag3 = True
The Python example can be written as a Ruby case statement:
case
when count < 10 then bag1 = true
when count < 100 then bag2 = true
when count < 1000 then bag3 = true
end
• Scheme's multiple selector, COND, has the general form:
(COND
(predicate1 expression1)
(predicate2 expression2)
…
(predicaten expressionn)
[(ELSE expressionn+1)]
)
• The else clause is optional; else is a synonym for true
• Semantics: The value of the evaluation of cond is the value of the expression associated with
the first predicate expression that is true
(COND
((> x y) "x is greater than y")
((< x y) "y is greater than x")
(ELSE "x and y are equal")
)
Note that string literals evaluate to themselves, so that when this call to COND is evaluated, it
produces a string result.
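The COND semantics described above can be approximated in Python. This is a sketch, not Scheme: the helper names cond and compare are assumptions, predicates and expressions are wrapped in lambdas so that only the selected expression is evaluated, and returning None when no predicate is true is a choice made for this sketch.

```python
def cond(*clauses):
    """Return the value of the first clause whose predicate is true.

    Each clause is a (predicate, expression) pair of zero-argument callables.
    Returns None if no predicate is true (an assumption of this sketch).
    """
    for predicate, expression in clauses:
        if predicate():
            return expression()
    return None


def compare(x, y):
    return cond(
        (lambda: x > y, lambda: "x is greater than y"),
        (lambda: x < y, lambda: "y is greater than x"),
        (lambda: True, lambda: "x and y are equal"),  # plays the role of ELSE
    )


print(compare(3, 1))
```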
ITERATIVE STATEMENTS
Counter-Controlled Loops
A counting iterative control statement has a variable, called the loop variable, in which the count
value is maintained. It also includes some means of specifying the initial and terminal values of
the loop variable, and the difference between sequential loop variable values, often called the
stepsize. The initial, terminal, and stepsize specifications of a loop are called the loop
parameters.
• Design Issues:
2. Should it be legal for the loop variable or loop parameters to be changed in the loop body, and
if so, does the change affect loop control?
3. Should the loop parameters be evaluated only once, or once for every iteration?
...
end loop;
The most interesting new feature of the Ada for statement is the scope of the loop variable,
which is the range of the loop. The variable is implicitly declared at the for statement and
implicitly undeclared after loop termination.
For example, in
C-based languages
loop body
The loop body can be a single statement, a compound statement, or a null statement.
expression_1
loop:
if expression_2 = 0 goto out
[loop body]
expression_3
goto loop
out: . . .
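The operational semantics above can be traced with an ordinary pretest loop. The following Python sketch is an illustration only; the function name for_as_while and the choice of count <= last as expression_2 are assumptions. It places the three expressions where the goto version places them.

```python
def for_as_while(first, last):
    """Trace a C-style for (count = first; count <= last; count++)."""
    visited = []
    count = first              # expression_1: evaluated once, before the loop
    while count <= last:       # expression_2: tested before every iteration
        visited.append(count)  # loop body
        count += 1             # expression_3: executed after the body
    return visited, count      # count keeps its final value after the loop


print(for_as_while(1, 10))
```

Because the test comes first, the body never runs when the initial value already fails expression_2.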
Following is an example of a skeletal C for statement:
for (count = 1; count <= 10; count++) {
  ...
}
Example:
The scope of a variable defined in the for statement is from its definition to the end of the loop
body.
• Java and C#
print count
produces
2
4
6
For most simple counting loops in Python, the range function is used. range takes one, two, or
three parameters. The following examples demonstrate the actions of range:
Note that range never returns the highest value in a given parameter range.
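A few calls illustrate the one-, two-, and three-parameter forms of range. The sketch assumes Python 3, where range produces values lazily, so list is used to display them.

```python
print(list(range(5)))        # one parameter: 0 up to, but not including, 5
print(list(range(2, 7)))     # two parameters: 2 up to, but not including, 7
print(list(range(0, 8, 2)))  # three parameters: start, stop, and step size
```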
• Because counters require variables, and functional languages do not have variables,
counter-controlled loops must be simulated with recursive functions.
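The idea can be sketched in Python, even though Python is not a purely functional language: the counter becomes a parameter, and each recursive call plays the role of one iteration. The function name sum_range is an assumption for illustration.

```python
def sum_range(first, last):
    """Sum first..last with no loop and no assignment to a counter."""
    if first > last:          # terminal test, like the loop's exit condition
        return 0
    # The 'next iteration' is a recursive call with the counter advanced.
    return first + sum_range(first + 1, last)


print(sum_range(1, 10))
```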
Logically-Controlled Loops
• Design issues:
– Pretest or posttest?
– Should the logically controlled loop be a special case of the counting loop statement or a
separate statement?
• C and C++ have both pretest and posttest forms, in which the control expression can be
arithmetic:
while (control_expression)
loop body
and
do
loop body
while (control_expression);
These two statement forms are exemplified by the following C# code segments:
sum = 0;
indat = Int32.Parse(Console.ReadLine());
while (indat >= 0) {
sum += indat;
indat = Int32.Parse(Console.ReadLine());
}
value = Int32.Parse(Console.ReadLine());
do {
  value /= 10;
  digits ++;
} while (value > 0);
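The difference between the two forms can be sketched in Python, which has a pretest while but no posttest statement; a posttest loop is commonly emulated with while True and a break at the bottom. The function names, and the treatment of exhausted input as a negative sentinel, are assumptions of this sketch.

```python
def sum_until_negative(values):
    """Pretest loop: the condition is checked before every iteration."""
    total = 0
    source = iter(values)
    indat = next(source, -1)      # end of input is treated as a negative value
    while indat >= 0:
        total += indat
        indat = next(source, -1)
    return total


def count_digits(value):
    """Posttest loop: the body runs once before the condition is checked."""
    digits = 0
    while True:
        value //= 10
        digits += 1
        if not value > 0:         # the test sits at the bottom of the loop
            break
    return digits


print(sum_until_negative([3, 4, -1, 9]), count_digits(0))
```

The posttest form counts one digit even for 0, because its body always executes at least once.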
• Java is like C and C++, except the control expression must be Boolean (and the body can only
be entered at the beginning, since Java has no goto).
• Sometimes it is convenient for the programmers to decide a location for loop control (other
than top or bottom of the loop)
• Java and Perl have unconditional labeled exits (break in Java, last in Perl)
• C, C++, and Python have an unlabeled control statement, continue, that skips the remainder of
the current iteration but does not exit the loop.
Following is an example of nested loops in Java, in which there is a break out of the outer loop
from the nested loop:
outerLoop:
sum += mat[row][col];
break outerLoop;
C, C++, and Python include an unlabeled control statement, continue, that transfers control to the
control mechanism of the smallest enclosing loop.
getnext(value);
sum += value;
A negative value causes the assignment statement to be skipped, and control is transferred
instead to the conditional at the top of the loop. On the other hand, in
getnext(value);
sum += value;}
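The contrast between continue and break can be sketched in Python. The function names are hypothetical, and the test for a negative value is an assumption based on the surrounding description: with continue a negative value is skipped and the loop goes on, while with break the first negative value ends the loop.

```python
def sum_with_continue(values):
    """Negative values are skipped; the loop proceeds to the next value."""
    total = 0
    for value in values:
        if value < 0:
            continue      # skip the assignment, go back to the loop control
        total += value
    return total


def sum_with_break(values):
    """The first negative value terminates the loop."""
    total = 0
    for value in values:
        if value < 0:
            break         # leave the loop entirely
        total += value
    return total


data = [5, -1, 7]
print(sum_with_continue(data))  # 12
print(sum_with_break(data))     # 5
```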
A Do statement in Fortran uses a simple iterator over integer values. For example, consider the
following statement:
Do Count = 1, 9, 2
In this statement, 1 is the initial value of Count, 9 is the last value, and the step size between
values is 2. An internal function, the iterator, must be called for each iteration to compute the
next value of Count (by adding 2 to the last value of Count, in this example) and test whether the
iteration should continue.
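The internal iterator described above can be modeled with a Python generator. This is a sketch, not Fortran's implementation; the name do_iterator is an assumption, and only a positive step size is handled.

```python
def do_iterator(initial, last, step):
    """Yield initial, initial + step, ... while the value does not pass last."""
    count = initial
    while count <= last:   # test whether the iteration should continue
        yield count
        count += step      # compute the next value of Count


print(list(do_iterator(1, 9, 2)))  # [1, 3, 5, 7, 9]
```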
In this case, the iterator is named range. While these looping statements are usually used to
iterate over arrays, there is no connection between the iterator and the array.
Ada allows the range of a loop iterator and the subscript range of an array to be connected with
subranges. For example, a subrange can be defined, such as in the following declaration:
...
end loop;
The subtype MyRange is used both to declare the array and to iterate through the array. An index
range overflow is not possible when a subrange is used this way.
PHP
• For arrays and any other class that implements the Iterable interface, e.g., ArrayList
• C# and F# (and the other .NET languages) have generic library collection classes, like Java 5.0
(for arrays, lists, stacks, and queues), which can be iterated over with the foreach statement.
User-defined collections can implement the IEnumerable interface (whose GetEnumerator method
returns an IEnumerator) and also use foreach.
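The same idea, a user-defined collection opting in to the language's iteration statement, can be sketched in Python, where the hook is the __iter__ method rather than a .NET or Java interface. The Stack class below is hypothetical and exists only for illustration.

```python
class Stack:
    """A tiny user-defined collection that can be used in a for statement."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def __iter__(self):
        # Hand back an iterator that visits items from the top of the stack down.
        return reversed(self._items)


s = Stack()
for n in (1, 2, 3):
    s.push(n)
for item in s:      # the for statement calls s.__iter__() behind the scenes
    print(item)
```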
In Ruby, a block is a sequence of code, delimited by either braces or the do and end reserved
words. Blocks can be used with specially written methods to create many useful constructs,
including iterators for data structures.
The following example, which uses a block parameter, illustrates the use of each:
Instead of a counting loop, Ruby has the upto method. For example, we could have the
following:
12345
Syntax that resembles a for loop in other languages could also be used, as in the following:
for x in 1..5
print x, " "
end
Ruby's for is not a separate looping primitive: constructs like the above are converted by Ruby
into calls to an iterator method (each on the range 1..5).
Unconditional Branching
Guarded Commands
• Designed by Dijkstra
• Basis for two linguistic mechanisms for concurrent programming (in CSP and Ada)
• Basic Idea: if the order of evaluation is not important, the program should not specify one
• Form
fi
If i = 0 and j > i, this statement chooses nondeterministically between the first and third
assignment statements. If i is equal to j and is not zero, a runtime error occurs because none of
the conditions is true.
This statement can be an elegant way of allowing the programmer to state that the order
of execution, in some cases, is irrelevant. For example, to find the largest of two numbers, we
can use
if x >= y -> max := x
[] y >= x -> max := y
fi
This computes the desired result without overspecifying the solution. In particular, if x
and y are equal, it does not matter which we assign to max. This is a form of abstraction
provided by the nondeterministic semantics of the statement.
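The selection semantics can be simulated in Python. This is a sketch under stated assumptions, not Dijkstra's notation: the names guarded_if and maximum are hypothetical, nondeterminism is modeled with a random choice among the commands whose guards are true, and the runtime error for the case where no guard is true is modeled with RuntimeError.

```python
import random


def guarded_if(*commands):
    """Run one command whose guard is true, chosen arbitrarily.

    commands are (guard, action) pairs of zero-argument callables.
    All guards are evaluated; if none is true, an error is raised.
    """
    enabled = [action for guard, action in commands if guard()]
    if not enabled:
        raise RuntimeError("no guard is true")
    return random.choice(enabled)()


def maximum(x, y):
    return guarded_if(
        (lambda: x >= y, lambda: x),
        (lambda: y >= x, lambda: y),
    )


print(maximum(3, 7), maximum(4, 4))
```

When x and y are equal, both guards are true and either command may run, yet the result is the same, which is exactly the case where the order of evaluation does not matter.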
Now, consider this same process coded in a traditional programming language selector:
if (x >= y)
max = x;
else
max = y;
This could also be coded as follows:
if (x > y)
max = x;
else
max = y;
There is no practical difference between these two statements. The first assigns x to max
when x and y are equal; the second assigns y to max in the same circumstance. This choice
between the two statements complicates the formal analysis of the code and its correctness
proof. This is one of the reasons why guarded commands were developed by Dijkstra.