Software Engineering Using C++


Department of Computing Science

Faculty of Computing & Engineering

Software Engineering
using C++

Lecture Notes

Prepared by Terry Chapman

September 1999
Table of Contents

BASIC C++
1. A First C++ Program
2. Data Types
3. String Constants
4. Variables and Constants
5. Arithmetic Operators
6. Type conversions
7. Assignment operator
8. The compound assignment operators
9. The increment & decrement operators
10. Iostream library
11. Command line redirection
12. Streams
13. Output manipulators
14. Relational operators and expressions
15. FALSE and TRUE
16. Logical operators and expressions
17. Short-circuit evaluation
18. The while statement
19. The if statement
20. Style for logical expressions
21. The ctype library

FUNCTIONS
1. Introduction
2. Input and output in functions
3. Multi-function programs
4. Stepwise Refinement (or Top-down design)
5. Automatic variables
6. Function values
7. Function arguments
8. Function argument agreement & conversion
9. Overloaded function names
10. Reference Arguments
11. Function comments
12. Summary

FLOW OF CONTROL
1. The type cast operator
2. The comma operator
3. The conditional operator
4. The for statement
5. The do statement
6. Nested loops
7. The break statement
8. The continue statement
9. The switch statement

POINTERS, REFERENCES AND FUNCTIONS
1. Introduction
2. Reference Type
3. Pointers v References
4. Enumeration Types
5. The typedef statement
6. Reference arguments to functions
7. Pointer arguments to functions
8. Default arguments
9. Inline functions
10. Mathematical functions


ARRAYS
1. Introduction
2. Defining and referencing arrays
3. Array initialisation
4. Multi-dimensional arrays
5. Arrays as function arguments
6. Pointers and arrays
7. Character strings and variable pointers
8. Character string input/output
9. Arrays of pointers and pointers to pointers
10. Command line arguments
11. Initialising pointer arrays
12. Review
13. Summary
14. An array application - Stack of char

PROGRAM FILES
1. Introduction
2. The steps to produce an executable
3. Types, storage class and scope
4. Local duration
5. Declaration versus definition
6. Static duration
7. Storage class static
8. Static local variables
9. Static global variables
10. The C++ pre-processor
11. Conditional compilation
12. Conditional file inclusion

DATA STRUCTURES
1. Data Types
2. Abstract Data Types
3. Classification
4. Categories of Collection
5. Stacks
6. Abstract Data Type?
7. Queues
8. Lists
9. Structs
10. Unions

DYNAMIC DATA STRUCTURES
1. Structures
2. Comparison between structs and arrays
3. Storage Management
4. Dynamic Data Structures - Linked Lists
5. Other dynamic structures

SORTING
1. Introduction
2. Components of Sorting
3. Sorting Files
4. Why sort?
5. Does it pay to sort?
6. What is the best sort?
7. Sorting efficiency
8. Simple Array Sort - Exchange (Bubble)
9. Insertion Sort
10. Simple Sort performance
11. Conclusions
12. Complex sorts
13. QuickSort
14. Efficiency of Quicksort
15. C++ code for function Quicksort (see Wirth)
16. Comparison of complex sorting algorithms
17. Further Reading

TESTING
1. The context for testing - Verification and Validation
2. The objectives of testing
3. Testing & Debugging
4. Two different testing strategies
5. Categories of Testing
6. Test Planning
7. How much testing?
8. Test Data v Test Cases
9. Black box v White box testing
10. Black box testing
11. White box testing - Introduction
12. White box testing
13. Automated Testing

DATA STRUCTURE METRICS
1. Representing Abstract Structure
2. Implementing Data Structures
3. Metrics
4. Mathematical Notations

TREES
1. Applications
2. Implementation
3. Variations
4. Example Declaration
5. Expression Trees
6. Tree Traversal
7. Parse Trees
8. Binary Search Trees
9. Importance of Balance
10. Other types of tree

HASH TABLES
1. Applications
2. Operations
3. Efficiency
4. Problem
5. Hashing
6. Collision Resolution
7. Hash Table example
8. Perfect Hashing Functions

LIBRARIES
1. The ctype library
2. The maths library
3. The standard library

BIBLIOGRAPHY

Basic C++
1. A First C++ Program
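// first.cpp
// My first C++ program
// A. Student
// 27/09/99

#include <iostream>
using namespace std;   // makes cout and endl visible without the std:: prefix

int main( void )
{
    cout << "Hello World" << endl;   // write the message, then a new line
    return 0;                        // 0 signals success to the operating system
}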
The lines starting with // are comments. These are for human consumption - the compiler
ignores them. The // symbol causes all text to its right on the current line to be a
comment. An alternative form of comment is the pair:-
/* this is a comment */
These do not need repeating on every line and therefore a number of lines can be enclosed
within one pair.
Since the program is going to display output, it is necessary to make available the
input/output library iostream. This is done by issuing a compiler directive that the text of
the header file iostream should be included in the compilation. The compiler knows where
to find this file. The word cout represents the standard output stream and the symbol <<
causes what follows it to be placed on that stream. By default, the standard output
stream is displayed on the terminal.
Every C++ program must have one, and only one, function main. This is where program
execution always commences. This, and all other functions, have a return type (in this case
int), an argument list (in this case empty, indicated by void) and a body that is delimited
by open and close braces { }.
The first line of function main outputs the message “Hello World” to the terminal followed
by a new line. The program then terminates, returning the value 0 to the operating system.
By convention, a return value of 0 indicates success. This program deals with only two
values - a constant string literal (a string is a sequence of characters) containing the
words “Hello World” and an integer constant 0. It does not require the use of any
variables. Most programs require the use of variables, i.e. storage locations in memory
that contain values during program execution. Variables may be of different types.

2. Data Types
There are a number of basic data types built in to most programming languages. A data type
consists of a name and a specification of:-
• the range of values that a variable of that type can hold - its domain. This range is
  limited by the amount of storage that is used by such items.
• the operations that may be carried out on values of that type
In C++, the most common data type is int - whole numbers that may be positive or
negative, i.e. integers. The amount of storage allocated to variables of type int is
2 bytes or 4 bytes depending on the compiler. This allows a range of values from
• -32768 to 32767 in the case of 2 bytes and
• -2,147,483,648 to 2,147,483,647 where 4 bytes are employed.
These peculiar ranges arise from use of the binary system: n bits can represent 2^n
different values, and half of these are used for the negative numbers.
The fundamental native data types (those built into the language) and their storage size
in GNU C++ on a typical 32-bit platform are:-
type            Range of values                                   Bytes
char            Character codes 0 to 127                          1
unsigned char   Unsigned character codes 0 to 255                 1
short int       Signed integer -32768 to 32767                    2
int             Signed integer -2,147,483,648 to 2,147,483,647    4
unsigned int    Unsigned integer 0 to 4,294,967,295               4
long int        Signed integer -2,147,483,648 to 2,147,483,647    4
float           1.17549e-38 to 3.40282e+38                        4
double          2.22507e-308 to 1.79769e+308                      8

Note that, unlike some compilers, GNU C++ uses 4 bytes for type int thus providing the
same range of values as type long int (or just long) in the table above. Unsigned integers
can hold positive values about twice as large as signed integers of the same size because
no bit is needed to store the sign.
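As a rough illustration of where these ranges come from, the sketch below works out the limits for a given number of bytes and compares the result with what your own compiler reports for short. The function names max_signed, min_signed, max_unsigned and short_matches_formula are invented for this example. It assumes 8-bit bytes, two's complement storage and sizes of at most 7 bytes.

```cpp
#include <limits>

// Hypothetical helpers for illustration only - not part of any library.
// Assumptions: 8-bit bytes, two's complement storage, 1 <= bytes <= 7.

// Largest value of a signed integer stored in the given number of bytes.
long long max_signed(int bytes)
{
    return (1LL << (8 * bytes - 1)) - 1;
}

// Smallest (most negative) value of a signed integer of that size.
long long min_signed(int bytes)
{
    return -(1LL << (8 * bytes - 1));
}

// Largest value of an unsigned integer of that size.
unsigned long long max_unsigned(int bytes)
{
    return (1ULL << (8 * bytes)) - 1;
}

// True when the formula agrees with the limits the compiler reports for short.
bool short_matches_formula()
{
    int bytes = int(sizeof(short));
    return max_signed(bytes) == std::numeric_limits<short>::max()
        && min_signed(bytes) == std::numeric_limits<short>::min();
}
```

Calling max_signed(2) from main gives 32767 and max_unsigned(4) gives 4,294,967,295, the values shown in the table.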

3. String Constants
A string constant is a sequence of characters enclosed in double quotes, e.g. "MSc
Information Technology". The sequence may be empty, e.g. "".
If the string is to include certain characters, e.g. double quotes and the backslash, then
these must be escaped with the '\' backslash character, e.g.
"She said \"I have lost my file mydir\\myprog.cpp\""
When output, this would display:
She said "I have lost my file mydir\myprog.cpp"
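The effect of escaping can be checked with a small sketch such as the one below. The function names escaped_message and escaped_length are made up for this example. It uses the library routine strlen, which counts the characters actually stored, so each escape pair such as \" counts as a single character.

```cpp
#include <cstring>

// Hypothetical functions for illustration only.

// The string from the text: \" stands for a double quote and \\ for a backslash.
const char* escaped_message()
{
    return "She said \"I have lost my file mydir\\myprog.cpp\"";
}

// Number of characters stored in the string, not counting the terminating NUL.
std::size_t escaped_length()
{
    return std::strlen(escaped_message());
}
```

Although 50 characters are typed between the outer quotes, escaped_length() returns 47 because the three escaping backslashes are not stored.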

Other special characters may be included, e.g.
\n   newline     \?   question mark
\t   tab         \'   single quote
\f   formfeed    \a   alarm bell

A string constant can extend over 2 or more lines by placing a backslash at the end of an
uncompleted line.
Two adjacent strings are concatenated to form a single string e.g.
"This string " "is concatenated with this one"
There is no native data type string in C++. Instead, strings are implemented as
an array of characters terminated by the special character '\0' (ASCII NUL),
i.e. the unprintable character which has the ASCII code 0. For example, "Hello" is
stored as:
index      0  1  2  3  4  5
character  H  e  l  l  o  \0
We will cover arrays later - they are a very important compound data type holding a
sequence of data items in a contiguous area of memory.
Strings and characters are not the same. A string containing only a single character, e.g.
"W" actually occupies 2 bytes of storage one for the 'W' and one for the ASCII NUL. A
character variable can hold only one single character, e.g. 'W', normally occupying only
one byte.
To declare a variable of type string and give it a value immediately:-
char myname[] = "Terry Chapman";
If the string is not intended to be changed, it should be declared as a constant:-
const char myname[] = "Terry Chapman";
The empty brackets signify an array whose size is determined automatically by the
compiler which also reserves space for the terminating ASCII NUL. The variable or
constant can be output in the usual way, i.e.
cout << myname;
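The storage figures quoted above can be confirmed with the sizeof operator, as in the following sketch. The function names are invented for this example. sizeof reports the bytes reserved by the compiler, including the terminating NUL, whereas the library routine strlen counts only the characters before it.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical functions for illustration only.

const char myname[] = "Terry Chapman";   // 13 characters plus the terminating NUL

// Bytes of storage reserved for the array, NUL included.
std::size_t name_storage() { return sizeof(myname); }

// Number of characters before the NUL, as counted by strlen.
std::size_t name_length() { return std::strlen(myname); }

// Storage used by the one-character string "W" and by the character 'W'.
std::size_t string_w_storage() { return sizeof("W"); }
std::size_t char_w_storage() { return sizeof('W'); }

// Two adjacent string constants form one string with a single terminating NUL.
std::size_t joined_storage() { return sizeof("ab" "cd"); }
```

Here name_storage() is one more than name_length(), and the string "W" needs 2 bytes where the character 'W' needs only 1.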

4. Variables and Constants


Variables are names associated with a value. In programming, names are referred to as
identifiers. During program execution, the value associated with an identifier may be
changed many times. In C++ the compiler must know the type of the identifier because
this determines the amount of storage that must be allocated for its value. For this reason,
every variable declaration must have a type. In addition, the variable may be initialised
with a value:-
Examples:-
int sum;                         // variable named sum of type int
int size = 37;                   // initialised on declaration
int count, total = 0;            // 2 integers, only total is initialised to 0
float average = 0.0;             // Initialisation must be of the appropriate type
char ch;                         // Uninitialised declaration
char blank = ' ';                // literal space surrounded by single quotes
char progname[] = "myprog.cpp";  // strings by double quotes

Identifiers must start with a letter or an underscore. After this, they may contain any
number of letters, digits or underscore characters. They must not include spaces.
int this_is_a_very_long_identifier_with_99 = 99; // valid
float The Average; // invalid - contains a space
char 2good; // invalid, starts with digit
You must use meaningful identifiers. They are part of the program’s documentation and
should be expressive of the purpose for which the identifier is required. An exception to
this is loop control variables that have no other purpose than to access elements of an
array. These are commonly a single character e.g. i, j.
Constants are named items that cannot change. These are used for values in the program
that will remain constant throughout the program’s execution. They must be initialised
with a value.
Examples:-
const double pi = 3.14159265359;
const int numitems = 350;

5. Arithmetic Operators
+ unary plus or addition
- unary minus or subtraction
* multiplication
/ division
% modulus
Note that there is no exponentiation operator that raises a number to a power. There are
library routines that accomplish this.
The above operators apply to all numeric types (except %). Modulus produces the
remainder after integer division and applies only to integral types:-
5 % 2 = 1, 11 % 3 = 2, 19 % 5 = 4.
You should find a table of operator precedence in your textbook. 2 + 3 * 4 means "add 2
to the product of 3 and 4". If you want it to mean "add 2 to 3 and then multiply by 4" you
must change the precedence with parentheses (2 + 3) * 4.
A combination of arithmetic operators and arithmetic constants or variables is known as an
arithmetic expression. An expression has a value, thus 10 * 3 has the value 30.
A statement on the other hand is a command to carry out processing, e.g.
x = 10 * 3; is a statement that means assign to the variable x the value of the expression
10 * 3.
You might have rationalised the difference between a statement and an expression by
thinking to yourself that an expression has a value whereas a statement does not. You
would be correct if you were talking about most conventional programming languages like
Pascal, Modula-2 and BASIC. But you would be wrong if you were talking about C and
C++ since, in these languages, assignment is itself an operator and an assignment therefore
also has a value - in the above example, x = 10 * 3 has the value 30. This value can be
used for further operations, e.g. for assignment to another variable:-
y = x = 10 * 3; // both x and y now have the value 30
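
The rules above on precedence, modulus and the value of an assignment can be checked with
a few lines of code. The following is a minimal sketch only; the function name
arithmetic_demo is an assumption made for this example and is not part of the course
material. It is wrapped in a function (see the chapter on Functions) so that it can be
called from main.

```cpp
// A minimal sketch of operator precedence, modulus and the value of an
// assignment. The function name is hypothetical, chosen for this example.
int arithmetic_demo()
{
    int a = 2 + 3 * 4;          // multiplication binds more tightly: a is 14
    int b = (2 + 3) * 4;        // parentheses change the order: b is 20
    int r = 11 % 3;             // remainder after integer division: r is 2
    int x, y;
    y = x = 10 * 3;             // the assignment x = 10 * 3 has the value 30
    return a + b + r + x + y;   // 14 + 20 + 2 + 30 + 30
}
```

Calling arithmetic_demo() from main and sending the result to cout should display 96.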


6. Type conversions
There are two aspects:-
• Automatic conversions carried out by the compiler
  These are discussed below (para 7)
• Type conversion operators
  These use the name of a type as a function in order to force an expression into a
  particular type e.g. int(99.21) will yield 99.

7. Assignment operator
C++ carries out automatic type conversion so that the result of an expression on the right
hand side of the assignment symbol is automatically converted (if possible) into the type
of the variable on the left hand side. This is convenient in many ways, but there are
occasions when you need to know what the exact effect is. It is like letting a futuristic washing
machine automatically decide what program to use according to the clothes you put in.
What program does the machine decide to use when you wash a silk shirt and a very dirty
towel? Do you get a grubby towel or a ruined silk shirt? Ultimately you will need to know
what the conversion rules are, but do not worry about them at present. In any case it is
desirable not to make a habit of mixing your washing since you may get a result you did
not expect.
Briefly, fractional values (types float and double) are truncated when assigned to integral
variables (int, unsigned int, long int). Large values that exceed the capacity of the integral
variable to which they are assigned will cause overflow and the result will be meaningless.
No overflow warning is issued and care should be taken when writing expressions with
integral values to ensure that overflow does not occur.
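
The truncation described above can be seen in a short sketch. The function name
whole_part is hypothetical and is used here only for illustration; the point is that the
fractional part is discarded, not rounded, when a double is assigned to an int.

```cpp
// Sketch: a fractional value is truncated (not rounded) when it is
// assigned to an int. The function name is hypothetical.
int whole_part(double value)
{
    int n = value;      // automatic conversion: the fraction is discarded
    return n;
}
```

For example, whole_part(99.21) gives 99 and whole_part(2.99) gives 2, the same results as
the explicit conversions int(99.21) and int(2.99).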

8. The compound assignment operators


These are all very intuitive and make life easier by reducing the typing.
count += 2        increment the value of count by 2
stock -= 1        decrement the value of stock by 1
divisor /= 10.0   assign to the float divisor the result of dividing its current value by 10
power *= 9        assign to power the result of multiplying its previous value by 9
remainder %= 2    assign to remainder the remainder after dividing its current value by 2

Note that the last may only be used with integers; all others may be used with any
arithmetic type. Note also the effect of sum /= 3 + 7. The expression 3 + 7 is evaluated
first, so sum is divided by 10.
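
As a rough illustration of these operators, the sketch below applies each of them in turn.
The function and variable names are assumptions made for this example.

```cpp
// Sketch of the compound assignment operators; names are illustrative.
int compound_demo()
{
    int total = 10;
    total += 2;                 // total is now 12
    total -= 1;                 // total is now 11
    total *= 9;                 // total is now 99
    total %= 10;                // remainder of 99 / 10: total is now 9
    int sum = 50;
    sum /= 3 + 7;               // 3 + 7 is evaluated first, so sum = 50 / 10 = 5
    return total * 10 + sum;    // 9 * 10 + 5
}
```

The value returned is 95, confirming that sum /= 3 + 7 divides by 10 rather than dividing
by 3 and then adding 7.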

9. The increment & decrement operators


y = x++ assign the old value of x to y and then increment x (postincrement)
y = ++x increment x and assign the new value to y (preincrement)

similarly with --
We will return to these when we look at processing arrays using loops.
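
The difference between the two forms is easiest to see side by side. The following sketch
uses two hypothetical functions, each starting with x equal to 5, and packs y and x into
a single result so that both can be inspected.

```cpp
// Sketch: postincrement yields the old value, preincrement the new one.
// The function names are hypothetical.
int post_increment_result()
{
    int x = 5;
    int y = x++;            // y receives 5, then x becomes 6
    return y * 10 + x;      // 5 * 10 + 6
}

int pre_increment_result()
{
    int x = 5;
    int y = ++x;            // x becomes 6, then y receives 6
    return y * 10 + x;      // 6 * 10 + 6
}
```

The first function returns 56 and the second 66: x ends up as 6 in both cases, but y
differs.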


10. Iostream library


Input and output in C++ is based on streams. A stream is an abstract concept that you do
not need to worry about. Just think of the natural phenomenon. Whenever a C++ program
executes, three streams are opened automatically - standard input, standard output and
standard error. Normally, standard input is expected to come from the keyboard and
standard output is sent to the display. However standard input and standard output can be
redirected from the DOS command line using the < and > characters when the program is
executed. Standard error cannot be redirected.

11. Command line redirection


If you wish to capture the output of a program in a file, simply redirect its output as
follows
myprog > myprog.out
Similarly you can substitute a file to be the input to a program instead of the keyboard.
myprog < myinput
To redirect both input and output use something like
myprog < myinput > myprog.out

12. Streams
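Access to istream (input stream) and ostream (output stream) operators is obtained by
putting the preprocessor directive #include<iostream> at the top of each program file that
needs to carry out standard input and/or standard output. This has the effect of including
the header file iostream (a text file, named iostream.h by older compilers) in the
compilation. With a standard-conforming compiler the names declared in this header belong
to namespace std, so the directive is normally followed by the line using namespace std;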

12.1 Unformatted input and output


cin and cout are the predefined standard input and output streams defined in the
above header file (there is a third, cerr):-
cin >> x >> y >> z;     obtains from the keyboard values for 3 variables. Spaces or tabs
                        may separate the actual inputs.
cout << "A message : " << message << endl;
                        where message is a string constant or variable.
<< and >>               are known as the insertion and extraction operators. The unusual
                        notation arises from the object-oriented aspects of the language.
                        Just take it for granted at present.
endl                    causes subsequent output to be displayed on the next line of the
                        display.
cin.get(ch)             gets a single character from standard input and returns the state
                        of the standard input stream
cout.put(ch)            puts a single character to standard output and returns the state
                        of the standard output stream
cout.good()             Return true if there has been no error from the last output (input)
cin.good()              operation
cout.bad()              Return true if the last output (input) operation suffered a serious
cin.bad()               error. This is not quite the opposite of good(), which also becomes
                        false at end of input or when an input value cannot be converted.
cin.eof()               Returns true if end of input encountered, false otherwise. When
                        entering from the keyboard, end of input is indicated with Ctrl Z.
All of the fundamental types supported by C++ (including strings) may be input
using cin >> and output using cout <<.


13. Output manipulators


As their name implies, these allow formatting of the output stream for such things as the
field width, justification, decimal precision etc. They are normally included within the
output statement - see examples below and Skansholm pp 365-369. Use of these
manipulators requires that the header file iomanip be included in the program:-
#include<iomanip>
setw(int n)         sets the field width to n characters for the next output e.g.
                    cout << "22 right adjusted in field width of 4 is [" << setw(4) << 22 << "]";
                    produces
                    22 right adjusted in field width of 4 is [  22]
                    setw must be repeated for each subsequent output for which a
                    fieldwidth is required. In the absence of setw() the fieldwidth is the
                    actual width of the output.
setfill( char )     specifies the character that is to be used for padding output that is
                    narrower than the field width, e.g.
                    cout << '[' << setw(4) << setfill('*') << 22 << ']';
                    produces [**22]
setprecision(int)   changes the precision for the display of types float and double (the
                    default is 6 digits). Normally it determines the number of significant
                    digits displayed, but if the fixed flag (see below) has been set, then it
                    controls the number of decimal places displayed.
setiosflags( … )    change flags that control such things as justification, precision etc.
and setf()
                    setiosflags( ios::showpoint ) forces the decimal point, and any trailing
                    zeros, to be displayed even for whole numbers.
                    setiosflags( ios::fixed ) selects fixed-point notation. With a
                    standard-conforming library it is this flag, rather than showpoint, that
                    makes setprecision control the number of decimal places displayed.
                    setiosflags( ios::left ) and setiosflags( ios::right ) determine the
                    justification of the output which will remain unchanged until the flag is
                    modified by another call.
                    setf() is a member function of the stream classes and does the same job as
                    setiosflags except that it cannot be used within an output statement
                    as setiosflags can. It would be called by e.g. cout.setf(ios::right);

The items starting with ios:: within the parentheses after setiosflags are constants that are
defined in the iostream library. Their names are self-explanatory and you do not need to
know their values. The meaning of ios:: will only be explained in a subsequent module
unless you read up on it yourself.
A program basiccpp.cpp is provided in the lab that shows the effect of setw(n) and some
of the flags that can be set using setiosflags(), including display of integer in octal and
hexadecimal.
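
The sketch below shows setw, setfill and setprecision at work. To make the formatted text
easy to inspect it writes to an ostringstream (a stream that collects its output in a
string) rather than to cout; this choice, the function names, and the use of the standard
ios::fixed flag to request a fixed number of decimal places are all assumptions made for
the example. The manipulators behave in the same way when used with cout.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Sketch: format a number in a field of width 4 padded with '*'.
// Writing to an ostringstream instead of cout is an assumption made so
// that the result can be inspected; the function names are hypothetical.
string padded(int n)
{
    ostringstream out;
    out << '[' << setw(4) << setfill('*') << n << ']';
    return out.str();
}

// Sketch: show a value with exactly two decimal places.
string two_places(double d)
{
    ostringstream out;
    out << setiosflags(ios::fixed) << setprecision(2) << d;
    return out.str();
}
```

Here padded(22) yields [**22], and two_places(3.14159) yields 3.14.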


14. Relational operators and expressions


14.1 Relational operators
< less than
> greater than
<= less than or equal
>= greater than or equal
== equal (Note: 2 equal signs with no space between)
!= not equal

14.2 Relational expressions


These expressions compare two values and return true if the test succeeds and false
otherwise.
ch > 'A'
y * y <= 2 * y + 1
f < 0.0
y == x
ch != '\0'

Take care not to use = as the equality operator. This is a common programming
error.
Beware of testing two floating point variables for equality. Their binary internal
representation means that many fractional values cannot be expressed exactly.
Instead, test whether the absolute value of their difference is smaller than some
small tolerance. The function fabs(<float>) can be used to find the absolute value of a
float or double. To use it you need to #include<math.h>
double f1 = 12.34574, f2 = 12.34578;
const double delta = 0.00005;
if ( fabs( f1 - f2 ) < delta )
… // consider them equal
else
… // consider them unequal
See also Skansholm p52.
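
The test above can be packaged as a small function so that it does not have to be written
out each time. This is only a sketch: the name nearly_equal is hypothetical, and a suitable
value for delta depends on the size of the numbers being compared.

```cpp
#include <math.h>

// Sketch: treat two doubles as equal when they differ by less than delta.
// The function name and the choice of tolerance are assumptions.
bool nearly_equal(double f1, double f2, double delta)
{
    return fabs( f1 - f2 ) < delta;
}
```

With the values used above, nearly_equal(12.34574, 12.34578, 0.00005) is true because the
two numbers differ by only 0.00004.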

15. FALSE and TRUE


Recent C++ compilers support a bool data type that can take one of two possible values -
true or false. Earlier compilers do not support this type and, instead, false is represented by
an integer with the value 0 and true by any non-zero value. GNU C++ provides the bool
data type and we shall be using it on this course. If you are using a compiler at home that
does not support bool, there is a simple addition that you can make to your programs that
gets around this deficiency. Enter the following into a file called bool.h and #include this
in all programs if you are not using GNU C++:-
typedef int bool;
const bool false = 0;
const bool true = !false;
However, I strongly recommend that you do use the GNU compiler. Several students have
had problems when using the Borland 4.5 compiler.


16. Logical operators and expressions


16.1 Logical operators.
The draft C++ ANSI standard introduced the new operator keywords and, or and not
(written in lower case). These are not supported by the GNU C++ compiler, nor by
Borland 4.5. Instead use &&, || and !

&&  (and)    logical AND
||  (or)     logical OR
!   (not)    unary negation

16.2 Logical expressions

!5                          false
!0                          true
ch = 'a';                   assign to ch the letter 'a', i.e. NOT the test for equality
ch == '\0'                  false
!(ch == '\0')               true
(ch)                        true (the character 'a' is converted to an integer and is
                            tested for non-zero)
(!ch)                       false
(ch && ch != '\n')          true
(ch == 'a' || ch == 'A')    true
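
The values in the table can be confirmed by evaluating each expression in turn. The sketch
below counts how many of them are true when ch holds 'a'. The function name is hypothetical,
and the if statement it uses is described in section 19.

```cpp
// Sketch: the logical expressions tabulated above, evaluated for ch = 'a'.
// Returns the number of expressions that are true; the name is hypothetical.
int count_true()
{
    char ch = 'a';
    int n = 0;
    if ( !5 ) n++;                        // false
    if ( !0 ) n++;                        // true
    if ( ch == '\0' ) n++;                // false
    if ( !(ch == '\0') ) n++;             // true
    if ( ch ) n++;                        // true: 'a' is non-zero
    if ( !ch ) n++;                       // false
    if ( ch && ch != '\n' ) n++;          // true
    if ( ch == 'a' || ch == 'A' ) n++;    // true
    return n;
}
```

Five of the eight expressions are true, so count_true() returns 5.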

17. Short-circuit evaluation


Note that, in the logical expression expression1 && expression2 both expressions must be
true for the whole expression to be true. If expression1 yields false, then the whole
expression cannot possibly be true. Therefore expression2 does not need to be and will not
be evaluated.
Similarly with expression1 || expression2, if the expression1 yields true, then the whole
expression must be true, whatever the value of expression2, so expression2 is not
evaluated.
This feature is important in cases where, if the first test fails, the second test must not be
evaluated because it would cause an error. We will meet this again when we come to look
at pointers.
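
A simple case that does not involve pointers is division. In the sketch below (the function
name is an assumption made for this example) the first test guards the second: when divisor
is 0 the expression n % divisor is never evaluated, so no division by zero can occur.

```cpp
// Sketch: the first test guards the second. If divisor is 0 the expression
// n % divisor is never evaluated, so no division by zero can occur.
bool divides_exactly(int n, int divisor)
{
    return divisor != 0 && n % divisor == 0;
}
```

Reversing the order of the two tests would remove this protection, since n % divisor would
then be evaluated first.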


18. The while statement


This is one of several iteration constructs provided by C++, and is the simplest.
while ( logical expression == true )
    <statement>
The parentheses () are required. If there is more than one statement to be executed
within the loop then braces { } are required:-
while (logical expression == true)
{
    statement1;
    statement2;
    etc..
}
[Figure: flowchart of the while statement. The condition is set up and then tested; while
it is true the statement(s) are executed and the condition is set up again; when it is
false control passes to the next statement.]

Example
// show.cpp
// copies its input to its output
#include<iostream>
using namespace std;
int main(void)
{
    char ch;
    cin.get(ch);            // get a character from the keyboard
    while ( cin.good() )    // Becomes false if end of file or other input problem
    {
        cout.put(ch);       // output the character to the display
        cin.get(ch);        // get the next char in preparation for the next loop iteration
    }
    return(0);
}

The while statement is preceded by a statement cin.get(ch) that sets up the value to be
tested by while. This is important because the termination condition may already exist in
which case the loop should not be entered. If the loop is entered, then cin.get(ch) is
repeated at the bottom of the loop to set up the condition again. This is invariably the way
that files are processed since they may be empty. It is a common error to forget to initialise
the test condition before entering the while loop.
This program can be used to display the contents of a text file if issued at the DOS
command line using redirection:-
show < show.cpp displays the source program file show.cpp at the terminal
The output can also be redirected, giving a file copy
show < show.cpp > showcpy.txt

showcpy.txt is now an exact copy of show.cpp


Here is a refinement of the above program:-

// show2.cpp
// copies its input to its output
#include<iostream>
using namespace std;
int main(void)
{
    char ch;
    while ( cin.get(ch) )   // Becomes false if end of file or other input problem
        cout.put(ch);       // output the character to the display
    return(0);
}
The get( ch ) function is called within the loop condition parentheses. The expression
cin.get(ch) does two things: a) it gets a character from standard input and passes it back
via its argument ch and b) it returns a reference to the standard input stream cin as its
function result. When used as a condition, the stream tests as false (0) once there is no
further input, and this is the condition being tested by while. This does away with the
need for the get prior to entry of the loop, and also with the get at the bottom of the loop.

19. The if statement


This statement is classified as a branching construct. It allows the flow of control of the
program to be changed depending on the value tested, e.g. an input from the user or a
value held in a file.
if (condition)
    statement;
Or
if (condition)
    statement1;
else
    statement2;
[Figure: flowchart of the if statement. The condition is tested; if it is true (non-zero)
one set of statement(s) is executed, if it is false (0) the other set is executed; these
may also be if statements. Control then passes to the next statement.]

As with while, if there is more than one statement in either the if part or the else part
then they must be surrounded by braces {} as in the body of function main. Notice that,
unlike Pascal, there must be a semi-colon after each statement.
Condition is any logical expression yielding a boolean value (true or false). It may
consist of expressions combined into a larger, more complex expression by the logical
operators && (logical AND) and || (logical OR). Statements within each part (if or else)
may be any statement, including another if statement.
if (condition1)
    if (condition2)
        statement2a;
    else
        statement2b;
else
    statement1;
The else clause is assumed to relate to the immediately preceding if unless braces are used
to change this association.
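
The effect of this rule, and of using braces to override it, can be seen in the following
sketch. Both functions are hypothetical; each returns a number showing which branch was
taken (0 means that no branch ran).

```cpp
// Sketch: without braces the else pairs with the nearest if.
// Returns 1, 2 or 0 to show which branch ran; names are hypothetical.
int nearest_if(bool c1, bool c2)
{
    int branch = 0;
    if ( c1 )
        if ( c2 )
            branch = 1;
        else                // belongs to if ( c2 ), whatever the indentation
            branch = 2;
    return branch;
}

// With braces the else is attached to the outer if instead.
int outer_if(bool c1, bool c2)
{
    int branch = 0;
    if ( c1 )
    {
        if ( c2 )
            branch = 1;
    }
    else
        branch = 2;
    return branch;
}
```

With c1 false, nearest_if returns 0 because the else is never reached, whereas outer_if
returns 2.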


20. Style for logical expressions


In natural language we can say “If late for lecture then hurry else have another coffee”. We
do not say “If late for lecture is true then … “.
Similarly, in programming, the test of a logical value e.g. in an if statement would be
written as
if( late_for_lecture )
hurry();
else
have_another_coffee();
and not
if( late_for_lecture == true )
hurry();

It is generally considered to be poor programming style to use this second approach and
you will lose marks if you use it.

21. The ctype library


This is a 'C' library of functions that operate on characters. They include functions to test
whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion.
See Libraries on page 115.


Functions
1. Introduction
You have already seen and used a function - the function main which every C++ program
must have. Until now it has been reasonable to write all of the code of your programs in
this function. However, as programs become larger, it is necessary to break them down
into collections of smaller and more manageable units. One such subdivision is the
function. Functions give us the ability to store a computation in a named block of code and
to carry out the computation simply by referring to its name i.e. by calling the function.
This facility for breaking programs down into simpler and more manageable units is a
major weapon in the fight to reduce the complexity of large programs and involves the
process of abstraction. Abstraction allows us to concentrate on the current task and to
ignore details that are not relevant. So when we call a function e.g. sqrt to find the square
root of a number, we are concerned only with how to make the call and not what steps the
function takes to achieve the computation. We do need to know the data type of the
number to be passed to sqrt, the data type of the value returned by it and what happens if
we pass a negative value etc. - these aspects are relevant to our making the call, but the
actual details of the computation are not relevant.
Of course, at different times we will have different levels and views of abstraction - if we
had been concerned with writing function sqrt then we would have been concentrating our
attention on expressing the algorithm to compute the square root of a number and would
have ignored unnecessary detail elsewhere (e.g. the other functions which make up the
maths library). A further advantage of storing code in functions is, of course, the ability to
re-use them in other programs.
This type of abstraction is called procedural abstraction after the procedures - the name
that most other languages use to refer to these named blocks of code. Technically a
function differs from a procedure in that it returns a value, whereas a procedure does not.
C++ does not have procedures, but it is possible to specify that a function does not return a
value. Functions in C, C++ and most other languages (except the functional languages) do
not conform closely to the mathematical concept of a function that accepts a single
argument and returns a single value. As we shall see, it is possible to pass more than one
value to a function and to get back more than one result.
The structure of a function is:-
type_specifier function_name(argument_list)
{
    definition_and_statement_list
}
type_specifier      The data type of the value that is returned by the function.
function_name       A programmer-defined identifier that conforms to the rules for
                    identifiers. This is the name that is used to call the function.
argument_list       The formal argument list: the names and types of the values that are
                    passed to the function on which it is to carry out some computation.
definition_and_statement_list
                    Exactly what you have been writing in function main up
                    until now, i.e. constant and variable definitions and statements
                    including (normally) a return statement that provides the value
                    returned back to the point of the call, e.g. the return(0) appearing at
                    the bottom of main.


Example:-
You are writing a program which needs to compute values raised to a power. There is no
exponentiation operator in C++, so you must develop one yourself. You want to be able to
write e.g.:-

result = power(12,3);

where result is an integral variable which is to be given the value 12 raised to the power 3
(i.e. 1728). On other occasions in the same program different numbers are to be raised to
different powers, e.g. the statement
cout << power(7,5) << endl;
outputs 7 raised to the power 5, i.e. 16,807.

So the function must be generalised to handle a range of different inputs for its single
result. This generalisation is provided by the argument list. In the call to the function, the
values passed to the function are known as the actual arguments i.e. 12 and 3 in the first
example above, and 7 and 5 in the second example. In the definition of the function they
are known as the formal arguments. It is important that you understand this distinction
because these two terms are used frequently when talking about functions.
Assuming that we want to be able to handle some large resulting values, the integral return
type should probably be of type long int. The type of the arguments can be left as plain
integer. The formal specification of function power is then:-
long int power(int a, int b) // long power (without the int) is also OK
{
    definition_and_statement_list
    return (<long_integer_expression>);
}
Where <long_integer_expression> is an expression of the result of raising a to the power
b. When the call power(12,3) is made the actual argument values 12 and 3 are copied into
their respective formal argument variables a and b. If the actual arguments had been
integer variables (as opposed to constants) with the same values (12 and 3), then the
values of the actual argument variables would have been copied into the formal argument
variables producing exactly the same effect.
In the function, the formal arguments a and b are effectively local variables of the
function. Any variable definitions made in the body of the function are also local
variables. This means that they are not accessible from outside the function. In fact,
normally, they only exist while the function is executing and are then removed from
memory. Inside the body of power there will be an appropriate computation that produces
a value representing a raised to the power b, and this value will be passed back by the
return statement. A function normally has a value (unless its return type is void) and can
therefore be used on the right hand side of an assignment or within a cout statement in just
the same way as a variable or an arithmetic expression. In fact a call to a function which
returns a value is an expression.
In the case of the statement result = power(12,3); the returned value will therefore be
assigned to result. The value returned by power can be used anywhere else that an
expression of long int type is required, e.g. in
cout << power(7,5); // 16,807
or even as the actual argument of a call to another function.
cout << power( power( 2, 3 ), 4 ); // 4,096
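
The notes leave the body of power unspecified. One possible body, using repeated
multiplication, is sketched below. It is only an illustration and assumes that b is not
negative; other algorithms would serve equally well.

```cpp
// Sketch of one possible body for power, using repeated multiplication.
// It assumes b is not negative; the notes leave the body unspecified.
long int power(int a, int b)
{
    long int result = 1;    // a raised to the power 0 is 1
    while ( b > 0 )
    {
        result *= a;
        b--;                // b is a local copy, so the caller is unaffected
    }
    return (result);
}
```

With this definition power(12,3) gives 1728, power(7,5) gives 16807 and
power( power( 2, 3 ), 4 ) gives 4096, matching the figures quoted above.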


2. Input and output in functions


In general, it is considered good practice to isolate input and output statements in one
particular area of a program. This is because I/O tends to be hardware-specific and it is
easier to make changes for a different machine platform or display device if all the I/O
code is in one place. When writing small programs in a learning situation, it is not always
easy to follow this guide for best practice. But, wherever possible, try to confine I/O to one
or more suitable functions rather than spreading it across the program in a number of
functions whose primary purpose is not I/O.
In particular, it is not good practice to carry out I/O in low level functions. The reason for
this is that a function that may be re-used many times in many different programs cannot
know how the calling program wishes its output to be displayed, whereas the calling
program does know this. Different operating environments have different ways of
displaying output to the user of the program, so a low level routine that displays output for
a character console could not be used in a program that runs in a windowing environment.

3. Multi-function programs
There must always be a function called main in any C++ program. There may be any
number of other functions in the same source program file (or indeed in other source
program files). The question then arises - where do you put these other functions? C++
does not allow functions to be nested within other functions (unlike Pascal and Modula-2).
So additional functions may appear textually either before function main, or after it. When
the compiler scans the source text of a program, it will flag an error if it finds a call to a
function whose definition it has not yet encountered. So if a function is defined after main,
then a function declaration must appear before the point at which the call is made. This
declaration (also known as a function prototype) should normally be placed at the start of
main giving the compiler sufficient information to enable it to check that the function has
been called properly. This prototype will consist only of the return type, the function
name, and the types of its arguments.
// fun01.cpp
// illustrates the placing of functions in relation to main
// tdc 28/09/95
#include<iostream>
using namespace std;
int add(int a, int b )              // this placing is deprecated
{
    return(a + b);
}
int main(void)
{
    // int mult(int a, int b);      // prototype commented out
    int x = 10, y = 3;
    cout << add( x, y ) << endl << mult(x,y) << endl;
    // Error: Function 'mult' should have a prototype in function main()
    return(0);
}
int mult(int a, int b)
{
    return (a * b);
}
Function add has been placed before main contrary to the recommendation for best
practice above.
The prototype for function mult has been commented out, causing the compiler error.
Removing the comments allows the program to compile successfully.
Different organisations may set their own 'house' styles, but we will show the full
definition of functions after main with prototypes normally appearing as the first
definitions within the body of main.
Note that the identifiers a and b in the prototype for mult are not essential. The prototype
could have been
int mult(int, int); // prototype with argument identifiers omitted

But the argument identifiers may be included if they aid the understanding of their
purpose. The compiler will also flag an error if the prototype does not match the formal
definition as regards either its name, or its number and type of arguments. But it will not
detect a difference between the return type as declared in the prototype and as defined in
the formal definition. If there is such a difference then a run-time error is likely to result.

4. Stepwise Refinement (or Top-down design)


When designing the solution to a problem it is normal to set out in logical order the steps
that need to be taken.
Stepwise refinement is a technique for tackling the problem of program complexity by
breaking a task down into steps, each of which is implemented by a function. Each of
these functions is then further refined by breaking it down into a series of steps
implemented as functions, and so on.
[Figure: tree diagram of functional decomposition. Function main() comprises Step 1 to
Step 4; each step is refined into sub-steps (Step 1.1, Step 1.2, Step 2.1, Step 3.1 to
Step 3.3, and below them Step 1.2.1) until every branch ends in a box labelled Code.]
Initially, the design process can be approached by using a PDL (Program Description
Language). We do not introduce a formal description of such a language; it is better
left flexible so that you can use a structured type of natural language. Read Skansholm
pp 20-26 for an example of Top-down design. When you find that it is impossible to specify
any further steps in the process of functional decomposition without using commands of
the programming language, then you have taken the functional decomposition process as
far as it can go. You are then ready to translate your natural language description into C++
source code.
Initially, your programs will be short and simple and you will wonder what all the fuss
was about. But when you have to tackle a large problem you will, I hope, see the point.
Initially, you will be unfamiliar with the syntax of C++, so it will be extremely difficult for
you to express a solution to a difficult problem directly in the programming language. In
these circumstances, it is essential that you develop the habit of expressing a solution
in natural language before attempting to write the code.
Note that, in the schematic diagram above, those boxes (functions) which consist entirely
of Step x.x should not be assumed to consist entirely of function calls without any other

code. They may well contain constructs such as branching (if, else) and loops (while etc.)
within which the subsidiary functions are called.

5. Automatic variables
Variables declared within a function are called local variables and have the default storage
class automatic (auto is the key word). Since this is the default, the storage class does not
have to be given and it is normal to omit it. There are other storage classes that will be
dealt with later.
Scope is an important topic since the scope rules determine the visibility of objects. If an
object is not visible, it cannot be changed. Your Unix password is invisible to others
because, if others had access to your account, you do not know what they could do. They
might let you have useful comments about your work. On the other hand they might
change it, or delete it. The scope mechanism is employed to reduce the chances of errors in
a program caused by some other programmer (or even yourself!) inadvertently
corrupting the program by changing an object to which he/she should not have
access. This is part of the concept of encapsulation which we shall cover in more detail in
the second Semester. For now, work on the principle that functions should not, as a rule,
use or modify global variables.
As an example, if function x requires a variable to control a loop, declare that variable
locally within the function. In that way, only errors within the function itself can cause the
loop to run incorrectly. If a global variable were used for this purpose, there is a possibility
of it being changed from outside the function while the loop is executing causing errors
that can be very difficult to identify and correct. Similarly, although there can be
exceptions, functions should not modify global variables directly. Instead this should be
done via arguments. More about how in a later lecture.
An obvious corollary to the lack of visibility of a local variable from outside the function
is that variable names may be duplicated within different functions without any clash.
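
For instance, in the sketch below two functions each declare local variables called total
and i. The functions and their names are assumptions made for this example; the point is
that the two sets of variables are entirely separate.

```cpp
// Sketch: both functions declare a local variable called total. Each is a
// separate automatic variable that exists only while its own function runs.
int sum_to(int n)
{
    int total = 0;      // local to sum_to
    int i = 1;          // loop control variable, also local
    while ( i <= n )
    {
        total += i;
        i++;
    }
    return total;
}

int product_to(int n)
{
    int total = 1;      // a different variable with the same name
    int i = 1;
    while ( i <= n )
    {
        total *= i;
        i++;
    }
    return total;
}
```

Here sum_to(4) returns 10 and product_to(4) returns 24; neither function can see or
disturb the other's total.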

6. Function values
A fairly obvious point - the value appearing after return should be of the same type as that
in the definition. Thus
int add(int a, int b)
should, in its return statement, return an integer value. You have been doing this for some
time in function main.
As mentioned earlier, it is possible for a function to accept no arguments, or to return no
value. In either of these cases, the reserved word void should be used, e.g.
void dosomething(void)
is a function which neither accepts arguments nor returns a value. In this case its return
statement, if it has one, must not supply a value (a plain return; is allowed), and a call
to it must be used differently to reflect the fact that no value is returned.
dosomething(); // i.e. a statement, not an expression
result = dosomething(); // wrong
7. Function arguments
These are a means of passing information to a called function. It is also possible for a
function to pass information back via its arguments and this will be dealt with later.
Arguments are a comma-separated list of type/identifier pairs appearing within the
parentheses after the function name, e.g. (int a, int b) as in function add above. Naturally,
the number and type of the actual arguments supplied in the call must match the number

and type of the formal arguments with the exception of default arguments (see Default
arguments on page 32). The function may modify the values of its arguments, and this will
have no effect on the values of any actual argument variables used in the call.
Remember that the values of the actual arguments are copied into the formal argument
identifiers. This is the pass-by-value argument mechanism. The actual arguments may be
any expression of the correct type. This includes a literal constant, e.g. 9.0, a variable, e.g.
f, or even a call to another function which returns a value of the correct type, e.g.
cout << sqrt( sqrt(81.0) ); // outputs 3.
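The pass-by-value mechanism can be illustrated with a minimal sketch. The function names triple and caller_value_after_call are hypothetical and are used here for illustration only.

```cpp
#include <iostream>
using namespace std;

// Hypothetical example function: it modifies its own copy of the argument.
int triple( int n )
{
    n = n * 3;                  // changes the formal argument only
    return n;
}

// Shows that the actual argument variable is untouched by the call.
int caller_value_after_call( void )
{
    int x = 7;
    int result = triple( x );   // the value of x is copied into n
    cout << "result = " << result << ", x = " << x << endl;
    return x;                   // x still holds 7
}
```

Although triple assigns to n, the variable x in the caller keeps its original value because only a copy was passed.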

8. Function argument agreement & conversion


Automatic type conversion is carried out when actual arguments are copied into formal
argument variables in just the same way as that carried out during assignment. Generally
speaking you should not rely on this. Instead always pass the correct type as actual
arguments.
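As a small sketch of why relying on automatic conversion can surprise, consider the hypothetical function half below (the names are assumptions made for this illustration). A double passed to an int argument is silently truncated before the function body runs.

```cpp
// Hypothetical function expecting an int argument.
int half( int n )
{
    return n / 2;               // integer division
}

// Relies on automatic conversion: 9.8 is truncated to 9 when copied into n.
int half_implicit( void )
{
    double d = 9.8;
    return half( d );           // compiles, but the fractional part is lost
}

// States the conversion explicitly so that the reader can see it.
int half_explicit( void )
{
    double d = 9.8;
    return half( static_cast<int>(d) );
}
```

Both functions return the same value, but only the second makes the loss of the fractional part visible in the source.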

9. Overloaded function names


First you should recognise that an operator (e.g. +) is just a function specified in a different
way i.e. normally in infix form. Thus the arithmetic expression a + b in infix form is just a
different (and more convenient) way of expressing the function add( a, b ) which is in
prefix form. Assuming that add has been declared as:-
int add ( int a, int b );
then both a + b and add( a, b ) are expressions which have the value of the sum of a and b.
In most programming languages, some operators are overloaded, e.g. the '-' operator can
mean
• unary negation e.g. -1
• binary subtraction of integers e.g. 3-2
• binary subtraction of floats e.g. 4.5 - 3.2
• binary subtraction of long int e.g. 123456789L - 123456788L
We are allowed to use the same operator for semantically similar operations because it is
convenient to do so even though the actual computation required is quite different - the
compiler determines which computation to perform based on the type of the operands.
But many languages will not allow the corresponding functions to have the same name,
e.g. subtract( int, int ) - a function accepting two arguments of type int would not be
permitted to exist in the same scope as subtract( float, float ) - a function accepting two
float arguments. This is illogical. Fortunately for us, C++ does permit overloading of
function names provided that they can be distinguished by their signatures i.e. the number
and type of their arguments. You have already seen this with the standard output stream
cout that has a function << that accepts an argument of any one of the fundamental types.
The language allows the function << to be declared in such a way that it can be used as an
operator.
Note that functions with the same name must be distinguishable by their number and type
of arguments. The function return type is not taken into account in determining whether
they are different.
void print( int, int );
void print( float, float ); // OK. different argument types
int print( int, int ); // error: redeclaration differing only in return type;
// the return type is not considered
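A minimal sketch of overloading by signature follows. The function name subtract is taken from the discussion above, but these particular definitions are assumptions made for illustration. double is used rather than float so that calls with literals such as 4.5 match one overload exactly.

```cpp
// Two overloads distinguished by the type of their arguments.
int    subtract( int a, int b )       { return a - b; }
double subtract( double a, double b ) { return a - b; }

// A third overload distinguished by the number of arguments.
int    subtract( int a, int b, int c ) { return a - b - c; }
```

The compiler selects the overload from the actual arguments: subtract( 5, 3 ) calls the first, subtract( 4.5, 3.0 ) the second and subtract( 10, 3, 2 ) the third. A mixed call such as subtract( 4.5, 3 ) is ambiguous and is rejected, which is another reason to pass arguments of the correct type.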


10. Reference Arguments


This will be dealt with in a subsequent lecture.

11. Function comments


Each function should be provided with one or two lines of comment after the header
describing what it does and any special assumptions that it makes about any arguments
passed to it. The formal way to do this is to provide pre and post conditions which
specify:-
pre assumptions the function makes about the value of arguments passed to it and any
other relevant conditions. There is no need to include assumptions about the types
of the arguments since the compiler will check these.
post the state after it has accomplished its task. This may include any limitations on the
return value, how unusual situations are flagged etc.

These pre and post conditions then form a contract between the caller and the function.
The caller guarantees to meet the pre-conditions and the function guarantees to satisfy the
post-conditions. If the caller fails to meet his side of the contract (i.e. he does not meet the
pre-conditions), then all bets are off, and the function is relieved from meeting the post-
conditions.
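The following sketch shows one way that pre and post conditions might be written as comments after the header. The function integer_sqrt is hypothetical, and the use of assert from the standard header cassert to check the pre-condition at run time is an optional addition, not something these notes require.

```cpp
#include <cassert>

int integer_sqrt( int n )
// pre:  n >= 0
// post: returns the largest integer r such that r * r <= n
{
    assert( n >= 0 );                   // run-time check of the pre-condition
    int r = 0;
    while ( (r + 1) <= n / (r + 1) )    // same test as (r+1)*(r+1) <= n, without overflow
        r++;
    return r;
}
```

A caller that passes a negative value has broken the contract, and the function makes no promise about the result in that case.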
Some language designers consider that this concept is so important that it should not be
dealt with merely by comments. They have therefore incorporated pre and post conditions
into the language so that they can be checked at run-time, raising an exception if the
contract is broken. Eiffel is an example.
Large programs have to be broken down into smaller and more manageable components in
order to deal with their complexity and to allow teams of programmers to work on them.
The separate components can be tested individually with a range of inputs to ensure that
they behave as specified. But what happens when they are put back together again? Will
all these components work together? Or will there be discrepancies arising from a
misunderstanding on how the parts interrelate? The ability to check the interaction of
these components at run time can provide significant advantages in terms of quality and
reduction of debugging time.

12. Summary
We have looked at functions which may have formal arguments or should have the word
void in the formal argument list to indicate that no arguments are required. Functions
normally return a value via the return statement, and the type returned must agree with the
return type provided in the definition.
Functions are called by name, passing actual arguments whose values are copied into the
formal arguments. Since a call to a function that returns a value is an expression (i.e. it has
a value), a function call may be used in any case where an expression is expected.
It is recommended that function definitions appear after the function main. This requires
that function prototypes appear as the first lines of function main. Functions whose
prototypes are supplied in main are private to main, i.e. the prototypes serve the
requirements of main and no other functions. If there are other functions, defined after
main, and before the functions they wish to call, then they will not be able to do so. There
are two solutions:-

• Ensure that the definitions of the functions to be called appear before the definitions
  of the functions that wish to call them.
• Provide prototype declarations for the called functions before main so that they have
  file scope and can therefore be called from anywhere in that file.
Local variables of a function usually have the storage class auto and are not visible to code
outside the function. They cease to exist after the function terminates. The formal
arguments are also invisible from outside. Changes to formal arguments that are passed by
value and changes to local auto variables have no effect outside the function, and their
identifiers may duplicate identifiers appearing elsewhere in the program.
Functions are one of the weapons that C++ provides in the war against complexity and the
errors that this complexity may bring with it. They are an example of procedural
abstraction and allow a program to be designed as a hierarchy of functions that
progressively refine the problem by breaking it down into smaller problems. Large
programs must be designed on paper using this process of stepwise refinement before the
program is written. A suitable tool for this design process is a PDL (program description
language), one variant of this being known as Structured English. Libraries of frequently
used routines (functions) can be written and a very large number of libraries are provided
with all compilers, each library containing a number of functions.
Pre and post conditions provided as comments at the head of the function are an important
way of specifying what they do and how they are to be used. This helps to ensure that,
when a large number of tested functions is finally brought together to form a program, the
various parts work together as specified.
Ideally, input and output should be isolated in a limited number of functions designed for
that purpose and not scattered about over many functions whose primary purpose is not
I/O. Generally speaking, functions should not modify global variables and should never
use global variables for such local uses as loop control.


Flow of Control
1. The type cast operator
The typecast operator provides the possibility of forcing an expression into another data
type by using the name of the new type as though it were a function. For example, in a
program to calculate the statistics on a sequence of integers, the mean can be calculated
from the integer total of the numbers divided by the count of the number of items (where
mean is a float) by:-

mean = float(sum) / float(count);

The new C++ standard has introduced four new operators that carry out explicit
conversions from one type to another. Of these four, only static_cast is introduced here. It is
intended to be used for conversion between similar types, e.g. between char and int,
between int and enum, and between float and int. Example:-
mean = static_cast<float>(sum) / static_cast<float>(count);
Explicit type conversions are error-prone and a large proportion of program errors is due
to them (Stroustrup). The virtue of the new operators is that they are easy to search and
find in large program source files, whereas the earlier example float(sum) could be very
difficult to find.
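A short sketch of why the cast matters when calculating a mean is given below. The function names mean_of and truncated_mean_of are hypothetical.

```cpp
float mean_of( int sum, int count )
// pre: count > 0
{
    // both operands are converted to float before the division takes place
    return static_cast<float>(sum) / static_cast<float>(count);
}

float truncated_mean_of( int sum, int count )
// pre: count > 0
{
    // integer division happens first; only its result is converted to float
    return sum / count;
}
```

With sum = 7 and count = 2, mean_of yields 3.5 whereas truncated_mean_of yields 3.0, because the fractional part has already been discarded by the integer division.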

2. The comma operator


A sequence of expressions can be separated by commas. They are evaluated from left to
right and the last expression provides the value of the sequence, e.g.
s = ( t = 2, t + 3 );
t is assigned the value 2, then the expression t + 3 is evaluated to 5, and this last
expression provides the value assigned to s.
This device can be used to include, for instance, a number of expressions in a while loop
condition. The value of the last expression is the value of the condition.
while( cin.get(ch), !cin.eof() )
An attempt is made to read a character from standard input and a test is made to see if a
character could not be read because the end of the file has been reached. The value of the
loop continuation condition is that of the test !cin.eof() and, if this has the value false, then
the loop terminates. If the input stream is empty then the loop is never entered.
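The two uses of the comma operator described above can be tried with the sketch below. The function names are hypothetical, and the input stream is passed as an argument (an assumption made so that the function can be exercised with a string stream as well as with cin).

```cpp
#include <iostream>
#include <sstream>
using namespace std;

// Returns the value assigned to s by the comma expression.
int comma_value( void )
{
    int s, t;
    s = ( t = 2, t + 3 );       // t becomes 2, then t + 3 supplies the value
    return s;
}

// Counts the characters available on the stream in.
int count_chars( istream& in )
{
    char ch;
    int count = 0;
    while ( in.get(ch), !in.eof() )     // read first, then test for end of file
        count++;
    return count;
}
```

For an empty stream the first attempt to read fails, so the test is false and the loop body is never executed.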

3. The conditional operator


This consists of 3 expressions separated by a '?' and a ':'
expression1 ? expression2 : expression3
The first must be a logical expression, i.e. yielding either true or false. If expression1 is
true, then the value of the whole expression is the value of expression2, otherwise the
value is that of expression3. This could have been used in defining function max:-
Example 1
int max( int a, int b )
{
return( a > b ? a : b );
}


Example 2
cout.setf( justify == 'L' ? ios::left : ios::right );
Equivalent to:-
if ( justify == 'L' )
    cout.setf( ios::left );
else
    cout.setf( ios::right );
4. The for statement
In most programming languages, the for iteration construct is suitable mainly for loops
whose number of iterations can be determined in advance. In C++, the for loop is much
more general and can, in fact, be employed for any loop including while and do. The
syntax is

for ( expression1; expression2; expression3 )
    statement_block;
where:-
expression1 may consist of one or more statements (separated by commas)
that initialise the loop. e.g. count = 1, max = 10;
A new variable declaration may be made here whose scope will extend
to the end of the for loop block:- e.g.
int count = 1, max = 10;
expression2 is a logical expression which determines the continuation of the
loop (in the same way as in a while loop) e.g. count <= max.
This expression may consist of several logical expressions
connected by the boolean operators && (and) and || (or).
expression3 is a statement or statements which will be executed at the end
of each loop iteration. Normally this is used to modify the loop
control variable, e.g. count++
statement_block is either one statement, or more than one statement surrounded by
braces. The single statement may be empty.
If any of the 3 expressions is missing, the semi-colon separator must remain to show its
absence.
Examples
a) for( ; ; )
cout << "hello" << endl;
runs for ever, printing "hello" on a new line each time
b) for(bool forever = true; forever ; )
cout << "hello" << endl;
behaves as a) because forever is forever true
c) for ( cin.get(ch); !cin.eof() ; cin.get(ch))
cout.put(ch);
this is the same as:-
cin.get( ch );
while( !cin.eof())
{
    cout.put(ch);
    cin.get(ch);
}
or
while( cin.get(ch), !cin.eof() )
    cout.put(ch);


Note that, in example b) above, bool forever in the first expression is the declaration
of a new boolean variable. It is convenient and makes programs easier to read if the
declaration of variables is as close as possible to the point where they are used. This
facility is one of the improvements over the ‘C’ language provided by C++.
Because of its versatility, there is a tendency for programmers to use the for loop
exclusively and to ignore the while loop. However, the latter is designed to deal explicitly
with cases where the loop should not be entered at all under certain conditions (e.g. when
processing a file which may be empty). Although this condition can be handled by for as
shown above, its primary purpose is for loops whose number of iterations can be
determined before it is entered e.g. when processing arrays (to be covered soon). The very
fact that a while loop is being used signals that it may never be entered whereas, in a for
loop this fact can only be determined by inspection of its expressions.

5. The do statement
In a limited number of cases, processing requires that the loop condition is tested at the
bottom rather than at the top of the loop. In other words, the statement(s) in the loop body will
always be executed at least once. The format of the do loop is:-

do
statement_block;
while ( expression );
where expression is a logical expression yielding either true or false. As with all loop
statements, if statement_block comprises more than one statement, it must be enclosed in
braces:-
do
{
statement_1;
statement_2 ;
...
} while ( expression ); // Note: the test normally appears on same line as the
// closing brace
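A small sketch of a case where the body must run at least once is counting the decimal digits of a number, since even 0 has one digit. The function digit_count is hypothetical.

```cpp
int digit_count( int n )
// pre:  n >= 0
// post: returns the number of decimal digits in n
{
    int digits = 0;
    do
    {
        digits++;               // every number has at least one digit
        n /= 10;                // discard the digit just counted
    } while ( n != 0 );
    return digits;
}
```

Written with a while loop, the case n == 0 would need special treatment because the body would never be entered.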

6. Nested loops
Frequently, a loop is nested within another loop or loops. The reasons why this might be
necessary will become clearer when arrays are covered. Notice that the total number of
iterations of the inner loop is the product of the number of its iterations and those of any
surrounding loops. This number can escalate to very large values and can result in
programs that run slowly.
for ( int i = 0; i < 10; i++ )
    for ( int j = 0; j < 10; j++ )
        for ( int k = 0; k < 10; k++ )
            process( i, j, k );
Function process is called 1,000 times.

Sometimes, it may not be obvious how many potential iterations of the inner statement
will occur because, for instance, the second and third lines above may consist of function
calls that, themselves, contain a loop. You should always be aware of the possibility of
introducing inefficiencies into a program in this way because it may result in unacceptable
performance.


7. The break statement


This statement is used to alter the flow of control in loop statements (for, while and do)
and in the switch statement (see below). Its effect in loops is to cause immediate exit from
the loop in which it appears. This might be required if some abnormal condition occurs
that requires that no further iterations of the loop should be made. The abnormal condition
would normally be detected by an if statement. If the loop is nested within another loop
then control will return to the immediately surrounding loop.

8. The continue statement


This complements the break statement and causes an immediate switch of control to the
test part of the loop in which it appears (in a for loop, expression3 is executed first), thus
ignoring any remaining statements that appear after it in the loop body. Its use is
discouraged.
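The effect of break and continue inside a for loop can be seen in the sketch below. The function sum_odd_until and its arguments are hypothetical.

```cpp
// Sums the odd numbers from 1 to limit, but abandons the loop
// altogether if the value stop_at is reached.
int sum_odd_until( int limit, int stop_at )
{
    int total = 0;
    for ( int i = 1; i <= limit; i++ )
    {
        if ( i == stop_at )
            break;              // immediate exit from the loop
        if ( i % 2 == 0 )
            continue;           // skip the rest of the body; i++ is still executed
        total += i;
    }
    return total;
}
```

The call sum_odd_until( 10, 8 ) adds 1, 3, 5 and 7 and then leaves the loop when i reaches 8.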

9. The switch statement


Where a choice needs to be made from a number of possible states, the if else statement
can become cumbersome. The switch statement is a more compact and readable
alternative. The syntax is:-
switch ( expression )
{
case constant_expression1 : statement_block;
case constant_expression2 : statement_block;
case constant_expression3 : statement_block;
...
default : statement_block;
}

Where:-

expression is an expression yielding an integral value, i.e. int, short,
long, unsigned or char (but excluding floats and arrays),
e.g. a typical menu selection might include:-
cin.get(ch);
switch( toupper(ch) )

constant_expression1,2,3 .. are some possible values of expression, e.g. 'P', 'D', 'E'.
They must be constants, either literal or symbolic - see
the example below.
statement_block is a statement or sequence of statements which will
normally end with the break statement. The effect of
break is to cause control to jump out of the switch
statement and not to execute any statements in the
following cases. If break is not present, then the
statements in subsequent cases are executed until either a
break statement is encountered, or the end of the switch
statement is met.
default if none of the cases is met, the statements in the default
section are executed. It is wise always to include this so
as to deal with all other possible values of expression.
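The absence of break can be put to deliberate use when several cases share the same statements. The sketch below is a hypothetical example (the function days_in_month is an assumption made for illustration and ignores leap years).

```cpp
// Returns the number of days in a month of a non-leap year,
// or 0 if month is not in the range 1 to 12.
int days_in_month( int month )
{
    int days;
    switch ( month )
    {
        case 4 : case 6 : case 9 : case 11 :    // these cases fall through to the same statements
            days = 30;
            break;
        case 2 :
            days = 28;
            break;
        case 1 : case 3 : case 5 : case 7 : case 8 : case 10 : case 12 :
            days = 31;
            break;
        default :                               // any other value is invalid
            days = 0;
    }
    return days;
}
```

Each group ends with break so that control does not run on into the statements of the next group.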


Example (this program is installed in the lab)


#include <iostream>
#include <ctype.h>                      // toupper
#include <conio.h>                      // getch - a non-standard header
using namespace std;

int main(void)
{
    void DoPrint( void );               // assumed functions DoPrint etc. are
    void DoDisplay( void );             // defined elsewhere
    void DoEdit( void );
    const char EDIT = 'E', DISPLAY = 'D', PRINT = 'P', QUIT = 'Q';
    for( char ch = '\0'; ch != QUIT; )  // ch initialised to null. Loop while ch != QUIT
    {
        cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        ch = toupper( getch() );        // getch - char input without echo to the display,
                                        // folded to upper case so that 'q' also ends the loop
        switch ( ch )
        {
            case PRINT   : DoPrint();   break;
            case DISPLAY : DoDisplay(); break;
            case EDIT    : DoEdit();    break;
            case QUIT    : break;
            default      : cout << '\a' << endl;    // invalid response, sound the bell
        }
    }
    return(0);
}


Pointers, References and Functions


1. Introduction
A variable is a symbolic name for a location in memory that holds a value of the relevant
type. In the assignment statement:-
x = y;
the meaning of the use of the variable names x and y is different. The use of variable y
means the value currently stored at the memory location known as y. The use of variable x
means the memory location known as x. Thus the whole statement can be understood as
meaning obtain the value stored at the memory location known as y and store it in the
memory location known as x. The current value stored in x is neither needed nor accessed,
and is overwritten by the assignment.

2. Reference Type
We introduce a new data type, the reference, whose value is not an integer, float, char etc.
but a reference to a variable which holds an integer, float, char etc. It is an alias for
another object. An alias is simply another name for the same object.
Example:-
Assume the following declarations

int k = 5;
int& b = k;

and assume that variable k is stored in memory location number 46524. The value stored at
memory location 46524 is 5.

           k                          b
  Address    Contents       Address    Contents
  46524      5              75145      46524

Variable b, a reference variable, is declared to be a reference to variable k. It will therefore
hold as its value the memory location of k, thus referencing the value of k, i.e. 5.
Any assignment of a new value to k will therefore affect the value referenced by b, and
any change to the value referenced by b will change the value of k.
Note that the special symbol & is placed after the type (int) in the declaration of b and that
this must be followed by an initialisation using a previously declared variable (not a value)
of the correct type. Once this declaration and initialisation has been made, b behaves
exactly as though it were an ordinary variable. The compiler looks after the necessary
indirection so that e.g. the assignment:-
b = 12;
is interpreted as 'assign the value 12 to the memory location referenced by b'. After this
assignment, k also has the value 12, and after the further statement b++, k has the value
13. Note that b++ does not (in contrast to a pointer) increment the value stored in b, i.e. it
does not change 46524 to 46525.
Once a reference variable has been associated with another variable in this way, it cannot
be changed so that it refers to a different variable. Thus &b = m intended to mean 'change
b so that it is now an alias for m instead of k' is not allowed.
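The behaviour described above can be traced in the sketch below. The function reference_demo and the variable m are hypothetical. Note in particular that the assignment b = m does not re-seat the reference; it copies the value of m into k.

```cpp
int reference_demo( void )
{
    int k = 5;
    int m = 100;
    int& b = k;     // b is an alias for k
    b = 12;         // k now has the value 12
    b++;            // k now has the value 13
    b = m;          // b still refers to k: the value of m is copied into k
    m = 0;          // has no effect on k, because b never referred to m
    return k;       // k holds the value copied from m
}
```

After the final assignment to m, k still holds 100, which confirms that b remained an alias for k throughout.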


3. Pointers v References
Pointers are carried over from C, and are, in part, superseded by the reference type.
However, many C libraries use pointers and the type has been retained for compatibility
purposes and for their importance in building dynamic data structures. Some books
describe references in abstract terms, and pointers in concrete terms. Pointers, they say, are
variables which hold the address of another variable. But, in fact, this is exactly what
references hold as their value. The differences are:-
• Abstraction
  The fact that references hold an address does not need to be known in order to use
  them, whereas you must take specific action in order to make a pointer point to
  some other object and to obtain the value of the object pointed to (see Syntax
  below).
• Syntax
  Pointers require special symbols to be used by the programmer -
    • to assign to a pointer the address of another object i.e. to make it point to it - use
      the address operator &
    • to yield the value of the object to which a pointer points, known as
      dereferencing - use the indirection operator *
  Reference variables, once declared, are treated as ordinary variables without the use
  of special symbols. The necessary indirection is looked after by the compiler.
References are at a higher level of abstraction than pointers. A further difference is that
pointers can be reassigned at will to point to another variable and can be incremented to
step through memory. They are a much lower level tool than references as befits their
origin. References cannot be reassigned to point to a different object.

Pointer variables                               Reference variables

int k = 5;                                      int k = 5;

int* ptr;                                       int& ref;
    OK. Declaration of a pointer to int             illegal. Must be initialised on
    named ptr                                       declaration

int* ptr = &k;                                  int& ref = k;
    declaration and initialisation                  declaration and initialisation
    combined using address operator &

*ptr = 12;                                      ref = 12;
    using indirection operator * to                 assignment to the variable
    assign 12 to the variable to which              referenced by ref (no special
    ptr points                                      syntax needed)

cout << k;                                      cout << k;
    prints 12                                       prints 12

cout << *ptr;                                   cout << ref;
    prints 12 using dereferencing                   prints 12

cout << ptr;
    prints the address of k e.g. 46524

ptr++;                                          ref++;
    increments the address held by ptr              increments the variable
    which now references the memory                 referenced by ref, k now has
    location immediately following that             the value 13
    of k. (k is unchanged)
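The pointer column of the comparison can be tried out with the sketch below. The function pointer_demo and the variable m are hypothetical. Unlike a reference, the pointer can be made to point to a different variable part way through.

```cpp
int pointer_demo( void )
{
    int k = 5;
    int m = 100;
    int* ptr = &k;  // ptr holds the address of k
    *ptr = 12;      // k now has the value 12
    (*ptr)++;       // k now has the value 13; the parentheses are needed
                    // because *ptr++ would increment the pointer instead
    ptr = &m;       // ptr is reassigned and now points to m; k is unaffected
    *ptr = 0;       // m now has the value 0
    return k + m;   // 13 + 0
}
```

The explicit use of & and * at each step is the price paid for the extra flexibility of pointers.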


4. Enumeration Types
It is valuable as a documentation tool to use symbolic names for constant values in
programs. The classic case is pi which can be given a symbolic name by
const double pi = 3.14159265359;
If however you need to model a real world object that may take on any one of a set of
known values, then you can declare an enumeration type -
enum dow { SUN, MON, TUE, WED, THU, FRI, SAT };
dow day1, day2, day3;
This creates a new data type dow (day of week) and declares 3 variables ( day1, day2 and
day3 ) of this type. Note that the enumerated values are not strings. They are simply
constant numeric values that commence with 0.
A further possible use for an enumeration type is to describe the different states that a
program may be in at any one time. In this type of program, the processing e.g. of input
will vary depending on the current state, and certain types of input will have the effect of
changing the state. An example of this type of processing is reading a data file that
consists of several lines, each containing a description and a number. The description may
include numeric digits, so it is enclosed in quotes:-
"3D Drawing Program" 12
"Sprocket Type 4S" 31
The states might be described using an enumeration as follows
enum State {IN_NAME, IN_NUMBER, BETWEEN};
State state = BETWEEN;
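A minimal sketch of state-driven processing for a data file of this form is given below. The function total_of_numbers is hypothetical and simply adds up the numbers; it assumes well-formed input in which every description is enclosed in double quotes. Digits are ignored while the state is IN_NAME, which is the reason for tracking the state at all.

```cpp
#include <iostream>
#include <sstream>
#include <cctype>
using namespace std;

enum State { IN_NAME, IN_NUMBER, BETWEEN };

// Reads records of the form  "description" number  and returns the total of the numbers.
long total_of_numbers( istream& in )
{
    State state = BETWEEN;
    long total = 0, number = 0;
    char ch;
    while ( in.get(ch), !in.eof() )
    {
        bool digit = isdigit( static_cast<unsigned char>(ch) ) != 0;
        switch ( state )
        {
            case BETWEEN :
                if ( ch == '"' )
                    state = IN_NAME;            // opening quote of a description
                else if ( digit )
                {
                    number = ch - '0';          // first digit of a number
                    state = IN_NUMBER;
                }
                break;
            case IN_NAME :
                if ( ch == '"' )
                    state = BETWEEN;            // closing quote; digits in the name were ignored
                break;
            case IN_NUMBER :
                if ( digit )
                    number = number * 10 + ( ch - '0' );
                else
                {
                    total += number;            // the number is complete
                    state = ( ch == '"' ) ? IN_NAME : BETWEEN;
                }
                break;
        }
    }
    if ( state == IN_NUMBER )                   // input ended immediately after a number
        total += number;
    return total;
}
```

For the two sample lines shown above the digits 3 and 4 inside the descriptions are skipped, and only 12 and 31 contribute to the total.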

5. The typedef statement


Allows the definition of new data types based on the fundamental types of the language.
The new type is just an alias for the base type and cannot be given any attributes that are
different from the original.
typedef float real; // creates a new type real based on float.
real length = 10.56;
typedef int* intptr; // type intptr is a pointer to int
There may not seem to be a great deal of value in this mechanism until we meet compound
data types, e.g. array and struct.

6. Reference arguments to functions


The reference type is rarely used in the way described in para. 2. It is intended primarily to
be used in function formal arguments.
Earlier, we looked at simple functions and noted that functions in programming languages
are not 'pure' in the mathematical sense in that they can return more than one value. The
classic example of this is the function that swaps the value of its two arguments. This is
frequently used in sorting algorithms.

int a = 6, b = 199;
swap( a, b );
cout << setw(6) << a << setw(6) << b << endl;
199 6

This function does not need to have a return value, but it must return the changed values of
its two arguments. This is accomplished by using reference arguments-
void swap( int& x, int& y )
{
int temp;
temp = x;
x = y;
y = temp; // classic swap algorithm. Needs a temporary variable
}
So what is happening here?
The actual arguments in the call swap( a, b ) are the variables a and b. The formal
arguments are defined to be references to integer ( int& x, int& y ). When the function is
called, the compiler recognises that the function is expecting references to integers and not
integer values, so it copies into x and y, not the values 6 and 199, but references to the
variables a and b which hold these values. When the swapping is carried out in the body of
the function, the values that are swapped are those of the variables referenced by x and y,
namely a and b. This is because x and y are aliases for a and b, so anything done to x and y
is actually being done to a and b! For this reason, the function may only be called with
variables and not with literal constants, e.g. swap( 6, 199 ); would be an error.
This is the mechanism provided by C++ to allow a function to return values via its
arguments. Not all of the arguments need to be reference arguments. A function to convert
a time in seconds held as a long int (first argument) into hours, minutes and seconds (the
remaining 3 arguments) will have the first argument as a value parameter and the
remaining three as reference parameters.

void time2hms( long t, int& h, int& m, int& s )


Sometimes formal arguments are referred to as IN, OUT and INOUT arguments. In the
case of function swap, both arguments were INOUT, whereas in time2hms, the first
argument is IN, and the next three are OUT. So both OUT and INOUT arguments need to
be defined in the function declaration as reference arguments.
Note that, in the prototype of any function, the argument identifiers may be omitted as
shown below, although their absence makes it more difficult to understand what the
function does unless a further comment is provided.
void time2hms( long, int&, int&, int& ); // prototype for time2hms
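The notes give only the header of time2hms. One possible body, offered as a sketch under the assumption that t is a non-negative number of seconds, is:

```cpp
void time2hms( long t, int& h, int& m, int& s )
// pre:  t >= 0
// post: h, m and s hold t expressed as hours, minutes (0-59) and seconds (0-59)
{
    h = static_cast<int>( t / 3600 );           // whole hours
    m = static_cast<int>( ( t % 3600 ) / 60 );  // whole minutes left over
    s = static_cast<int>( t % 60 );             // seconds left over
}
```

A call such as time2hms( 3725L, hours, mins, secs ) leaves 1, 2 and 5 in the three variables supplied by the caller, since 3725 = 1 * 3600 + 2 * 60 + 5. The variable names in this call are hypothetical.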

7. Pointer arguments to functions


Although there is generally no need to use pointer arguments to functions because
reference arguments can do the same job, it is still common to find them - particularly in
code originally written by 'C' programmers. In addition, many functions in the 'C'
libraries accept and return pointers. A further consideration is that pointers are frequently
used to access successive components of an array rather than by the conventional means
(array indexing - to be covered later).


Example 1
A typical example of C code:-

void makeupper( char* s )
// converts a string to upper case by using a pointer to access the components of the
// array
{
    char *p = s;                        // local pointer variable p given value of s,
                                        // i.e. p now points to the first character of the string s
    while ( *p )                        // while char pointed to by p != '\0' - the ASCII NUL string
                                        // terminator
    {
        if (*p >= 'a' && *p <= 'z')     // if the char pointed to by p is lower case
            *p += ('A' - 'a');          // convert to upper case
        p++;                            // increment pointer to look at the next char
    }
    // since p points to the same string as s and s is a
    // pointer to the actual array argument, the actual
    // argument has been converted to upper case
}

...
char name[] = "i am all in lower case";
makeupper(name);
cout << name << endl;
I AM ALL IN LOWER CASE

Note that, in 'C' and C++ an array passed as an argument to a function is always passed as
a pointer to the first element.
Example 2
The 'C' string library cstring or string.h contains a number of functions operating on 'C'
style strings which accept pointer arguments and some of which return pointer results,
typical ones are:-
char *strcat(char *dest, const char *src); // concatenates 2 strings returning a
// pointer to the result. dest has been
// modified
char *strcpy(char *dest, const char *src); // copies src into dest returning a
// pointer to dest as result
An example of the use of these two functions is:-
char source[25] = "GNU";
const char *blank = " ", *cplus = "C++"; // string literals must not be modified, hence const
char destination[25];
char *p = destination; // p points to the string destination
p = strcat(source, blank); // concatenate a blank onto source. p points to source
strcat(source, cplus); // concatenate "C++" onto source
strcpy(destination, p); // copy the result back into destination. p still points to
// source which has been changed.
cout << "destination = " << destination << endl;
destination = GNU C++


8. Default arguments
Sometimes we need to provide an argument that enables the caller to change the default
behaviour of the function. Where the default behaviour is not to be overridden, then there
should be no need to provide this argument. C++ permits a default argument value to be
specified in the function declaration and, if this argument is not supplied by the caller, then
the default value is used by the function. If the argument is supplied, then it overrides the
default. In the case of one default, it must be the last. In the case of two defaults, they must
be the last and last but one etc.
The default must be supplied only once - in the declaration (prototype), and should not be
repeated in the function definition.
Assume a function is to print to the stdout a number of lines of a file. The default is 4
lines, but this may be overridden by supplying an argument specifying a different number
of lines.
void printfile( const char filename[], int numlines = 4 ); // prototype
void printfile ( const char filename[], int numlines ) // definition
{
...
...
}
printfile( "fred.cpp", 10); // overrides default with 10
printfile( "jim.cpp"); // default of 4 is used
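Since the body of printfile is not shown, the rules for default arguments can be tried with the smaller sketch below. The function print_line is hypothetical. It returns the number of symbols printed so that the effect of the default can be checked.

```cpp
#include <iostream>
using namespace std;

int print_line( char symbol, int count = 4 );   // prototype: the default is given here, once

int print_line( char symbol, int count )        // definition: the default is not repeated
{
    for ( int i = 0; i < count; i++ )
        cout << symbol;
    cout << endl;
    return count;
}
```

The call print_line( '*' ) uses the default of 4, whereas print_line( '-', 10 ) overrides it. The defaulted argument is the last one, as required.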

9. Inline functions
Calling a function has an overhead that costs time. The runtime system has to set up a
'stack frame' and allocate space for the arguments and local variables. On termination, the
stack frame has to be released and a jump made to the point immediately after the call.
Very small functions can be specified as 'inline' so that the compiler will substitute the
actual code of the function body for each occurrence of a call to the function. This will
improve speed at the expense of code size. In fact, the use of inline is a recommendation
only, and there is no guarantee that the compiler will honour it - this will depend on the
compiler and the size of the inline function.
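inline int square( int ); // prototype (at file scope - the inline specifier is not
// allowed on a declaration inside a function body)
int main ( void )
{
...
z = square( x ); // compiler should substitute z = x * x
...
}
inline int square( int a )
{
return ( a * a );
}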
A test of the above program was timed for 100 million calls to function square. The
elapsed time without inlining was approx 3.9 seconds and, with inlining, approx 3.05
seconds - an improvement of roughly 22%. The code size was increased by a very minor
amount because the call to function square occurs only once.

Note that the GNU compiler does not care whether the keyword inline occurs in the
prototype, in the function definition or in both places. To achieve inlining the compiler
optimisation switch -O has to be set. In RHIDE change the option
Options.Compilers.Optimizations -O to 1

10. Mathematical functions


See Libraries on page 116


Arrays
1. Introduction
Arrays are an aggregate type capable of holding a number of values all of the same type,
contiguously in memory. The components may be any one of the fundamental data types -
int, long, unsigned, float, char, enumerated, pointer or one of the aggregate types, i.e.
array, struct or class. The struct and class types have not yet been covered. The struct is
referred to in other languages as record and consists of one or more fields of (possibly)
different types (including arrays and records). The class data type will be covered in the
Object-Oriented Programming & Design module.
The advantage of the built-in array type is that a large number of data items can be held in
a single named array variable whose components can be accessed randomly as we shall
see later. The disadvantage is that its size is fixed at compile time and this cannot be varied
at run time to accommodate the fluctuating requirements of the application. Most of the
time, therefore, it is wasting space because it is not full and the type itself does not allow
resizing. The solution, as we shall see later, is dynamic memory allocation.

2. Defining and referencing arrays


The syntax for the definition of an array is
type_specifier name[number_of_elements]
where
• type_specifier is the data type of the components.
• name is an identifier conforming to the normal requirements for identifiers.
• number_of_elements is the total number of components that the array is to be
capable of holding. This value appears in square brackets and may be a literal
e.g. 6, or a previously defined constant e.g. numelements where numelements has
been defined as const int numelements = 6;

Example

index   0   1   2   3   4   5
value   9  14   7   5   1   3

An array of integers with 6 elements

int table[6]; // an array called table capable of holding 6 integers
float temperatures[31]; // an array called temperatures capable of holding 31 floats
char name[16]; // an array called name capable of holding 16 characters
// (but note that, allowing for the terminating NUL
// character, only 15 readable characters can be held).
Arrays are indexed. That is, each element is uniquely numbered. The numbering always
starts at 0 and always increments by 1 for each successive element (regardless of the size
of the elements).

The value held by table element 0 is 9, the value held by table element 1 is 14 etc. Access
to the elements (or components) is by subscripting the table name with the desired element
number. Thus table[0] is an integer with the value 9, table[1] contains 14 etc. Notice that,
since the numbering starts at 0, the last element always has an index one less than the
number of elements. The subscripted array can be used anywhere that an expression of the
component type is required:-
const int size = 6;
int table[ size ];
table[ 5 ] = 22;
table[ 1 ] = table[ 5 ]; // change the value of element 1 to that of element 5
cout << table[1]; // output the integer (22) contained in element 1
The subscript may be any expression with an integer value, thus:-

int i = 3;
table[ i ] = table[ size - 1 ]; // change the value of element 3 to
// that of element 5 (the last)

Since the array subscript can be a variable, we can process an array's elements by means of
a loop using as subscript a variable that increments for each iteration of the loop:-

2.1 Inputting values to array table


int count = 0, size = 6, anint;
cout << "Enter an integer: ";
cin >> anint; cout << endl;
while( cin.good() && count < size )
{
table[ count++ ] = anint;
cout << "Enter an integer: "; cin >> anint; cout << endl;
}
Note the need to check two conditions:-
• The input is a valid integer: cin.good()
• The end of the array has not been reached: count < size
For this reason, the input is read into an auxiliary variable anint before the start of
the loop and before it is assigned to an array element inside the loop. A further input
is then assigned to anint at the bottom of the loop.

2.2 Outputting values from array table


for( int i = 0; i < count; i++ )
cout << table[ i ] << endl;
Note that, in this example, the condition for the loop to continue is controlled by the
number of items entered (count). This might be less than the total number of
elements in the array. Attempting to process elements of an array that have not been
given a value can lead to unpredictable results.


2.3 Shuffling array elements one position left (or down)


This requires care to avoid overwriting the changes.
const int size = 6;
int table[ size ] = { 0, 1, 2, 3, 4, 5 }; // initialised on declaration - see below
// Original contents: 0 1 2 3 4 5
for( int i = 1; i < size; i++ )
table[ i - 1 ] = table[ i ]; // shuffle the contents one element to
// the left
// Shuffled left: 1 2 3 4 5 5

2.4 Shuffling array elements one position right (or up)

Starting again from the original contents 0 1 2 3 4 5:
for( int i = size - 1; i > 0; i-- ) // traverse the array backwards
table[ i ] = table[ i - 1 ]; // shuffle the contents one element to
// the right
// Shuffled right: 0 0 1 2 3 4

3. Array initialisation
Arrays may be initialised on declaration by enclosing a list of values within braces,
separated by commas. If all elements of the array are given values in this way, the number
of elements need not be supplied between the brackets after the array name:-
int table[] = { 9, 14, 7, 5, 1, 3 };
Multi-dimensional arrays may be initialised by placing braces around each row, and
separating the rows with commas (see the definition of type Plane in section 4):-
Plane aPlane = {
{ 'X', ' ', 'X', 'X' }, // Row 1
{ ' ', 'X', ' ', 'X' }, // Row 2
.... // etc.
{ 'X', 'X', ' ', 'X' } // Row 12, no comma
};
Where some initialisers are omitted, the remaining elements are set to 0, whether the
array is static or auto. Only an auto (local function) array declared with no initialiser
list at all is left with indeterminate contents.
The number of elements in an array can be found using the built-in sizeof operator:-
cout << "sizeof(table) = " << sizeof(table) << endl
<< "sizeof(int) = " << sizeof(int) << endl
<< "num elements = " << sizeof(table) / sizeof(table[0]) << endl;
sizeof(table) = 24
sizeof(int) = 4
num elements = 6
But note that sizeof cannot be used in a function to find the size of an array formal
argument since this is a pointer.


4. Multi-dimensional arrays
There is no theoretical limit to the number of dimensions an array may have, although the
number of elements increases rapidly with the number of dimensions as do the chances of
there being redundant elements. Two dimensional arrays are declared with 2 values, each
enclosed in brackets:-
// airplane reservation system
const int maxRows = 12,
seatsPerRow = 4;
typedef char Plane[maxRows][seatsPerRow]; // declares a new type name for a
// 2-dimensional array of char
Plane aPlane; // aPlane is a variable of type Plane
void makeEmpty( Plane aPlane)
{
for( int row = 0; row < maxRows; row++ )
for( int seat = 0; seat < seatsPerRow; seat++ )
aPlane[ row ][ seat ] = ' '; // Space = empty
}

Functions that operate on the Plane data structure

bool seatFree( Plane aPlane, int row, int seat );
// return true if row,seat is a space, else false
void allocateSeat( Plane aPlane, int row, int seat );
// mark seat allocated with an 'X'
void showSeatingPlan( const Plane aPlane );
// show plan with spaces and Xs as in the figure

[Figure: seating plan of rows 1 to 12 by seats 1 to 4, with allocated seats marked 'X']

5. Arrays as function arguments

An example of a 2 dimensional array aPlane of type Plane being passed to a function
appears in 4 above. In C++, an array formal argument to a function is always a pointer to
the first element of the array. This is automatic without any action on the part of the
programmer. Within the function, the array may be subscripted in the normal way. This
explains why, in the function makeEmpty above, it was not necessary to use a reference
argument to ensure that the changed value of the array was passed back to the point of the
call. Since a pointer is passed automatically, any change to the formal argument within the
function body is, in fact, being made to the actual argument. If it is not intended that the
function should modify its formal argument, then the argument should be const modified
to indicate the fact. The compiler will then flag an error if the function body contains
statements that might modify the formal argument.

void showSeatingPlan( const Plane aPlane ) // the elements of aPlane are constant and
// may not appear on the LHS of an
// assignment within the function.


6. Pointers and arrays


This has already been introduced under pointers. Note that an array name unqualified is
treated by the compiler as an address, so
const int size = 6;
int table[size] = { 0, 1, 2, 3, 4, 5 };
int *ptr = table; // assigns to ptr the address of the first element of table
cout << *ptr; // outputs the object to which ptr points, namely the
// integer 0
*ptr = 10; // changes the value of table[0] to 10
ptr++; // moves ptr to point to the next element of the array
cout << *ptr; // outputs 1
cout << *(table + 3); // outputs 3
cout << table[3]; // same as above, outputs 3

Unlike most other languages, C++ supports pointer arithmetic and, since the array name
table is treated as a pointer to its first element, a variable can be used to indicate an
offset from the beginning
for ( int i = size - 1; i > 0; i-- )
*( table + i ) = *( table + i - 1 ); // shuffle contents one element to the right
or, using a supplementary pointer
for ( int* p = table + size - 1; p > table; p-- )
*p = *( p - 1 );
Here table + size - 1 is the address of table plus 6 - 1 elements, i.e. the address of the
last element, and the loop continues while the address held by p is greater than the
address of table. The compiler knows the size of an int, so p-- results in p being adjusted
by sizeof(int), i.e. by 2 or 4 bytes on a PC (depending on the compiler); similarly with
p - 1.


7. Character strings and variable pointers


Notice the difference between char word[] = "hello" and const char *greeting = "hello".
word is an array holding its own copy of the string; its name is a constant address and
cannot be assigned to. greeting is a pointer variable containing the address at which a
string constant is stored.
char word[] = "hello";
const char *greeting = "hello"; // const because a string constant must not be modified
cout << "word[] = " << word << endl; // OK. No problem
cout << "greeting = " << greeting << endl; // OK. No problem
word = "fred"; // compiler error: "incompatible types in assignment of
// 'char[5]' to 'char[6]'" because an array name is a
// constant address and can't be assigned
strcpy(word, "fred"); // do this instead, but note that, if the new string is
// longer than the array (e.g. "wilfred"), the extra chars
// are stored outside the array's allocated memory and may
// cause the program to crash
greeting = "william"; // OK because greeting is a variable pointer. The string
// "william" is stored elsewhere in memory and greeting is
// changed to point to that location.
cout << "word[] = " << word << endl;
cout << "greeting = " << greeting << endl;
8. Character string input/output
As shown above, inserting into the output stream either the name of a character array e.g.
word or a pointer to a character string e.g. greeting has the same effect.
setw(<field_width>) causes a string to be output right-justified in field_width. It can be left
justified by the member function call
cout.setf( ios::left, ios::adjustfield );
or by setiosflags(ios::left) as in
cout << setiosflags(ios::left)
<< setw(10) << word << endl;
cin can be used for string input, but terminates at the first whitespace character (space,
tab). To avoid possible overflow by the input exceeding the space allocated to the string,
setw can be used within cin to limit the number of characters entered. The excess
characters are held in the input buffer and are used to satisfy any subsequent use of cin.
const int MESSAGESIZE = 4;
char input[MESSAGESIZE+1];
cout << "Enter a message without spaces: ";
cin >> setw(MESSAGESIZE+1) >> input;
char overflow[80];
cin >> overflow;
cout << "your input: " << input << endl
<< "the overflow was: " << overflow << endl;
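To input lines of text whose length is unknown at compile time, use
cin.getline( char *line, int limit, char delim = '\n' )
At most limit - 1 characters are stored (e.g. 80 characters for a typical line of text
when limit is 81), since one position is reserved for the terminating NUL. Input is
terminated by the supplied delimiter, which defaults to newline and may be omitted to use
the default. The terminator is not stored in the array. The address at which the line is
stored is held in the pointer line. If the line is longer than limit - 1 characters, the
excess is left in the input stream and the stream is put into a failed state.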

const int linelen = 80;
char line[linelen+1]; // room for 80 chars plus the terminating NUL
while( cin.getline( line, linelen + 1 ) ) // false at end of file, or if a line
{ // is longer than 80 chars
cout << line << endl; // output the line
}

9. Arrays of pointers and pointers to pointers


Arrays of pointers can point to different arrays whose declared lengths differ. Thus arrays
of pointers to char can accommodate jagged arrays i.e. arrays of string whose lengths are
different - not just different in the number of characters held, but also in the numbers of
elements allocated in memory.

const char *ptr[4] = { "one", "two", "three", "four" }; // array of 4 pointers to (constant) char


Assume that the address held in ptr[0] is 36714:

ptr[0] 36714 -> o n e \0
ptr[1] 36718 -> t w o \0
ptr[2] 36722 -> t h r e e \0
ptr[3] 36728 -> f o u r \0

Using the Borland C++ Debug Inspect menu item (footnote 4):-

-------- Inspecting ptr -------
8F50:0FF0
[0] 8F4C:001E "one" 36714
[1] 8F4C:0022 "two" 36718
[2] 8F4C:0026 "three" 36722
[3] 8F4C:002C "four" 36728

This makes for efficient use of memory when storing large numbers of strings.
The 4 arrays of char are allocated contiguously in memory and the above could be viewed
as follows:-

address 36714    36718    36722        36728
        o n e \0 t w o \0 t h r e e \0 f o u r \0
        ptr[0]   ptr[1]   ptr[2]       ptr[3]

Printing this array of pointers can be done by


for (int i = 0; i < 4; i++ )
cout << ptr[i] << endl ;

4 The GNU C++ debugger built into RHIDE does not support inspect.


10. Command line arguments


You have already encountered programs that accept command line arguments, e.g. dir /w.
Dir accepts an argument w that indicates a wide display of file names. The slash is just an
indicator that an argument follows.
MS DOS provides the facility for programs to pick up arguments supplied at the command
line when invoking a program. For example pretty.exe might be a C++ program to 'pretty
print' C++ source files, in the command line invocation pretty myprog.cpp the argument
myprog.cpp represents the name of the source file to be printed.
In C++, information about these command line arguments is provided by 2 arguments to
function main named by convention:-
• int argc the number of arguments (including the name of the executed
program)
• char *argv[] an array of pointers to char representing the strings appearing
on the command line.
In the above example, argc = 2, argv[0] is a pointer to the string "pretty", and argv[1] is a
pointer to the string "myprog.cpp". Whitespace on the command line separates the
arguments into the individual components of argv[].
Thus a command line containing myprog /x/y/t myfile would represent 3 arguments, with
"myprog" in argv[0], "/x/y/t" in argv[1] and "myfile" in argv[2], whereas myprog /x /y /t
myfile would produce argc with the value 5, with argv[0] holding the string "myprog" and the
four arguments /x, /y, /t and myfile held in elements argv[1], argv[2], argv[3] and argv[4]
respectively. What these arguments mean, of course, is up to the author of myprog. It is
good practice to check the number of arguments in main and, if the number falls outside
the number expected (often a variable number of arguments can be entered), an error
message is issued and the program terminates. If no arguments are supplied (other than the
program name, of course) and at least one is expected, then it is usual to print the program
name together with a list of valid arguments. This list should not be verbose and should
not exceed about 22 lines otherwise some lines will disappear off the top of the screen.
There is a convention that MSDOS programs expect arguments announced by the slash '/'.
In Unix the character used is invariably minus '-'.
Assuming that you have written a program to pretty-print a C++ program; that the program
name is pretty and that 3 arguments are allowed:-
1. /ln print n lines per page, where n is an integer (optional - defaults to 60)
2. /fn print with font size n, where n is an integer (required)
3. filename to print (required)
argc will hold a maximum value of 4 (the name of the program plus 3 arguments) and a
minimum of 3. If argc < 3 or argc > 4 then there is an error and the program should display
an error message to the terminal and then terminate. The error message would be
something like:-
incorrect number of arguments
usage: pretty [/ln] /fn filename
/ln = print n lines per page
/fn = use font size n (8..12)

Note the square brackets to indicate an optional argument. The program can then be
terminated with either:-
• return 1; when the error is detected in main, or
• exit(1); in other cases. exit is in cstdlib (or stdlib.h).
By convention, a non-zero value returned from main or as an argument to exit indicates an
error. In both cases, other non-zero values can be used to indicate different error
conditions.

11. Initialising pointer arrays


Here is an example of an array being used to provide a lookup table. In a program
involving the use of dates, it is likely that a facility to convert a day number into the name
of a day of the week may be required. The day numbers would be in the range 0 - 6, and
values within this range would be passed as an argument to a function that returns the
corresponding day of the week, i.e. "Sunday" - "Saturday". We introduce here the concept
of static function local variables. These are variables declared within a function whose
scope is limited to the function body but, unlike auto local variables their life is that of the
surrounding program - they are not destroyed when the function terminates. This topic is
covered more fully in the chapter on Program Files para. 8
const char* dayname( int daynum )
{
static const char *name[] = { "Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday" };
return name[daynum];
}
….
int daynumber = 2;
cout << "day " << daynumber << " is " << dayname(daynumber) << endl;
The static local variable name is created and initialised only once. Thereafter, the
declaration is ignored and the return statement simply looks up the day name within the
array that corresponds to the incoming argument. Note that no check is carried out on the
argument, so if it falls outside the range of values 0 - 6 the function will either return an
incorrect value or cause a runtime error.

12. Review
You will, by now, have seen that arrays and pointers to arrays in C++ are somewhat
complex and error-prone. This is because these facilities were designed over 20 years ago
for 'C' (a language that was originally designed for writing operating systems) and have
had to be retained in C++ for backward compatibility. In fact, the object-oriented facilities
provided by C++ allow these deficiencies to be hidden from the application programmer
who can use libraries of classes e.g. class string which hides the underlying shortcomings
of the built-in array of char type. In particular, the disadvantage of the fixed size of built-in
arrays and the absence of array bounds checking can be overcome in container classes
which are provided with most C++ implementations and are now standardised as the
Standard Template Library. However, we shall be concerned with how container classes
are designed and written and we therefore need to understand the base facilities on which
they are built.
You will be provided with a simple String data type that can be used for assignments. You
should read Skansholm pp 91-93 on the standard string type that is now part of the
Standard Template Library. If you wish, you can use this standard type wherever strings
are required.


13. Summary
• The array type allows a collection of items of the same type to be stored under a
single name. The array declaration specifies the type of its components and the
number of elements.
• Individual components of an array can be accessed by subscripting the array name
with an integer expression, making them well suited to processing by loops. The
compiler provides no run time checking of array bounds so that care needs to be
taken to ensure that array bounds are not exceeded otherwise memory may be
corrupted.
• When an array is passed to a function, the address of the first element of the actual
argument is copied into the corresponding formal argument. An array formal
argument can be declared as either e.g. int table[] or int *table; they both mean a
pointer to the first int of the array. Within the body of the function, the components
of the array may be accessed either using subscripts, normally in the form of a variable
whose values are controlled by a loop e.g. table[ i ], or by a pointer. In the formal
argument list of a function, a multi-dimensional array must specify the number of
all dimensions except the first. Arrays with 2 or more dimensions are likely to be
specific to a particular application and are best given a new type name using
typedef.
• Arrays can be initialised on declaration with values inside braces separated by
commas. Any items unspecified in this way are initialised to 0, for auto as well as
static arrays. Only an auto array declared with no initialiser at all has indeterminate
contents. This default initialisation only has meaning for the primitive types.
• Strings are one-dimensional arrays of char terminated by the ASCII NUL ('\0')
character. Room must be allowed for this character otherwise output and other
routines will not behave correctly. In some programmer-defined functions that
process arrays of char, the terminator must be provided by the programmer.
• Arrays of pointers to char can be used to handle arrays of strings. This is how
command line arguments are provided as the second argument (argv) to function
main, the first argument (argc) being an integer representing the number of
arguments.
An array of pointers to char can be initialised with a list of strings.
The number of pointer elements, unless given within the brackets, is fixed by the
number of strings in the initialisation list. Output of such an array (here named
course and initialised with 5 strings) could be by:-
int size = sizeof(course) / sizeof(course[0]);
for ( int i = 0; i < size; i++ )
cout << course[ i ] << " ";
cout << endl;
sizeof(course[0]) will yield either 2 or 4 (the size of a pointer), and sizeof(course)
will yield either 10 or 20 (5 pointers). The value of size in either case will be 5.


14. An array application - Stack of char


A stack is an abstract data type - a type that is not provided by the programming language
but which can be implemented by using the data structuring facilities of the language. A
stack works on the LIFO (last in, first out) principle - the last item put onto the stack is the
first to be removed from it. The last item put onto the stack is at the top of the stack and
the next item to be removed will be taken from the top. Access to the stack is at one end
only - the top. Compare it to a stack of plates - the next one to be used is the latest one to
be placed onto the stack. The standard operations on a stack of char are:-
• void push( char ) char is pushed onto the stack
• void pop( void ) the top of stack item is removed
• char top( void ) the top of stack char is returned, the stack is unchanged
• bool empty( void ) returns true if the stack is empty, otherwise false
• void makeempty( void ) empties the stack
One way of implementing a stack is to use an array:-
// charstck.cpp
// illustrates an array implementation of a stack of char
const int MAXSTACK = 20; // 20 elements
char stack[MAXSTACK ]; // the stack
int thetop; // the index value of the current top of stack
// (initially empty)
int main ( void )
{
void push( char ch ); // 5 function prototypes
void pop ( void );
char top( void );
bool empty( void );
void makeempty( void );
char word[] = "abracadabra";
makeempty();
for ( int i = 0; word[ i ] != '\0'; i++ ) // push each letter of word
push( word[i] );
cout << word << " reversed = ";
while ( !empty( ) )
{
cout << top( ); pop( ); // output the top char and then pop
}
cout << endl;
return 0;
}
void push( char ch )
// post - ch has been placed at the top of the stack
{ ... }
void pop ( void )
// pre - the stack is not empty
// post - the top of stack item has been removed
{ ... }

char top( void )
// pre - the stack is not empty
// post - the top of stack item has been returned. The state of the stack is unchanged
{ ... }
bool empty( void )
// post - if the stack is empty, true is returned, else false is returned
{ ... }
void makeempty( void )
// post - the stack is empty
{ ... }
Output:
abracadabra reversed = arbadacarba

Note that the code in function main never accesses the array stack directly. All operations
are carried out only via the provided routines makeempty, push, pop, top, empty. This is an
example of data abstraction - the stack data structure is protected from corruption by
requiring all accesses to be made through these functions. In the example, this discipline is
not enforced - it is possible for the stack to be accessed directly since stack is a global
variable that has file scope. We shall see later how direct access can be prevented, and
how the stack can be encapsulated in a single entity that holds both the array and the
variable that records the top of stack.


Program Files
1. Introduction
The unit of compilation in C++ is the file. A program can be built from several files. These
will comprise:-
• The main program file that includes a function main
• Zero or more ‘modules’ providing support functions, data types etc. comprising
  - A header file ( .h ) that contains prototype declarations for the functions
    provided by the module and possibly type and data declarations.
  - A source ( .cpp ) file containing the definition of the functions, types and
    variables provided by the module. This file may or may not be present.
  - The object file ( .obj ) created by compiling the .cpp file (see above) that
    provides the definition of the functions whose declarations appear in the header.
The main program file contains compiler directives to #include the header file(s) for the
supporting modules. This ensures that functions and variables, constants and types defined
in the supporting source files can be accessed by the main program. In other words, the
header files provide the prototypes for functions and referencing declarations for variables
etc. that allow the compiler to generate code for the main program without the source of
the supporting .cpp files themselves being present at compile time.
At link time, the programmer must indicate which supporting object ( .obj ) files he wants
to be linked with the object code of the main program. Within the GNU C++ IDE this is
done by creating a project which defines all the required source files for a particular
project and ensures that the object code of each is up to date before the linker links them
all in to produce the executable. The project definition itself is saved as a .gpr file which
can be opened and changed as required. By default, the name of the executable file will be
the name of the project file. Thus assign1.gpr (the project file) will cause the executable
resulting from linking all object files to be named assign1.exe regardless of the name of
the main source program file. The default can be changed by the menu item Project.main
targetname.
Take iostream as an example. You must include the compiler directive
#include<iostream> to ensure that the actual text of this header file is included in the
compilation of your main program. Without this, the compiler would not be able to make
sense of a call to e.g. cin.get(). You do not need the source of iostream (iostream.cpp) and
it is not even present on the machine. At link time, the linker finds references in your
object code to names declared in the header (e.g. cin) that are not defined there, and
combines the object code for iostream with the object
code generated from the source of your main program in order to produce the executable.
The integrated environment allows the location of the object code of iostream to be
specified and the linker fetches it from that directory for inclusion.
Thus we have the concept of separate program modules that consist of two parts:-
• an interface part - the header file iostream
• an implementation part - the object file iostream.obj. (In fact, you will not find
iostream.obj in the directory because the code is included in the library files in the
lib directory).
The interface part defines the services provided by the module in terms of the functions,
variables, constants and types that are provided (exported) by the module. The
implementation part provides the actual implementation in the form of object code that is
needed at link time.
This is another example of abstraction. We need to know how to call the iostream
functions, and it is convenient that objects like cin and cout are pre-declared. For this


reason, prototypes of the functions and the declaration of the standard I/O streams are
made available to us in the header file iostream, but the implementation is hidden in the
library files since we need not be concerned with how the functions are implemented nor
how stream objects are represented. Consequently we can access the resources provided by
iostream only via the routines and declarations provided in the header file (the interface).
We cannot access the representation of streams because it is hidden and is therefore
protected from the possible corruption that might have occurred had we been allowed
direct access to it.
Note that the ANSI C++ standard specifies that system header files such as iostream,
string, vector etc. are named without a .h file name extension. However, all other
modules (including those that you write) conventionally have the extension .h. The GNU
C++ compiler meets this requirement of the standard, but other, older, compilers may not
and, in those cases you will have to use the old name for such system headers, e.g.
iostream.h.

2. The steps to produce an executable


Assume that you have a program that consists of the files:-
main.cpp the main program file containing the function main
other.cpp a source file containing the definition of support functions, type and
variable definitions
other.h the header file containing the external referencing declarations for the
functions, types and variables that are defined in other.cpp
• Select Project - Open project - call it myprog.prj
• Add other.cpp and main.cpp to the project
• Compile other.cpp
• Compile main.cpp. (Header file other.h is brought in during compilation)
• Link main.obj with other.obj
• When you choose link with main.cpp
  - You are not linking the source files but the object files created by the compiler.
  - The linker doesn't know what to link with main.obj unless you have a project
  - The linker links together the object code of other, main and of any library code
    required e.g. iostream
• You could use make instead of compile and link. This will compile all modules
whose object file has a time earlier than the source file (.cpp) and then link.
The name of the executable is the same as that of the project i.e. myprog ( not main ).
Some students find this process of setting up a project intimidating for some reason. But it
is quite simple and has to be mastered in order to write real programs that consist of more
than one file.

3. Types, storage class and scope


Each object that is given an identifier in a program is a reference to a memory location
where that object's representation is stored. Thus the declaration int count associates count
with a location in memory where the bit representation of the value of count is stored.
An object known by its identifier has 3 attributes in addition to its value: -


! type
This is important because it determines the amount of memory that is allocated for
the representation of the object and also its bit pattern. Thus both the number of
bytes and the pattern of the bits stored in those bytes will be completely different
between e.g. an int and a float even if they appear to hold the same value.
! storage class
This is important because it determines the lifetime of the object, i.e. how long it
remains in existence occupying storage. Storage class has defaults which are
determined by the position in the source code of the object's declaration. This may
be varied by providing an explicit storage class on declaration. There are 3
categories of lifetime -
! local (auto) - lifetime is transient and exists only for the lifetime of the
enclosing block (usually a function, but see later).
! static - lifetime exists for the duration of the program's execution.
! dynamic - allocated dynamically during a program's execution. Lifetime is
for the duration of the program, or until de-allocation, whichever
is sooner. This will be dealt with later.
! scope
This is the portion of the source code within which the object is visible. Thus a
variable declared within a function is visible (in scope) only within the block of
statements that constitute the function body regardless of its storage class. See also
Skansholm Chapter 4.3 Declaration, scope and visibility.
There can be different combinations of scope and storage class, e.g. a function local
variable can be declared static. The effect is that its visibility (scope) remains limited to
the enclosing block (i.e. the function body) but its lifetime continues for the duration of the
program's execution.
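The effect of type on representation can be seen with a small sketch. The function name bitsOfFloat is hypothetical, and the sketch assumes that float and unsigned int occupy the same number of bytes (commonly 4), which the language does not guarantee:-

```cpp
#include <cstring>

// Hypothetical helper: returns the bit pattern that represents a float,
// viewed as an unsigned integer.
// Assumption: float and unsigned int have the same size on this system.
unsigned int bitsOfFloat( float f )
{
    unsigned int bits = 0;
    if ( sizeof( f ) == sizeof( bits ) )           // guard the size assumption
        std::memcpy( &bits, &f, sizeof( bits ) );  // copy the raw bytes unchanged
    return bits;
}
```

An int holding the value 1 has only its lowest bit set, whereas bitsOfFloat( 1.0f ) returns a quite different pattern, even though both variables appear to hold the value one.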

4. Local duration
Unlike some programming languages (e.g. Pascal and Modula-2), the body of a function
may not include the definition of another function. In other words, functions may not be
nested in C++ and the only valid definitions appearing within a function are those for data
items. Variables defined in a function have the default storage class auto and the formal
arguments to the function are also treated as auto.
The body of a function is a sequence of declarations and statements surrounded by braces
{}. This construct is known as a compound statement or block. Within a function body,
any statement may itself be a block. It is logical therefore that such a block, nested within
a function body, should be allowed to contain data declarations, and that the scope of those
declarations should be the surrounding block as with function local variables. Therefore
the sequence of statements that depend on the truth or otherwise of the logical expression
in an if statement may be a block that contains declarations whose scope is limited to that
block. A block may even consist of just the braces surrounding one or more statements :-


void swapifless ( int& a, int& b )
{                                   // function body block
    if ( a < b )
    {                               // if block
        const int temp = a;
        a = b;
        b = temp ;
        {                           // inner block
            int inner = temp;
            cout << a;
        }
        cout << inner << endl;      // error undefined symbol inner
    }
    cout << temp << endl;           // error undefined symbol temp
}

Function swapifless above could have included a local variable definition int temp
(declared before the if statement). This outer temp would have been invisible within the if
block because the inner temp would have caused a 'hole' in its scope. This hole would
extend for the scope of the if block only.
A local variable can, of course, be initialised on definition. This initialisation can be by
any expression that is valid at that point, for instance by an expression that contains
reference to the formal arguments as above. In the absence of any initialisation, the value
of a local auto variable is undefined.
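The 'hole' in scope can be sketched as follows. The function outerAfterInner is a hypothetical illustration and not part of swapifless. It returns the value of the outer temp once the inner block has finished:-

```cpp
#include <iostream>

// Hypothetical illustration of a hole in scope.
int outerAfterInner( int a )
{
    int temp = a;              // outer temp, initialised from the formal argument
    {
        int temp = a * 2;      // inner temp hides the outer temp within this block
        std::cout << "inner temp = " << temp << std::endl;
    }
    // The hole has ended, so temp refers to the outer variable again.
    return temp;
}
```

The outer temp is unaffected by anything done to the inner temp, so the function returns the value of its argument unchanged.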

5. Declaration versus definition


A definition of a function is a block of source code that defines the function and its body:-
void swap ( int& a, int& b )
{
int temp = a; a = b; b = temp;
}
A declaration of a function is just the header followed by a semi-colon:-
void swap ( int&, int& ); // prototype
A definition of a variable is a statement that allocates storage with optional initialisation:-
int count = 0; // allocates storage
A declaration of a variable is a notification to the compiler that a variable has been defined
in another file, but is being referenced in the current file:-
extern int count; // external referencing declaration. Does not
// allocate storage
You will not normally need to make external referencing declarations because our
standard practice will be to #include a header file that serves the same purpose (see para 6
below).


6. Static duration
An external referencing declaration for a function is no different in form from the
function prototypes with which you are already familiar. It informs the compiler that a
function is to be called from a separate file from that in which it is defined. An external
referencing declaration for a function is made in the source program file in which the call
to the function is to be made, i.e. in the file in which it is not defined. The format is as
follows: -
extern void print( void ); // declares a function that is defined in another file
// extern may be omitted

External referencing declarations are usually made by placing in the main program file a
compiler directive to #include a header file that provides the necessary external
referencing declarations as explained in paragraph 1.
Variables declared outside of any function - e.g. before function main - have file scope and
are referred to as global variables. The C++ compiler guarantees to initialise any global
variables to zero, but it is considered good practice to initialise them explicitly. As with
any declaration that reuses the identifier of an object declared in a surrounding scope, a
local variable causes a hole in the scope of a global variable with the same name
- see the example below:-
#include <iostream>
using namespace std;         // needed by standard-conforming compilers for cout and endl
int sum;
int main( void )
{
    void subroutine( void ); // prototype declaration
    sum = 15;
    subroutine();
    cout << "Global sum is " << sum << endl;
    return 0;
}

void subroutine( void )
{
    float sum = 1.234;
    cout << "Local sum is " << sum << endl;
}
The global variable sum is distinct from the local variable of the same name in function
subroutine. The latter causes a hole in the scope of the global from the point immediately
after the definition of float sum. The only variable of that name visible within subroutine is
the local one with the value 1.234. As a corollary float sum is not visible within main
because its scope is confined to the function in which it is defined. The program's output
is:-
Local sum is 1.234
Global sum is 15
It is possible to gain access to a global variable even when it is masked by a local variable
of the same name. In function subroutine for instance the global variable sum can be
referenced by preceding it with the double colon scope resolution operator which you
have already met in e.g. setiosflags( ios::left ):-
cout << "Local sum is " << sum << endl;
cout << "Global sum is " << ::sum << endl;


7. Storage class static


Variables that are explicitly given the storage class static may be either local or global.
The meaning of static differs depending on whether its declaration appears within a
function or outside.

8. Static local variables


The default storage class of variables declared within a function is auto. This means that
their scope is confined to the block in which they are declared, and also that their lifetime
is the same as that of the block. If a variable declared within a function is initialised with
e.g.
int local = 1;
Then the initial value of local will be 1 for every activation of that function. If it is not
initialised, then its initial value is undefined.
A local variable given the storage class static still has local scope, but retains its value
between successive activations of the block in which it is declared.
void fun1()
{
static int staticlocal = 1;
...
staticlocal++;
}
On the very first occasion that fun1 is called, the value of staticlocal will be 1. But for
subsequent calls staticlocal will have the value that it was last given in the body of fun1
e.g. 2 on entry at the second call, 3, 4 etc. in the above example. In other words,
staticlocal retains its value across activations of fun1 and occupies storage for the whole of
the program's execution.
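The contrast between the two storage classes can be sketched with a pair of hypothetical functions (autoCount and staticCount are names invented for this illustration):-

```cpp
// Hypothetical illustration: the same body with auto and with static storage.
int autoCount()
{
    int calls = 0;          // auto: re-created and re-initialised on every activation
    calls++;
    return calls;           // always 1
}

int staticCount()
{
    static int calls = 0;   // static: initialised once, value retained between calls
    calls++;
    return calls;           // 1, 2, 3 ... on successive calls
}
```

Both variables are visible only inside their own function. Only the lifetime differs.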

9. Static global variables


The effect of giving a global variable or function the storage class static is to make it
inaccessible to any program unit (i.e. file) other than the one in which it is defined. In
other words, it can be accessed by any function in the file in which it is declared, but may
not be accessed from any other file, even if an external referencing declaration is given in
the other file.
The effect of static definitions at the global level in source files that have no function main
is to give the programmer of these implementation modules the ability to control the
export of both variables and functions. This is a standard requirement of a programming
language that supports the separate compilation of modules. A function of storage class
static would typically be a support function called by other functions in the same module
but required not to be accessible from another module. A global variable would be given
the storage class static to prevent access to it from any module other than the one in which
it is declared. This is known as data hiding. Items which are explicitly made visible (by
declaring them in a header file) are said to be exported from the module. Note that this
mechanism could be used to prevent access to the stack and its top-of-stack indicator if the
char stack in the previous chapter were to be implemented in a separate file.
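A minimal sketch of such an implementation module follows. The module and all of its names (total, valid, add, current) are hypothetical. In practice the two exported functions would also be declared in a header file:-

```cpp
// counter.cpp - hypothetical implementation module with no function main

static int total = 0;            // hidden: cannot be reached from any other file

static bool valid( int n )       // hidden support function
{
    return n >= 0;
}

void add( int n )                // exported: would be declared in counter.h
{
    if ( valid( n ) )
        total += n;              // negative values are silently ignored
}

int current()                    // exported: would be declared in counter.h
{
    return total;
}
```

A client file can change total only by calling add, so the module keeps control of its own data.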


10. The C++ pre-processor


This is a simple macro processor that, in the case of GNU C++, constitutes a separate pass
by the compiler. It makes a pass over the source file substituting all occurrences of defined
identifiers with the token string that represents the macro definition. Thus, if you liked
Pascal and also like typing, you could make C++ look more like Pascal by replacing all
occurrences of { with BEGIN and all occurrences of } with END, and by providing macros
that carry out the conversion back to the C++ convention immediately prior to
compilation:-
#define BEGIN {
#define END }
int main( void )
BEGIN
    int a = 1, b = 2;      // initialised so that the comparison is well defined
    if ( a > b )
    BEGIN
        int temp = a;
        a = b;
        b = temp;
    END
    return(0);
END
The macro processor was used extensively in C to produce the effect of inline functions
and constant declarations which are now part of the C++ language. Its use in C++ is
therefore mostly confined to controlling conditional compilation and the inclusion of
header files.

11. Conditional compilation


When developing a complex program, it may be useful to include debugging statements
that output the value of certain variables or that indicate at which point in the source code
execution is currently being carried out. The output can be directed to a file by using
output redirection at the command line. When the program appears to be working
correctly, these debugging statements could be deleted from the source. But all too often,
it is found that bugs still remain and some or all of the debugging statements have to be
re-inserted. The inclusion in the compilation of the debugging statements can be controlled
by macro conditional statements of the form:-

#define DEBUG 1 // Macro definition setting DEBUG to true


...
#if DEBUG
statements1
#else
statements2
#endif

Statements1 and statements2 are actual C++ program statements. The sequence #if
DEBUG, #else, #endif can be scattered throughout the source code and will have the effect
of including statements1 into the compilation if DEBUG is true, and including statements2
if DEBUG is false.


In order to eliminate the debugging statements, it is only necessary to change the value of
DEBUG from true to false (0), and re-compile and link. The GNU C++ IDE allows macro
constant definitions to be changed via the menu item:-
Options.Compiler options
To define a macro named DEBUG, go to this menu item and enter -DDEBUG. To
undefine it, enter -UDEBUG.
A file macro.cpp is installed in the labs for you to try this out.
The conditional compilation facility may also be used to generate different versions of a
program for different platforms or conditions.
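As a sketch of how such a debugging statement might look in practice, the hypothetical function sumTo below reports its progress only when DEBUG is true. The trace is written to cerr so that it can be redirected separately from normal output:-

```cpp
#include <iostream>

#define DEBUG 1     // change to 0 and re-compile to remove the trace

// Hypothetical example function: returns 1 + 2 + ... + n.
int sumTo( int n )
{
    int total = 0;
    for ( int i = 1; i <= n; i++ )
    {
        total += i;
#if DEBUG
        // compiled only when DEBUG is true
        std::cerr << "i = " << i << ", total = " << total << std::endl;
#endif
    }
    return total;
}
```

The value returned is the same whichever setting is chosen. Only the trace output is affected.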

12. Conditional file inclusion


There are two formats for specifying the name of the include file in a compiler include
directive. If the header file name is surrounded by angle brackets, a predefined list of
specified include directories is searched. If the header file name is surrounded by double
quotes, the current directory is searched followed by the specified include directories.

#include <iostream>       // look in the standard include directories
#include "myheader.h"     // look in the current directory first, then the
                          // standard include directories

The standard include directories are stored in a directory indicated by operating system
path directives that are set up when the system starts or that are indicated by values that
can be configured from within the IDE.
When developing programs that consist of several modules (files) it is normal to supply a
header file for each module other than the main module. The main module then requires
compiler directives to #include these header files, using the form #include "filename.h". If
necessary, the header file may also be included in the compilation of the .cpp file for
which it is the header. In cases where header files themselves contain include directives,
there is the likelihood that some declarations will be included twice. In those cases, header
file inclusion may be made conditional on the existence or otherwise of a macro definition,
using the directives #ifndef and #define.
Initially, you will not be writing programs whose complexity requires the use of #ifndef
and #define so do not worry about them unduly. When the compiler or linker complains that
you have multiple definitions of a type, function or variable, you will know that you have
hit the problem. Then seek advice.
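For reference, the usual form of this conditional inclusion is sketched below. To keep the sketch in one file, the contents of a hypothetical header date.h are simply written out twice, which is what the compiler would see if the header were included twice. The names DATE_H, GuardedDate and yearOf are invented for the illustration:-

```cpp
// ---- first copy of the hypothetical header date.h ----
#ifndef DATE_H                  // true the first time: DATE_H is not yet defined
#define DATE_H
struct GuardedDate { int year; int month; int day; };
#endif

// ---- second copy, as if date.h had been included again ----
#ifndef DATE_H                  // false this time, so everything up to #endif is skipped
#define DATE_H
struct GuardedDate { int year; int month; int day; };
#endif

// Without the #ifndef test the struct would be defined twice and the
// compilation would fail.
int yearOf( const GuardedDate& d )
{
    return d.year;
}
```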


Data Structures
1. Data Types
Data types can be described in terms of the range of values they may hold and by the
operations provided for them, e.g. on a system with 32-bit integers, type int has a range
of possible values from -2,147,483,648 to 2,147,483,647, and the provided operations
include +, -, *, /, %, ++, +=, >, <=, ==, !=.
We have not dealt in any detail with the way in which type int is represented in memory
because we do not need to know this in order to use the type.
We defined a type Clock to have a range of values representing the times from midnight to
23:59 at intervals of 1 minute. We also provided a small set of operations - gettime, tick
and show.
We try to follow the principle that the definition of such data types provides all the
information another programmer needs in order to use them in his program, but that the
representation should be hidden so that it cannot be corrupted. Another reason for hiding
the implementation is that it should be possible to change it, e.g. to improve performance.
The client program will have to be re-linked with the object code of the new
implementation but, provided that the definition is unaltered, no change should be required
to the source code of the client program.

2. Abstract Data Types


These are data types that are defined entirely in terms of their set of operations without any
consideration of how their values are represented. The domain of values may also feature
in their definition, but often it is so large as to make this not useful. There may, in fact, be
several different ways of implementing them, each with their own set of advantages and
disadvantages. They are often models of objects from the real world or from mathematics,
e.g. Sets, Queues and Lists.
The implementation should allow a programmer to define new instances of the type, but
should prevent access to the representation.

3. Classification
There are two main groups - single entities of which there may be many instances
e.g. Clock, and collections (or containers) of many objects of the same type e.g. Set, List
etc. The components of these collections may be of any type, but, within one collection,
must all be of the same type. Frequently, part of the definition of a collection is the
relationship between the members.


4. Categories of Collection
The broad categories are:-

! Collections in which there is no relationship between the members except that, in the
domain of all possible values that may be members, each is either a member or is not,
e.g. Set and Bag.
! Linear structures in which the members have a one to one relationship with each other.
! Hierarchical structures (trees) in which the members have a one to many relationship
with each other.
! Graphs - where the members have a many to many relationship.

5. Stacks
Definition
This is the simplest of the linear collection types since the number of operations is
typically small. As with all containers, the components may be of any type, but must be of
the same type within any one stack. Additions to, and removals from the stack are made at
one end only - the top. Access to components is limited to the item currently at the top.
The consequence of this relationship between members is that the first item to be added is
the last to be removed. This is known as a LIFO structure - last in, first out.
Stacks are very widely used in Computer Science. When a function is called, a stack frame
is built containing the address to which control must return when the function has finished
execution. In addition, space is reserved in the stack frame for any auto local variables and
for the values of any actual arguments passed to the function. This structure is pushed onto
the system stack. When the function terminates, the stack frame is popped from the stack,
causing the arguments and local variables to perish. Another application is recording the
path taken through a structure so that it can be retraced - the 'Hansel & Gretel' effect.
The following code and diagram illustrate the building of stack frames:-
int funa ( int y )
{
    return ( y * 2 ) ;
}
int funb ( int z )
{
    return ( funa ( z ) / 2 );
}
int func( int a )
{
    return ( funb( a ) );
}
int main (void )
{
    int x = 4, y;
    y = func( x );
}

                     funa
              funb   funb   funb
       func   func   func   func   func
main   main   main   main   main   main   main

Stack frames for the above code


The classic operations are:-


push push a new item onto the stack
top retrieve the top of stack item without removing it
pop remove the top of stack item
empty test if the stack is empty

Viewed as an abstract type, a stack cannot be full, but the actual implementation may have
to place a limit on the number of items that can be held on the stack. This gives rise to a
further operation
full test if the stack is full

Operations on abstract data types can typically be categorised into those that:-

! change the state of the data type e.g. push, pop


! report on the state of the data type without changing it e.g. top, empty, full.
! create and/or initialise an instance of the type - no example here
Each operation is provided with a pre-condition and post-condition that states
i) pre - any requirement placed on the caller as to the state of the structure prior to the call,
or on the values passed as arguments; for instance, top and pop must not be called on
an empty stack.
ii) post - the state of the structure that is guaranteed to hold after the operation has been
carried out, provided that the pre-condition has been met; for instance, after a push, the
number pushed is at the top of the stack.
The definition of a stack of integers can be placed in a header file which is then available
for importing (using #include "intstack.h") by any client program requiring it:-
// intstack.h
// definition of a stack of integers
void push( int arg );
// pre - !full()
// post - stack contains the value of arg, top() = arg
void pop( void );
// pre - !empty()
// post - top() has been removed
int top ( void );
// pre - !empty()
// post - stack is unchanged, the item at the top of the stack has been returned
bool empty();
// pre - none
// post - returns TRUE if stack is empty, otherwise FALSE
bool full();
// pre - none
// post - returns TRUE if stack is full, otherwise FALSE


Representation
The obvious first choice for representing a stack is an array, although this has the
disadvantage that an upper limit for the number of items to be stored must be chosen
before compiling, and this cannot be varied at run-time. This representation should be
hidden from a user of the stack by specifying the storage class static
// intstack.cpp
// representation and implementation of a stack of integers
#include "intstack.h"
const int MAX_STACK = 10; // the maximum number of items that can be stored
static int data[MAX_STACK]; // the container for the stack members
static int Top; // the index of the top item.
// Top will need to be initialised on startup, incremented
// before pushing a new member, and decremented after
// popping a member.
// When Top = MAX_STACK - 1, the stack is full
Implementation of the operations
This is left as an exercise. The full definition of the functions would be placed after the
global data definitions in intstack.cpp. Note that intstack.cpp contains an include compiler
directive for the header file. intstack.cpp would contain only the data declarations shown
above and the function definitions. There must be no function main.
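For comparison with your own attempt, one possible shape for intstack.cpp is sketched below. It is not a model answer. It assumes that Top is initialised to -1 to mark the empty stack, as the comments on Top suggest, and it repeats the prototypes from intstack.h (rather than including the header) only so that the sketch compiles on its own. None of the functions checks its pre-condition, which remains the responsibility of the caller:-

```cpp
// Prototypes as given in intstack.h, repeated here so the sketch is self-contained.
void push( int arg );
void pop( void );
int  top( void );
bool empty();
bool full();

const int MAX_STACK = 10;        // the maximum number of items that can be stored
static int data[MAX_STACK];      // the container for the stack members
static int Top = -1;             // assumption: -1 indicates an empty stack

void push( int arg )             // pre - !full()
{
    data[++Top] = arg;           // increment Top, then store the new member
}

void pop( void )                 // pre - !empty()
{
    Top--;                       // the old top item is simply abandoned
}

int top( void )                  // pre - !empty()
{
    return data[Top];
}

bool empty()
{
    return Top == -1;
}

bool full()
{
    return Top == MAX_STACK - 1;
}
```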
Using the stack
A client program wishing to use the integer stack would import the definition (i.e. #include
"intstack.h") and then carry out operations on it as though it had been defined in the same
file. Because of the static qualifiers used for the array definition data and the integer
variable Top, the client program cannot access the representation directly even if extern
declarations are made for these two items in the client's source code. const MAX_STACK
also cannot be accessed because, in C++, a const object defined at file scope is by default
visible only within the file in which it is defined.

#include <iostream>
#include "intstack.h"
using namespace std;
extern int Top;              // attempted external references to the
extern const int MAX_STACK;  // hidden representation in intstack.cpp
int main( void )
{   // push some items
    cout << endl << endl;
    while( !full())
    {
        static int item = 0;
        push( ++item );
        cout << "pushing " << item << endl;
    }
    // Now an attempt to access the stack variables directly - causes linker errors:-
    Top = -1;                    // Linker error undefined symbol _Top - defined as static
    cout << "MAX_STACK = "       // Linker error undefined symbol
         << MAX_STACK << endl;   // MAX_STACK is const in intstack.cpp
    // pop them
    while ( !empty() )
    {
        cout << "popping " << top() << endl; pop();
    }
    return 0;
}


6. Abstract Data Type?


Can this implementation of a stack of integers be classed as an abstract data type? It has
been defined in terms of its set of operations. It is encapsulated by being placed in separate
files and its representation is hidden from its clients - its state can only be altered through
the supplied operations. But only one stack can exist at any one time in any one client
program. The client cannot declare instances of the type by e.g.
IntStack astack, bstack;
This is clear since there is no mechanism provided by the stack module for specifying on
which stack the operations are to be carried out - there is only one. This single instance of
an encapsulated type is sometimes referred to as an abstract state machine and is simple to
implement and useful when only one instance of the type is required at any one time.
Later, we will see how a true abstract data type can be defined of which as many instances
may be created as the client program requires.

7. Queues
A queue follows closely the real-world example. Operations are permitted at both 'ends'
with additions (enqueue or append) being made at the tail and removals (serve or remove)
being taken from the head. Effectively, the elements are ordered physically according to
the time of their arrival. It is known as a FIFO structure - first in, first out. Typical
operations are:-

! append or enqueue add an element at the tail


! serve or remove remove an element from the head
! size return the length of the queue
! empty query whether the queue is empty
! full query whether the queue is full

Implementation
Again, an array implementation is considered. We need two integers to indicate the head
and tail of the queue and possibly a further integer to record the size (although this can be
computed from head and tail).
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ]; // A queue of characters
static int head = 0, tail = -1, count = 0;
Initially, the indicator (technically cursor) tail is set to a special value to indicate the
empty state. The head of the queue can be viewed as being at the 'left hand' or 'bottom' of
the array, while the tail grows 'right' or 'up' the array as items are appended.


                   0  1  2  3  4  5  6  7  8  9     head  tail
1. Empty                                              0    -1
2. append('A')     A                                  0     0
3. append('B')     A  B                               0     1
4. ch = serve()    A  B                               1     1
5. append('C')     A  B  C                            1     2

(After step 4, 'A' is still physically present in the array but is no longer part of the queue.)

The problem with this method of handling the array is that as items are appended and
served, the queue moves up the array, and will eventually bump up against the end when,
in fact, there may be space available lower down caused by elements being removed from
the head e.g. ‘A’ in this case. One solution is to slide all items in the queue down the array
once the tail has reached the top, but data moves are relatively expensive - particularly if
the queue elements are large.
A satisfactory solution is to view the array as circular so that the first element follows on
immediately after the last. Spare space in the array caused by removals will always be
available for use as long as the number of elements remains below MAX_QUEUE.
Instead of simply incrementing head on each removal, and tail on each append, these two
cursors must be taken modulo MAX_QUEUE each time they are incremented. Thus, if e.g.
tail is presently 9, and a further element is appended, tail becomes ( 9 + 1 ) % 10 = 0, and
the newly arrived element is inserted at array element 0.

[Figure: a circular array with elements 0 to 9, holding J, K, L, M, N, O with head = 5,
tail = 0 and count = 6]

void enqueue( char element )
{
tail = (tail + 1) % MAX_QUEUE;
queueitems[tail] = element;
count++;
}
The simplest way of implementing the test for full and empty is to maintain the size of the
queue in a variable (e.g. count) within the queue module.
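A sketch of the remaining operations, built on count in this way, might look as follows. The data definitions and enqueue are repeated from above so that the sketch is complete. The bodies of serve, empty, full and size are one possible implementation and, as before, the pre-conditions are not checked:-

```cpp
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ];         // A queue of characters
static int head = 0, tail = -1, count = 0;

void enqueue( char element )                 // pre - !full()
{
    tail = (tail + 1) % MAX_QUEUE;           // wrap round after the last element
    queueitems[tail] = element;
    count++;
}

char serve()                                 // pre - !empty()
{
    char element = queueitems[head];
    head = (head + 1) % MAX_QUEUE;           // wrap round in the same way as tail
    count--;
    return element;
}

bool empty() { return count == 0; }
bool full()  { return count == MAX_QUEUE; }
int  size()  { return count; }
```

Because full and empty rely on count alone, head and tail never need to be compared with each other.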
As with all data structures based on an array, the storage space is fixed at compile time and
the number of items that can therefore be stored is bounded. This inflexibility means that
arrays can only be used in cases where the maximum number of components can be
determined in advance.

8. Lists
Basically a list is a sequence of elements, each element other than the first and the last
having a predecessor and a successor. Another way of expressing this is that a list is
! either empty or
! consists of an element followed by a list.
This is known as a recursive definition.
The elements may be ordered:-
! by their time of arrival, i.e. each successive addition is placed after the previous last,
or
! inversely by their time of arrival - each element is inserted before the previous in a
similar way to a stack, although access may be allowed to any element.
! by some quality of the data e.g. a list of names ordered alphabetically.
! by requesting insertion at the 'current' position as indicated by some cursor.
Again, an array is considered as the method of representation. However, we find that there
is a high cost involved where insertion and deletion are permitted other than at the ends.
Each insertion within the list will require all elements following it to be moved ‘up’ the
array to make room, and, since there can be no ‘null’ elements, each deletion will require
all following elements to be moved down to close the gap. The time required to carry out
these moves makes this method of representation less than optimal. There are more
efficient and flexible ways of implementing lists in cases where insertions and deletions
are permitted within the list.
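The cost of these moves is visible in the following sketch of an array-based list of integers. All of the names (MAX_LIST, listitems, length, insertAt, removeAt, itemAt, listLength) are hypothetical. Each insertion or deletion requires a loop that moves every following element by one place:-

```cpp
const int MAX_LIST = 10;
static int listitems[ MAX_LIST ];
static int length = 0;                       // number of elements currently in the list

// pre - length < MAX_LIST and 0 <= pos <= length
void insertAt( int pos, int value )
{
    for ( int i = length; i > pos; i-- )     // move the following elements 'up'
        listitems[i] = listitems[i - 1];
    listitems[pos] = value;
    length++;
}

// pre - 0 <= pos < length
void removeAt( int pos )
{
    for ( int i = pos; i < length - 1; i++ ) // move the following elements 'down'
        listitems[i] = listitems[i + 1];
    length--;
}

int itemAt( int pos ) { return listitems[pos]; }   // pre - 0 <= pos < length
int listLength()      { return length; }
```

Inserting at position 0 of a list of n elements moves all n of them, which is why this representation suits lists that change only at the end.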

9. Structs
Frequently there is a need to store information about an entity under a single name where
the information describing that entity involves different data types. The struct is an
aggregate type that provides this facility:-
struct student // student is a type, not a variable.
{
char name[30];
int age;
char coursecode[6];
}; // note the semi-colon
student courserep; // courserep is one student


Each separate data item within the structure is referred to as a data member. Once the new
type student has been declared, a collection with that component type can be defined.
student aclass[16]; // aclass is an array of 16 students
Access to the members of a struct is by dot notation:-
strcpy( courserep.name, "William Brown" ); // simple assignment not allowed
courserep.age = 21;
strcpy( courserep.coursecode, "mit96" );
cout << courserep.name << endl << courserep.age << endl
<< courserep.coursecode << endl;
A queue of students could be declared as:-
const int MAX_QUEUE = 16;
static student stuqueue[ MAX_QUEUE ]; // A queue of students
static int head = 0, tail = -1, count = 0;

10. Unions
This is similar to the struct in that it can hold one or more items of different types. It
differs from struct in that it can hold only one of its components at any one time. The
compiler allocates storage for the largest of the specified members and all members are
overlaid onto the same storage. In other programming languages this type is usually
known as a variant record. There are two main uses for unions.

! In cases where different instances of the same entity may have different
characteristics, i.e. they are described by a different set of variables. This might
arise in a collection of students where part-time students require a record of their
employer whereas full-time students do not.
! In low level programming when a location in memory may be viewed as two
different sets of data, e.g. either two separate integer values or a long integer.

Example:
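typedef short TwoInts[2];
union cheat
{
    TwoInts twoints;
    long along;
};
cheat x;
x.twoints[0] = 255;
x.twoints[1] = 1;
cout << x.along << endl;   // 65791 (1 * 65536 + 255), assuming a 16-bit short, a 32-bit
                           // long and a machine that stores the low-order bytes first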
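The first use, that of a variant record, might be sketched as follows. All of the names are hypothetical, including the grantcode member which is invented purely to give full-time students something of their own. A separate bool member records which member of the union is currently in use, since the union itself keeps no such record:-

```cpp
#include <cstring>

struct Employer
{
    char name[30];
};

union StudyDetails              // holds only one of its members at any one time
{
    Employer employer;          // meaningful for part-time students only
    int grantcode;              // meaningful for full-time students only (invented field)
};

struct StudentRecord
{
    char name[30];
    bool parttime;              // indicates which member of details is in use
    StudyDetails details;
};

// Hypothetical helper: builds the record for a part-time student.
StudentRecord makePartTime( const char* name, const char* employer )
{
    StudentRecord s;
    std::strncpy( s.name, name, sizeof( s.name ) - 1 );
    s.name[ sizeof( s.name ) - 1 ] = '\0';
    s.parttime = true;
    std::strncpy( s.details.employer.name, employer,
                  sizeof( s.details.employer.name ) - 1 );
    s.details.employer.name[ sizeof( s.details.employer.name ) - 1 ] = '\0';
    return s;
}
```

Code that reads a StudentRecord should test parttime first and then access only the corresponding member of details.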


Dynamic Data Structures


1. Structures
These were introduced in the previous chapter. The type name in C and C++ is struct, but
in most other languages they are known as Records. They are particularly useful for
modelling real-world objects that are described by a set of attributes (data values). The
syntax is

struct type-name
{
list-of-members
};

This is a type definition and does not allocate storage. It introduces a new type that can be
used subsequently in definitions of variables whose type is type-name.
Examples:-

struct Date
{
    int year;
    int month;
    int day;
};
...
Date today, his_birthday;

struct Person
{
    char name[20];
    Date birthdate;
    char address[4][20];
};
...
Person Fred, Jane;

struct Student
{
    Person personaldata;
    char tutorGrp;
    int modulemarks[9];
};
...
Student mscit[40];

These examples illustrate several things about the data type struct.
! The members (referred to as fields in other languages) may be of the same type, or
of different types.
! There is no limit to the number of members, but large records can be built up from
other struct types, for instance, type Person has a field birthdate which is itself a
struct type (Date).
! The members may be of any type, including arrays (and other structs)
! The type name can be used in declarations of arrays whose elements are of struct
type, e.g. mscit is an array of 40 elements, each of whose data type is Student. Each
Student has a data member called personaldata of type Person; a tutorGrp of type
char; and an array of 9 elements of type int called modulemarks.
! The type-name appearing after the reserved word struct is known as the structure
tag. It is desirable that this name (e.g. Date, Person, Student) be unique within its
own scope.
As you can see, structures can be used in combination with other structures and with
arrays to create arbitrarily complex types capable of modelling many real-world entities.


2. Comparison between structs and arrays


! Component data type
The elements of an array must all be of the same type whereas structs may contain
data members of different types.
! Assignment
An array may not be assigned to another array because an array name is not a
modifiable variable (in most expressions it is converted to a pointer to the first
element), whereas the use of a structure variable name accesses the whole structure.
The consequences of this are important:-
Variables of structure type may be assigned to other variables of the same type. The
effect of assignment is to copy all of the fields from the source structure to the target
structure (including each element of any array members of the structure). Thus we
could write
Jane = Fred;
or mscit[1] = mscit[2];
! Function arguments and return
Structure arguments are, by default, passed to a function by value (not as a pointer,
as is the case with arrays). However, a reference argument may be used to reduce the
cost of copying large structures and/or to enable any changes to the structure to be
reflected in the actual argument. If the objective is to eliminate the cost of copying
large structures when a function is called and it is not the intention to modify the
structure within the function, then the formal reference argument can be const
modified, e.g.

void printDate( const Date& aDate )


{
cout << aDate.day << '/' << aDate.month << '/'
<< aDate.year << endl;
}
There is no intention to change the value of the argument aDate since it is only
being output. However, to reduce the cost of copying the actual argument into the
formal argument, the formal argument is made a reference to the actual argument -
Date&. Copying a reference involves only a few bytes.

A function may return a structure or a reference to a structure as its result.


Example:-
Date changeDate( Date aDate )
{
aDate.year++;
return aDate;
}

! Access to components
Elements of an array can be accessed by subscripting the array name as in the
example above. The subscript can be a variable that is modified within a loop c.f.
the Plane example. This allows computed random access to any array component.
The members of a struct, on the other hand, are accessed using dot notation i.e. the
structure variable name followed by a dot followed by the member name. The dot is
known as the structure member operator. If the member name is itself a structure
and access is required to its members, then further dots are required to tunnel down
through the member hierarchy, viz.
Fred.name;
Fred.birthdate.day;
mscit[10].personaldata.birthdate.year;
mscit[20].personaldata.address[1];
mscit[30].modulemarks[2]; // the mark for the third module (index 2) of
// the student at index 30

! Pointers to structures
If a structure is referenced by a pointer then the de-referencing operator applied to
the pointer provides the access:-
Date* dptr = &today; // dptr is a pointer to Date and points to the Date today
Date dt = *dptr; // dt is assigned the value of today by dereferencing the
// pointer dptr

However, the structure member operator (dot) has a higher precedence than the
dereferencing operator (*). So access to a member of today via the pointer dptr must
use parentheses to resolve the precedence:-
cout << (*dptr).year; // displays the year member of today via the
// pointer dptr
This type of access is frequently required and the syntax is rather clumsy. A new
operator is introduced for this purpose - the structure pointer operator ->. This does
two things - dereferences the pointer to access the whole structure, and then
accesses the member given after the operator (year in this example).
cout << dptr->year;
! Initialisation
As with arrays, structures may be initialised at the time they are defined, e.g.
Date his_birthday = { 1995, 11, 15 };
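The points made in this section can be gathered into one short sketch. It assumes the Date type defined earlier (repeated here so that the block stands alone); the function names nextYear, setYear and yearOf are illustrative only and do not appear elsewhere in these notes.

```cpp
#include <cassert>

struct Date
{
    int year;
    int month;
    int day;
};

// Pass by value: the function works on a copy, so the caller's Date
// is unchanged; the modified copy is returned as the result.
Date nextYear( Date aDate )
{
    aDate.year++;
    return aDate;
}

// Pass by reference: the caller's Date is modified directly.
void setYear( Date& aDate, int year )
{
    aDate.year = year;
}

// Access through a pointer using the structure pointer operator.
int yearOf( const Date* dptr )
{
    assert( dptr != 0 );   // pre: dptr points to a Date
    return dptr->year;     // equivalent to (*dptr).year
}
```

With Date d = { 1995, 11, 15 }, the call nextYear( d ) returns a Date whose year is 1996 and leaves d untouched, whereas setYear( d, 2000 ) changes d itself.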

3. Storage Management
So far we have only been able to use data items that have been defined at compile-time.
Thus, an array defined in the source code of a program as:-
int table[100];
will hold 100 integers and, if the requirements of the program exceed this number of
elements, then the excess cannot be handled. Clearly this is unsatisfactory. The
programmer cannot predict the demands that will be made on his program when it is being
used by a client. What may have seemed a generous estimate when the program was
written might soon turn out in practice to be a ludicrous under-estimate. What is more, if
the estimate is indeed generous, then a large amount of storage space remains unused and
therefore wasted because it cannot be used temporarily by other data items.
An example is a windowing system like MS Windows. The programmers of Windows
could not possibly have worked on the assumption that the number of open windows
should never exceed a certain fixed limit. Since that code was written, the memory
installed in the average PC has at least doubled, redoubled, and redoubled again. To have
fixed this limit 3 or 4 years ago would have put all users in a straitjacket which would
now appear intolerable.
So how can we create and delete data items dynamically at run-time in response to the
demands of the application program?
By using the memory allocation and de-allocation operators new and delete. The use of these
operators is closely bound up with pointers and equivalent facilities are to be found in most
of the conventional programming languages such as Ada, Pascal, Modula-2 and C.


3.1 new
The syntax is new type-name [number-of-elements], where [number-of-elements] is
optional and is used when a dynamically allocated array is required.
Examples:
int* intptr = new int;
char* chptr = new char[20];
The first statement allocates from the heap a chunk of memory sufficient to hold
one integer and sets the pointer to integer intptr to point to this memory location.
The heap, or free store is the name given to that part of available random access
memory that is not currently occupied by program code and ordinary program
variables.
The second statement allocates sufficient memory from the heap to accommodate
an array of 20 characters and sets chptr to point to the first.

3.2 de-referencing the pointers


Notice that the new data items are anonymous - they have no name. This is not
surprising since the compiler is responsible for associating variable names with
memory locations and the compiler did not know whether or not we would execute
these two statements at run time - they may be encountered only if the user selects a
particular menu option. Access to the newly allocated data items is obtained only
via a pointer that points to them:-
*intptr = 99;
cout << "intptr points to " << *intptr << endl;
strcpy( chptr, "Hello, Hello!!!!!!!");
cout << " and chptr points to " << chptr << endl;

In the assignment and output statements, intptr needs de-referencing to produce the
value of the integer to which it points. chptr, on the other hand, does not require de-
referencing since we want the whole array to be assigned or output rather than just
the single character to which chptr points. This treatment is analogous to that of an
array name.

3.3 delete
The delete operator has two forms, without brackets for single data items, and with
brackets for arrays. Note that, whereas the form of new required the brackets to be
placed after the type name:-
char* chptr = new char[20];

the syntax of delete requires the brackets to be placed after delete


delete intptr; // de-allocate memory occupied by int pointed to by intptr
delete[] chptr; // de-allocate memory occupied by string pointed to by chptr
The effect of delete is to return back to the heap the memory referenced by the
pointer (intptr and chptr in the above examples) and not to delete the pointer itself.
After this, it is an error to attempt to de-reference these pointers in order to access
the item they previously referenced.
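The pairing of the two forms of new with the two forms of delete can be sketched as follows. The helper names makeInt and copyString are hypothetical; the sketch only assumes the standard library functions strlen and strcpy.

```cpp
#include <cstring>
#include <cassert>

// Allocate a single int on the heap; the caller releases it with
// the plain form: delete p;
int* makeInt( int value )
{
    int* p = new int;
    *p = value;
    return p;
}

// Allocate a heap copy of a C string; the caller releases it with
// the array form: delete[] p;
char* copyString( const char* source )
{
    assert( source != 0 );                              // pre: a valid string
    char* copy = new char[ std::strlen( source ) + 1 ]; // +1 for the '\0'
    std::strcpy( copy, source );
    return copy;
}
```

A caller might write int* ip = makeInt( 99 ); ... delete ip; and char* cp = copyString( "Hello" ); ... delete[] cp;. Setting a pointer to 0 after the delete is a common precaution, since it makes an accidental later use easier to detect.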

3.4 Lifetime
The lifetime of objects allocated by new is from allocation to the earlier of de-
allocation (via delete) or termination of the program.
Notice that lifetime may be different from scope. If a pointer providing access to a
dynamically allocated item goes out of scope (perhaps because it is a local function

variable and the function terminates) then the dynamic data item continues to exist,
but is inaccessible. This is known as memory leakage. If it happens often enough,
the program could run out of memory even though not all is being used. Local
function variables can be used for allocating dynamic data items, but it is necessary
to ensure that, before the function terminates, some other pointer that will continue
in scope is set to point to it.
int* makenewtable( int size )
{
int* intptr = new int[size];
return intptr;
}

Since the function returns a pointer to integer, the result of the function call will be
assigned to some other pointer to integer and access will not be lost by the demise
of intptr:-
int* newtable;
newtable = makenewtable( 20 );
If there is insufficient memory available on the heap when new is called, the allocation
fails. Older (pre-standard) compilers, and the nothrow form of new in standard C++,
signal this by returning the special pointer value 0; the plain form of new in standard
C++ throws a bad_alloc exception instead. A pointer value of 0 means that the pointer
does not point to anything. When building dynamic data structures, 0 is frequently used
as a pointer value to indicate that no link exists between components of the structure.
int* intptr = new(nothrow) int[ size ]; // nothrow is declared in <new>
if ( intptr == 0 )
{
cout << "Error, insufficient memory " << endl;
exit(1);
}

The size argument to new permits the size of a dynamically allocated array to be
determined at runtime. This can be used to get over the fixed size problem of arrays.
The array is allocated on start-up with, say, 10 elements. When it becomes full,
makenewtable is called with an argument of, say, double this (i.e. 20). The contents
of the original array are copied into the newly allocated one, and the old array then
deleted. Next time the array becomes full, makenewtable is called with an argument
of 40, and the copying done again. In this way, the effect of a dynamically
resizeable array can be obtained. However, during this doubling process, there is a
temporary requirement for additional memory that might cause memory exhaustion.
Also, the requirement that the old data be copied into the newly allocated table is
relatively costly in terms of time, and it is therefore advisable to minimise the
number of resizing operations wherever possible - this is the reason for doubling the
size on each resize.
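The doubling scheme described above might be sketched as follows. This is an illustration under stated assumptions rather than code from these notes: the Table type and the functions initTable, append and destroyTable are hypothetical names, and allocation failure is not handled.

```cpp
#include <cassert>

struct Table              // hypothetical wrapper for a resizable array
{
    int* items;           // dynamically allocated array
    int  size;            // number of elements in use
    int  capacity;        // number of elements allocated
};

void initTable( Table& t, int capacity )
{
    assert( capacity > 0 );
    t.items = new int[ capacity ];
    t.size = 0;
    t.capacity = capacity;
}

void append( Table& t, int value )
{
    if ( t.size == t.capacity )          // full: double the allocation
    {
        int* bigger = new int[ t.capacity * 2 ];
        for ( int i = 0; i < t.size; i++ )
            bigger[ i ] = t.items[ i ];  // copy the old contents
        delete[] t.items;                // release the old array
        t.items = bigger;
        t.capacity *= 2;
    }
    t.items[ t.size++ ] = value;
}

void destroyTable( Table& t )
{
    delete[] t.items;
    t.items = 0;
    t.size = t.capacity = 0;
}
```

Starting from a capacity of 10, appending 25 items triggers two resizes (to 20 and then to 40). Both the old and the new array exist while the copy is made, which is the temporary extra memory requirement mentioned above.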


4. Dynamic Data Structures - Linked Lists


A list is a sequence of data items, each item other than the first and the last having a
predecessor and a successor. A more elegant definition using recursion (and one which
can be realised in most programming languages) is :-
! a list is either empty or
! consists of a head representing a single data item followed by a tail which is a list of
data items.
Lists may be implemented using arrays but dynamic memory allocation is more flexible in
that the list may grow and shrink in response to the demands of the application.
The list should be viewed as a series of nodes, each node containing some data and a link
to the next node. The link is a pointer to a node, and the node is most usefully
implemented as a struct.
[Figure: Linked List - a list record holding first, last and count, with first and last
pointing into a chain of Nodes, each Node containing data and a link.]
For simplicity, a list of
integers will be illustrated, but the data contained in a node (struct) may be as large or as
complex as the application requires. The node is therefore defined as:-
struct Node
{
int data;
Node* link;
};

Each node therefore consists of a data field (in this case an integer) and a pointer to the
next node. The list itself can be implemented as a structure containing links to the first and
last nodes in the list, and a count of the number of nodes. These links are, again, of type
pointer to node. If the list is empty, then the links to the first and last nodes are given the
special value 0 referred to above. The same principle will be applied to the link member of
the last node in the list since it will have no successor:-
struct LinkList
{
int count;
Node* first, * last;
};
The operations for a list are much less closely prescribed than those for stacks and queues
since it is a more general structure and access may be provided at any point. There are also
several possibilities for the ordering of the nodes. For simplicity therefore, the example
shown below will add new items to the end of the list, and remove items from the front.
This is therefore, in effect, a queue.


4.1 List Cursors


Sometimes a list is provided with an internal cursor that can be moved about by
making calls to appropriate functions. At any time, additions may be made at the
position indicated by this internal cursor, and also deletions provided the list is not
empty. The addition of a cursor (a pointer to Node) and the operations to move it
are left as an exercise. It is not usual to provide a print function for a data structure
since it ties it to a particular I/O regime which may not be appropriate for all
applications or on other platforms. However, it is sometimes useful for debugging
purposes and one is included here to demonstrate a traversal of the list. These
example operations all have a reference to a list as one of their arguments. This
allows the client program to declare several lists, and to specify via the actual
argument on which list the operation is to be carried out.

4.2 Initialising the list


void init( LinkList& t)
{
t.count = 0; // count of elements = 0
t.first = 0; // pointer to first element does not point to anything
t.last = 0; // pointer to last element does not point to anything
}
4.3 Creating a new node
static Node* newnode() // function that returns a pointer to a Node. This
// function is private to the list module (it does not
// appear in the header file and is
// declared with storage class static to prevent
// access by a client)
{
Node* n = new Node; // allocate memory from the heap sufficient to
// accommodate a Node and store a pointer to it
// in n.
return(n); // return the pointer as the function's result
}
4.4 Checking for empty
bool empty( const LinkList& t) // the argument is const because the list is not
// changed by this function
// post: returns true if the list is empty, false otherwise
{
return (t.count == 0);
}


4.5 Adding a new item to the list


void add (LinkList& t, const int item)
// post: item is added at end of list
{
Node* n = newnode(); // create a new node dynamically by calling
// function newnode
n->data = item; // put incoming data into the data member of the
// new node
n->link = 0; // and set its link member to point to nothing
if (empty(t)) // special action required if empty
t.first = n; // set first to point to the new (first) node
else
t.last->link = n;// set link member of last node to point to the new
// node
t.last = n; // set 'last' member of the list to point to new (last)
// node
t.count++; // increment the count
}

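[Figure: adding a node. Before: the LinkList (count 2) has first and last pointing to
Nodes 1 and 2. Node* n = newnode() allocates a Node from the heap; n->data = item stores
3 in it and n->link = 0 marks it as the end of the chain. Then a) t.last->link = n;
b) t.last = n; c) t.count++ (2 becomes 3). After: the list holds Nodes 1, 2 and 3.]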


4.6 Removing an item from the list


int remove(LinkList& t)
// pre : the list is not empty()
// post: the first item in the list has been removed
{
int tempdata = t.first->data; // save the data in the node pointed to by
// first for return
Node* tempnode = t.first; // save the first node for deletion
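t.first = t.first->link; // reset first to point to the next node after
// first
if (t.first == 0) // the list is now empty, so last must not be
t.last = 0; // left pointing to the node about to be deleted
t.count--;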
delete tempnode; // recover memory for the old node
return tempdata; // return the saved data
}

[Figure: removing the first node from a list of Nodes 1, 2 and 3.
a) int tempdata = t.first->data; b) Node* tempnode = t.first; c) t.first = t.first->link;
d) t.count-- (3 becomes 2); e) delete tempnode returns the old node to the heap.
After: the list holds Nodes 2 and 3.]


4.7 Printing the list


void printlist( const LinkList& t )
{
Node* temp = t.first; // temp points to first node
while ( temp != 0 ) // while list not completely traversed
{
cout << temp-> data << endl; // output the data member of the node
// pointed to by temp
temp = temp-> link; // move pointer forward one node.
}
}
4.8 Searching the list
bool found( const LinkList& t, const int target )
// post: returns true if target is in the list, else false
{
if (empty( t ))
return false;
Node* temp = t.first;
do
{
if( target == temp->data )
return true; // return true if found
temp = temp-> link; // else move to next node
} while( temp != 0 ); // while not at end of list
return false; // not found
}
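These notes do not show how a whole list is released when it is no longer needed; without such an operation every remaining node would be leaked (see 3.4). The following is a minimal sketch, with the Node and LinkList definitions and a version of add repeated so that the block stands alone. The functions front and clear are hypothetical additions, not part of the list module described above.

```cpp
#include <cassert>

struct Node
{
    int   data;
    Node* link;
};

struct LinkList
{
    int   count;
    Node* first;
    Node* last;
};

void init( LinkList& t )
{
    t.count = 0;
    t.first = 0;
    t.last  = 0;
}

void add( LinkList& t, int item )      // as in 4.5: add at the end
{
    Node* n = new Node;
    n->data = item;
    n->link = 0;
    if ( t.count == 0 )
        t.first = n;
    else
        t.last->link = n;
    t.last = n;
    t.count++;
}

// Hypothetical helper: return the first item without removing it.
int front( const LinkList& t )
{
    assert( t.first != 0 );            // pre: the list is not empty
    return t.first->data;
}

// Hypothetical helper: delete every node and leave the list empty.
void clear( LinkList& t )
{
    Node* temp = t.first;
    while ( temp != 0 )
    {
        Node* next = temp->link;       // save the link before deleting the node
        delete temp;
        temp = next;
    }
    init( t );
}
```

The link must be saved before the node is deleted, because de-referencing temp after delete temp would be an error, as explained in 3.3.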

5. Other dynamic structures


The ability to create storage space for data dynamically at run time in response to the
requirements of the application and to link these data items together by means of a pointer
or pointers allows us to represent a wide range of structures of arbitrary complexity. Thus
we can model stacks, queues, priority queues, lists, ordered lists, lists of lists, trees, graphs
etc. The object-oriented features of the language that we shall be studying in the second
Semester enable us to design data types as classes of object that represent these data
structures. There are a number of books available that provide examples of these data
structures and the algorithms to process them.


Sorting
1. Introduction
There are two main types of sorting - sorting arrays held in random access memory, and
sorting files. In the early period of computing, file sorting tended to be dominant because
RAM was very expensive and mass storage was held on magnetic tape, access to which is
sequential. In contrast, magnetic disk storage provides the possibility of accessing file
records by reference to their position in the file.

2. Components of Sorting
Sorting involves rearranging the elements so that they are in order. This, in turn consists of
two operations:-
! Comparing elements - usually by reference to a key field
! Moving elements - usually by swapping pairs of elements
There are normally many more comparisons than moves and the number of comparisons
will be the most significant operation in terms of time, and therefore the prime indicator of
the efficiency of a sorting algorithm.

3. Sorting Files
Database systems are now universal, and file sorting has become less important. Instead, a
number of different indexes are held - either within the data file, or as separate files - that
allow the data file to be read (and output) in different orderings.
If the amount of RAM permits it, and indexes are not supported, then the fastest way of
sorting a file is to read it into an array, sort the array and write the data back out to file. If
the file is too big, then it can be broken up into chunks, each of which is sorted in an array
and written out to a separate file. Then the several ordered files are merged back into a
single file.
The traditional file merge requires only 2 elements of the file to be in memory at any one
time and works as follows:-
! split the original file into two new files writing 1 item to each new file alternately.
Then merge back into the original file in pairs, creating n/2 runs of 2 items per run
! split the original file into 2 writing 2 items to each file alternately. Then merge back
into the original file in quadruples creating n/4 runs of 4 items per run
! split the original file into 2 writing 4 items to each file alternately. Then merge back
into the original file in octuples creating n/8 runs of 8 items per run
! etc.

The sort has finished when the original file contains 1 run of n items. The following is a
simplified example based on a file of 8 items. The principle is exactly the same for any
number of items.


Pass  Description                                        Files

1     Original file                                      5 8 3 6 7 2 4 1
      Split into 2 files, writing 1 item from the        file 1: 5 3 7 4
      original file to each alternately                  file 2: 8 6 2 1
      Merge the two files by comparing 1 item from       5 8, 3 6, 2 7, 1 4
      each file and writing the smaller then the
      larger into the original file, giving 4 runs
      of 2 items

2     Split into 2 files, writing 2 items from the       file 1: 5 8, 2 7
      original to each alternately                       file 2: 3 6, 1 4
      Merge the two files in groups of 2 items from
      each file, giving 2 runs of 4 items
                                                         file 1  file 2  original file
2.1   Run 1                                              5       3       3
                                                         5       6       3 5
                                                         8       6       3 5 6
      only 1 item remaining from this run, write it      8               3 5 6 8,
2.2   Run 2                                              2       1       3 5 6 8, 1
                                                         2       4       3 5 6 8, 1 2
                                                         7       4       3 5 6 8, 1 2 4
      only 1 item remaining from this run, write it      7               3 5 6 8, 1 2 4 7

3     Split into 2 files, writing 4 items from the       file 1: 3 5 6 8
      original to each alternately                       file 2: 1 2 4 7
      Merge the 2 files in groups of 4 items, giving     1 2 3 4 5 6 7 8
      1 run of 8 items. The file is now sorted

Note that:-
! There are only 2 elements from the file present in memory at any one time
! The process is dominated by I/O time
! The number of passes required to sort the original file is log2 n

n Passes
8 3
64 6
512 9
4,096 12
32,768 15
262,144 18
2,097,152 21
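The split and merge passes can be sketched in C++ as follows. This is an illustration only: std::vector objects stand in for the sequential files, so the whole data set is held in memory, which a real file sort would avoid. The function name mergeSortFile is hypothetical.

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// Sort 'file' by repeated split and merge, with vectors standing in
// for sequential files. Returns the number of passes made.
int mergeSortFile( std::vector<int>& file )
{
    const std::size_t n = file.size();
    int passes = 0;
    for ( std::size_t run = 1; run < n; run *= 2 )
    {
        // Split: write runs of length 'run' to files a and b alternately.
        std::vector<int> a, b;
        for ( std::size_t i = 0; i < n; i++ )
            ( ( i / run ) % 2 == 0 ? a : b ).push_back( file[ i ] );

        // Merge: combine one run from each file into a run of 2 * run.
        file.clear();
        std::size_t ia = 0, ib = 0;
        while ( ia < a.size() || ib < b.size() )
        {
            std::size_t enda = ia + run < a.size() ? ia + run : a.size();
            std::size_t endb = ib + run < b.size() ? ib + run : b.size();
            while ( ia < enda && ib < endb )      // write the smaller item
                file.push_back( a[ ia ] <= b[ ib ] ? a[ ia++ ] : b[ ib++ ] );
            while ( ia < enda ) file.push_back( a[ ia++ ] ); // rest of run
            while ( ib < endb ) file.push_back( b[ ib++ ] ); // rest of run
        }
        passes++;
    }
    assert( file.size() == n );                   // post: no items lost
    return passes;
}
```

Applied to the 8 items of the worked example (5 8 3 6 7 2 4 1), the function makes 3 passes, in agreement with the table above. Only the two items at the front of the current runs are compared at any moment.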


4. Why sort?

! Sorting is used to optimise searching for and retrieving data either by humans or by
the computer
! To produce a report which, because it is sorted, simplifies the manual retrieval of
information
! To make more efficient searches for items held in either main memory or external
storage

5. Does it pay to sort?


Sorting carries an overhead for
! time
! memory for the code
! memory for temporary data
For very small amounts of data, sequential searching may be sufficiently fast to avoid
the need for sorting.
But a simple sorting technique, needing little overhead, can be employed for low data
volumes.

6. What is the best sort?


Different sorting techniques have different strengths and weaknesses depending on:-
! The number of items to be sorted
! Whether the items are:
    ! already ordered, or nearly so
    ! in random order
    ! already inversely ordered, or nearly so
! The amount of additional storage required:-
    ! Temporary
        ⇒ local variables
        ⇒ an explicit stack
        ⇒ additional space on the system stack for stack frames if a recursive
          algorithm is used
    ! Permanent
        ⇒ for the code which implements the sort
! The number and size of data items required to be moved

7. Sorting efficiency
We are not usually concerned with the absolute amount of time required for a sort. But we
are concerned with how the time t taken for a sort varies with the number of items n
required to be sorted.
If there is a linear relationship, then t will vary directly with n, i.e. it will be O(n).
But no general sort that works by comparing keys can be O(n); the best such sorts are
O(n.log n).

If t varies as a function of n² then an increase in n by a factor of, say, 10 will increase
t 100 times, and increasing n by a factor of 100 will increase t 10,000 times.
The simple sorting algorithms are all O(n²).

8. Simple Array Sort - Exchange (Bubble)


Work through the array comparing adjacent pairs of elements.
If the first element is heavier (larger) than the second, swap them
Continue making passes, but stop one element sooner on each pass, because the next
heaviest element has bubbled down to its correct place

k=n
While k > 1 Do
For each element i from 1 to k - 1 Do
If element i > element i +1 then
Swap element i with element i + 1
Endif
EndFor
Decrement k
EndWhile

Pass   Start    1    2    3    4    5    6    7
K               8    7    6    5    4    3    2
          44   44   12   12   12   12    6    6
          55   12   42   42   18    6   12   12
          12   42   44   18    6   18   18   18
          42   55   18    6   42   42   42   42
          94   18    6   44   44   44   44   44
          18    6   55   55   55   55   55   55
           6   67   67   67   67   67   67   67
          67   94   94   94   94   94   94   94

Notice that after each pass, the heaviest element in the unsorted part of the array has
settled to the bottom, increasing the sorted portion by one and decreasing the unsorted
portion by one. The indicators of the efficiency of this algorithm are:-
Comparisons = (n-1) + (n-2) + ... + 1 = ½(n² - n) = 28
Max moves = 3/2 (n² - n) = 84
Ave moves = 3/4 (n² - n) = 42

This algorithm can be improved by employing a flag that is set when no exchanges take
place on a pass. In this case the array is sorted and no further passes are required. This is
an O(n²) algorithm. It is almost never used in real applications because it is among the
least efficient of the sorting algorithms. It is introduced here because it is relatively easy
to understand and so that you will know never to use it!
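The pseudocode, together with the flag improvement just described, might be written in C++ as follows. This is a sketch for an array of int using 0-based indexes; the function name bubbleSort and the returned pass count are illustrative choices, not part of these notes.

```cpp
#include <cassert>

// Exchange (bubble) sort with an early-exit flag.
// Returns the number of passes actually made.
int bubbleSort( int array[], int n )
{
    assert( n >= 0 );
    int passes = 0;
    bool swapped = true;                      // force the first pass
    for ( int k = n; k > 1 && swapped; k-- )
    {
        swapped = false;
        for ( int i = 0; i < k - 1; i++ )     // compare adjacent pairs
        {
            if ( array[ i ] > array[ i + 1 ] )
            {
                int temp = array[ i ];
                array[ i ] = array[ i + 1 ];
                array[ i + 1 ] = temp;
                swapped = true;               // an exchange took place
            }
        }
        passes++;
    }
    return passes;
}
```

For data that is already in order, the first pass makes no exchanges and the function returns after a single pass of n-1 comparisons.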


9. Insertion Sort
This works in a similar way to the sorting of a hand of cards
Pick up the last but one element and place it in the correct order in the last 2
Pick up the last but 2 and place in the correct order in the last 3 etc.

If the number of items to be sorted > 1 then


For each element k from last item but one down to 0
j=k+1
save = k'th element
While j <= last item AND
the key of save > the key of the j'th element
r[j-1] = r[j];
increment j
endwhile
r[j-1] = save
endfor
endif

Pass       Start    1    2    3    4    5    6    7
K                   7    6    5    4    3    2    1
k'th key            6   18   94   42   12   55   44
              44   44   44   44   44   44   44    6
              55   55   55   55   55   55    6   12
              12   12   12   12   12    6   12   18
              42   42   42   42    6   12   18   42
              94   94   94    6   18   18   42   44
              18   18    6   18   42   42   55   55
               6    6   18   67   67   67   67   67
              67   67   67   94   94   94   94   94

Ave No. Comparisons = ¼(n² + n - 2) ≈ 18 (14 in the example)
Ave No. Moves = ¼(n² + 9n - 10) ≈ 32 (29 in the example)
! On average, there are half as many comparisons as Exchange sort
! The algorithm is efficient if the data is already in order
! It is an O(n²) algorithm
! It is stable - equal keys are not moved. This can be important if 2 or more
consecutive sorts are required - each using a different key - the second being the tie
breaker when the first keys contain duplicates.
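A direct C++ rendering of the pseudocode above, for an array of int with 0-based indexes, might look like this. The function name insertionSort is an illustrative choice; as in the pseudocode, the sorted portion grows from the right-hand end of the array.

```cpp
#include <cassert>

// Insertion sort following the pseudocode: element k is picked up and
// inserted into the already sorted portion r[k+1] .. r[n-1].
void insertionSort( int r[], int n )
{
    assert( n >= 0 );
    for ( int k = n - 2; k >= 0; k-- )     // last item but one down to 0
    {
        int save = r[ k ];
        int j = k + 1;
        while ( j <= n - 1 && save > r[ j ] )
        {
            r[ j - 1 ] = r[ j ];           // shift the smaller item left
            j++;
        }
        r[ j - 1 ] = save;                 // drop the saved item into place
    }
}
```

Because the loop test uses a strict comparison (save > r[j]), an item is never moved past another item with an equal key, which is what makes the sort stable.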


10. Simple Sort performance

          Selection Sort            Insertion Sort            Exchange Sort
          (not covered in this note)
          Moves      Compares       Moves      Compares       Moves        Compares
Worst     3(n-1)     ½n(n-1)        ½n(n-1)    ½n(n-1)        1.5n(n-1)    ½n(n-1)
Average   3(n-1)     ½n(n-1)        ¼n(n-1)    ¼n(n-1)        ¾n(n-1)      ½n(n-1)
Best      3(n-1)     ½n(n-1)        2(n-1)     n-1            0            n-1

[Chart: Simple Sorting Algorithms - log-scale comparison of Insertion, Selection and
Exchange sort on Ordered, Random and Inverse data.]

11. Conclusions
11.1 Insertion sort is better for small data items and large keys. It also gives good
performance when the data is already ordered (or nearly so). For this reason it is often
used in conjunction with advanced sorting algorithms, e.g. Quicksort

11.2 Exchange sort is the slowest sorting algorithm and is only used in teaching or trivial
applications because it is the simplest to code

11.3 Selection sort (not shown) is better for large data items with small keys. It has
shown slightly better performance than Insertion on inversely ordered data

12. Complex sorts


! Shell sort - derived from insertion sort
! Quicksort - See later
! Heapsort
! These are in a different class to the simple sorts. The number of comparisons tends to
vary in proportion to n.log2 n and they are therefore O(n.log n) sorts.
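The practical difference between the two classes can be seen by evaluating the two expressions for a few values of n. The sketch below uses ½(n² - n), the comparison count given earlier for Exchange sort, and n.log2 n as an approximation for the advanced sorts; the function names are hypothetical.

```cpp
#include <cmath>
#include <cassert>

// Comparisons made by a simple sort such as Exchange sort: ½(n² - n).
double simpleSortComparisons( double n )
{
    assert( n >= 1 );
    return 0.5 * ( n * n - n );
}

// Approximate comparisons for an O(n.log n) sort: n.log2 n.
double advancedSortComparisons( double n )
{
    assert( n >= 1 );
    return n * std::log( n ) / std::log( 2.0 );   // log2 n = ln n / ln 2
}
```

For n = 8 the two values are close (28 against 24), but for n = 1024 they are 523,776 against 10,240, a factor of about 50.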


13. QuickSort

This was invented by C.A.R. Hoare - a famous Oxford professor of computing and is an
advanced algorithm, based on the exchange sort, that normally employs recursion. It is the
most efficient of the advanced sorts although it becomes inefficient under certain very
exceptional conditions. The more data items, the less likely these conditions are to arise.
Insertion sort is often used in conjunction with Quicksort to sort small partitions.
The technique is to split the array into two partitions and then to sort the first partition
followed by the second partition:-

void QuickSort( AnyType array[] )


{
If sorting is needed then
split array into partitions S1 and S2
QuickSort(S1); QuickSort(S2);
EndIf
}
All the keys in partition S1 must be less than (or possibly equal to) each of the keys in
partition S2. The recursive routine sorts successively smaller and smaller partitions until a
partition contains only one item and is therefore sorted
The partitions are portions of the array itself - described by starting and ending indexes,
and not some additional temporary data structure.
Here is a refinement of the first description using four array index variables

void QuickSort( AnyType array[], int first, int last )


{
if( first < last )
{
split the array into 2 partitions
QuickSort( array, first, last_of_first_partition );
QuickSort( array, first_of_last_partition, last );
}
}

The 'partition' portion of the algorithm is where all the work is done. The second and third
statements are simply recursive calls to the function itself.
The partitioning process ensures that all items in the first partition have values that are <=
all items in the second partition - although neither partition is necessarily sorted.
One of the keys in the partition currently under consideration is selected as the pivot (the
central element in this example)
The items in the current partition are scanned
! first from left to right looking for an element >= pivot
! then from right to left looking for an element <= pivot
! when each scan has stopped, and provided the scan indexes have not crossed over,
the two items are swapped.


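[Figure: partitioning with the central element, 42, as the pivot]
44 55 12 42 94 6 18 67     scans stop at 44 and 18; swap
18 55 12 42 94 6 44 67     scans stop at 55 and 6; swap
18 6 12 42 94 55 44 67     1st partition: 18 6 12; 2nd partition: 94 55 44 67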

Scanning continues until the 2 pointers cross over. The pivot is now in its correct position
in the array and is no longer involved in the partitioning. It may have been moved from its
original position.
Quicksort is called recursively to partition the lower and upper partitions, provided there
are at least 2 elements in them

14. Efficiency of Quicksort


14.1 Best Case
The pivot exactly divides the array into 2 equal partitions each time. There are then
log2 n levels of partitioning. Each level involves about n comparisons, so the total
number of comparisons is about n.log2 n, i.e. O(n.log n)

14.2 Worst case


O(n²) - no better than Exchange sort. But this is extremely unlikely. The choice of
pivot is crucial - ideally, this should be the median key, but the true median can only
be found by sorting! Some variants choose the pivot by finding the median of 3
items randomly selected. The example below selects the central element as the
pivot.
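A median-of-3 pivot selection might be sketched as follows. Note the assumption: instead of 3 randomly selected items, this version examines the first, central and last elements of the partition, which is a common deterministic variant. The function name medianOfThree is hypothetical, and the function is not used by the Quicksort code in these notes.

```cpp
#include <cassert>

// Return the index (first, central or last) that holds the median of
// the three candidate keys in array[first .. last].
int medianOfThree( const int array[], int first, int last )
{
    assert( first <= last );
    int mid = first + ( last - first ) / 2;
    int a = array[ first ], b = array[ mid ], c = array[ last ];
    if ( ( a <= b && b <= c ) || ( c <= b && b <= a ) )
        return mid;                    // b lies between a and c
    if ( ( b <= a && a <= c ) || ( c <= a && a <= b ) )
        return first;                  // a lies between b and c
    return last;                       // otherwise c is the median
}
```

For data that is already ordered or inversely ordered, this choice returns the central element, which is the true median of the partition.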

14.3 Average
Averaged over all possible orderings of the keys, the number of comparisons is about
1.39n.log2 n. Mathematicians can see the proof in Algorithms - see para 17. below.


15. C++ code for function Quicksort ( see Wirth )


void QuickSort( int array[], int first, int last )
{
int lb = first, ub = last; // lower bound and upper bound
int pivot = array[ (first + last) / 2 ]; // pivot = central element
int temp; // for the swap
do
{
while ( array[ lb ] < pivot ) // search up for item >= pivot
lb++;
while ( pivot < array[ ub ] ) // search down for item <= pivot
ub--;
if ( lb <= ub ) // if not crossed over, then swap
{
temp = array[ lb ]; // swap the two elements
array[ lb ] = array[ ub ];
array[ ub ] = temp;
lb++; // increment ready for next scan
ub--; // decrement ready for next scan
}
} while ( lb <= ub ); // until indexes cross over
if ( first < ub ) // if > 1 item in the partition
QuickSort(array, first, ub); // partition the lower partition
if ( lb < last ) // if > 1 item in the partition
QuickSort(array, lb, last); // partition the upper partition
}

16. Comparison of complex sorting algorithms


16.1 Shell sort - a refinement on insertion sort proposed by D L Shell in 1959. The
analysis of this algorithm poses some difficult mathematical problems.

16.2 Heapsort - a refinement of selection sort. It seems to like sequences which are
initially in inverse order. The second fastest of the advanced sorts. Shell sort is
faster only if the data is already ordered.

16.3 Quicksort - is significantly faster than either of the above whatever the initial
ordering of the data.

[Figure: bar chart of sorting time (vertical axis 0..500) for Shell Sort, Heap Sort and
Quicksort, each with Ordered, Random and Inverse initial data]

17. Further Reading

Algorithms + Data Structures = Programs, Wirth N, 1976, Prentice Hall


Classic Data Structures in C++, Budd Timothy A., 1994, Addison Wesley


Testing
1. The context for testing - Verification and Validation
Verification and Validation is a generic term for all processes which ensure that the
software meets its requirements, and that the specification meets the needs of the client. In
other words,
Verification means - Are we building the product right?
This involves checking that the software product conforms to its specification.

Validation means - Are we building the right product?
This involves checking to ensure that the software product meets the expectations of the
client.

Techniques required
• Static - Analysis of the design and program listing.
  Includes Walkthroughs, Inspections, Formal verification
• Dynamic - Exercising the program using test data similar to real data, i.e.
  testing

2. The objectives of testing


• To show that the software system meets its specification.
• To exercise the system in such a way that any latent defects are exposed.

Testing cannot prove the absence of defects, only their presence. A successful test is one
that discovers defects.

Testing can never be exhaustive


Apart from trivial programs, the number of different
• possible inputs
• pathways through the program
is effectively infinite. For large programs, testing all possible combinations of pathways
through the code and all possible variations in categories of input would take until the end
of the universe even at the rate of one test per millisecond.


3. Testing & Debugging


• Testing is required to discover errors in software.
• Debugging is the process of correcting errors discovered by testing.

[Figure: the debugging process - Locate Error, Design Error Repair, Repair Error, Re-Test]

It is much more economical to discover errors at the design stage than after the program
has been coded because this avoids the correction process i.e. it avoids the need to debug
and re-test.

4. Two different testing strategies


• Bottom-up
• Top-down

4.1 Bottom-up testing


As each component (e.g. function or module) is developed it is tested 'stand-alone'
by using a specially written 'test harness' or 'test driver'. This is referred to as unit
testing. In C++ a module is a file pair - the interface (header file) and the
implementation (object code file). Usually this pair will implement either:-
• A set of useful functions, e.g. iostream, math
• An abstract type, e.g. a linked list or string abstraction
Re-usable components (e.g. a linked list module) should be distributed with test
drivers.
Individual components e.g. functions are tested to ensure that they operate correctly.
Each component is treated as a stand-alone entity that does not need other
components in order for it to be tested.
Functions are assembled into modules that are then tested - module testing.
Several modules may be amalgamated to produce sub-systems which are then tested
- sub-system testing. One of the problems that module or sub-system testing might
reveal is a mismatch between the interfaces. This can occur when the module using
the facilities of another module has been designed on assumptions that differ from
those made in the design of the module. This might result from a lack of
understanding of the interface specification on the part of either the author or the
user of the module. Or it might be caused by an error in implementation.

[Figure: stages of testing - Unit Testing, Module Testing, Sub-System Testing, System Testing,
Acceptance Testing, grouped under Component Testing, Integration Testing and User Testing]

Finally, all modules are combined to produce the program - system testing.

After this, the user carries out acceptance testing. For bespoke systems developed
for a single user, this is sometimes referred to as alpha testing. For marketable
software products beta testing may be used where a number of users agree to use
the system and to report on any problems. In exchange for this they may get the
software either free or at a preferential rate.
Advantages and Disadvantages of Bottom-up Testing
• Advantage
    - It is easier to create test conditions. The functionality is there - it just needs
      code to test it.
• Disadvantages
    - If combined with top-down development, all system components must
      be available before testing can start because the last items to be
      completed under this development strategy are the lowest level
      components - the first to be tested.
    - If top-down development is not employed, then special test drivers
      have to be written for each component. Eventually these are replaced
      by the actual higher level components when they are implemented.

4.2 Top-Down Testing


This starts with a skeleton of the system: an 'executive module' at the top of the
hierarchy. Some or all of the lower level modules may not have been implemented and
exist only as stubs. Stubs are functions whose body has not yet been implemented.
They simply report e.g. the name of the function or the value of the arguments
and/or return a dummy value.
Initially, the tests are very limited - the purpose is only to exercise the interfaces
between major sub-systems. As more and more modules are implemented the tests
can become more comprehensive.
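A stub of the kind described above might look like the following sketch. The names lookupPrice and orderTotal are hypothetical and stand for any lower level function and the higher level function that calls it.

```cpp
#include <cassert>
#include <iostream>

// Stub for a lower level function whose body has not yet been implemented.
// It reports its name and argument, then returns a dummy value.
double lookupPrice( int productCode )
{
    std::cout << "lookupPrice called with productCode = " << productCode << '\n';
    return 1.0;                        // dummy value
}

// Higher level function under test; it exercises the interface to the stub.
double orderTotal( int productCode, int quantity )
{
    return lookupPrice( productCode ) * quantity;
}
```

Calling orderTotal( 42, 3 ) with the stub in place returns 3.0 and shows that the call reaches lookupPrice with the expected argument. When the real lookupPrice is written it simply replaces the stub.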
Advantages
• The testing process matches the top-down design approach.
• Structural errors - perhaps faults in the design - are found earlier. This may avoid
  extensive re-design at a later stage.
• The availability of a limited working system is a morale booster and it may be
  available to demonstrate to the client.
Disadvantages
• It may be difficult to provide stubs which simulate the behaviour of a complex
  component.
• In most systems, output is generated by lower level modules. There may
  therefore be a need for an artificial environment to generate test results for
  higher level modules.

4.3 Conclusion
The top-down approach is generally considered preferable for most systems today -
Yourdon. But, in practice, it will always be necessary to include a certain amount of
bottom up testing of low level components.


5. Categories of Testing
5.1 Functional testing
The most common form. Its purpose is to ensure that the program performs its
normal functions correctly - see above.

5.2 Thread testing


This may be used in real-time systems which are usually made up of a number of
co-operating processes. An external event such as an input from a sensor may cause
control to be transferred from the current process to the process that handles that
event. Real time systems are difficult to test because of the time-dependent
interactions between the processes. An error may occur only when the processes are
each in a particular state. Thread testing follows the functional testing of the
processes and is designed to trace the effect of the different external events as they
thread through the various processes. The number of combinations of state of the
various processes may be so great that it is impossible to test all of them, e.g. 10
processes, each with 10 possible states produces 10,000,000,000 different
combinations.

5.3 Recovery Testing


Purpose - to ensure that the system can recover from various types of failure.
This is important in on-line and real-time systems e.g. controlling manufacturing
processes.
It may be necessary to simulate in software such failures as hardware, power,
operating system etc.

5.4 Performance (Stress) Testing


Purpose - to ensure that the system can handle the specified volume of transactions
in terms of response time, storage requirements etc. This would be important in
large transaction processing applications such as airline reservation systems.

6. Test Planning
The planning of tests should be carried out during the Specification and Design phases of
the software project:-

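[Figure: test planning through the project phases.
  Requirements Specification -> Acceptance test plan -> Acceptance test
  System Specification -> System integration test plan -> System integration test
  System Design -> Sub-system integration test plan -> Sub-system integration test
  Detailed Design -> Module & unit code and test
  The tests lead finally to Service]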


6.1 Test Plan & Test Log


The Test plan includes
• A unique identifying number for the test.
• A description of the purpose of the test.
• A specification of the data to be used.
• A description of the expected result.

The Test log includes


• A reference to a test plan item.
• The date of the test.
• The result of the test.
• An indication of whether or not the expected result was obtained.
• A reference to any corrective action required if a fault is found.
• A possible reference to re-testing if this is needed.

7. How much testing?


In theory, a program should be tested in such a way that all sets of pathways through it and
all possible combinations of input data are covered. In practice this is impossible for all
except very trivial programs because the number of combinations of input and pathway is
effectively infinite. However not every possible input may need to be tested. There is
probably a very large number of different inputs that will have the same effect. Thus, if a
function expects to receive an integer argument in the range 1..100, then all argument
values in this range should cause the function to behave correctly, and any outside of this
range should cause an error. It should not be necessary to test for every single valid
argument value, nor for every single invalid value. Instead, the range of argument values
can be partitioned into a number of equivalence classes (see para. 10).

8. Test Data v Test Cases


Test Data - The inputs devised to test the system
Test Cases - Input and Output specifications + a statement of the function under
test, the reason for the test and the expected result.
Test data can sometimes be generated automatically, but it is impossible to generate test
cases automatically.

9. Black box v White box testing


Black Box - Does not consider the code of a component. Test cases are derived
only from its specification and interface.
White box - Test cases are derived from a detailed study of the code of the
component to be tested.
These two methods are NOT alternatives. White box testing may be carried out early in
the testing process, while black box testing may be applied later. They are likely to
uncover different classes of error.


10. Black box testing


There are two techniques for deriving the test data -
• Equivalence Partitioning
• Boundary Value analysis

10.1 Equivalence Partitioning


This technique divides the input domain into a number of equivalence classes so
that a test on one representative value of each class is equivalent to a test using any
other value in that class.

Example
A function requires an argument Age which is an integer. The allowable range of
values for Age accepted by the function is 18..65.
From a study of the specification of the function or other program documentation
the following 3 equivalence classes can be identified:-
• Valid class      any value in range 18..65
• Invalid class    any value in range MIN(int)..17
• Invalid class    any value in range 66..MAX(int)
Test cases can then be designed for each valid equivalence class and for each
invalid equivalence class - a total of 3 tests in this simple case.
If there is more than one argument, the test cases should cover the invalid classes
for only one argument at a time because one erroneous argument may mask the
effect of another erroneous argument.
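The 3 tests for the Age example could be written along the following lines. This is only a sketch: the function isValidAge is a hypothetical stand-in for the function under test, and the representative values 40, 10 and 70 are arbitrary choices from each class.

```cpp
#include <cassert>

// Hypothetical function under test: accepts ages in the range 18..65
bool isValidAge( int age )
{
    return age >= 18 && age <= 65;
}

// One representative value is taken from each of the 3 equivalence classes
bool testAgeClasses()
{
    return  isValidAge( 40 )       // valid class 18..65
        && !isValidAge( 10 )       // invalid class MIN(int)..17
        && !isValidAge( 70 );      // invalid class 66..MAX(int)
}
```

Any other value from the same class (e.g. 30 instead of 40) is assumed to give the same result, which is the point of the technique.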

Another Example - Binary Search function of an ordered array


bool binsearch( int array[], int numitems, int target, int& location )
/* Pre  - The array is ordered, numitems >= 1, numitems <= no. of array elements
   Post - If target is present in the array, then location records the element number
          at which target was found and true is returned, else location records
          the correct insertion point and false is returned */
{
    int low = 0, high = numitems - 1, mid;
    bool found = false;
    do
    {
        mid = (low + high) / 2;
        if( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while( target != array[ mid ] && low <= high );
    found = ( target == array[ mid ] );
    if ( found )
        location = mid;
    else
        location = low;
    return found;
}


Valid Equivalence classes for input arguments:-


The choice of VECs may require experience, e.g. that the binary search of an
ordered array may, if not correctly coded, behave differently depending on whether
the number of items stored in the array is odd or even, or if there is only one item.
• Array
    - has 1 item (numitems = 1)
    - has even number of items (e.g. numitems = 6)
    - has odd number of items (e.g. numitems = 7)
• Target
    - is present in the array
    - is not present in the array

= 6 combinations of valid equivalence classes


Invalid Equivalence classes for input arguments:-
These are all cases where the pre-conditions are not met. The specification of the
binsearch function says nothing about how it will respond to such error conditions.
C++ provides the facility for an exception to be raised in such cases and for error
handlers implemented elsewhere in the code to catch the exception and take the
necessary action.
In a production program these invalid equivalence classes would be tested to ensure
that the exception and handling mechanisms dealt correctly with the various causes
of the error.
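As an illustration only, a pre-condition such as numitems >= 1 could be guarded and tested as sketched below. The functions checkNumItems and rejects are hypothetical and are not part of binsearch as specified. The exception type std::invalid_argument comes from the standard header <stdexcept>.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard, called before binsearch, that rejects numitems < 1
void checkNumItems( int numitems )
{
    if ( numitems < 1 )
        throw std::invalid_argument( "numitems must be >= 1" );
}

// Test for the invalid class: returns true if the exception was raised
bool rejects( int numitems )
{
    try
    {
        checkNumItems( numitems );
    }
    catch ( const std::invalid_argument& )
    {
        return true;               // the handler caught the exception
    }
    return false;                  // no exception was raised
}
```

A test of the invalid class would then expect rejects( 0 ) to be true, and a test of the valid class would expect rejects( 1 ) to be false.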
Black box testing on classes of output
It is necessary to test the outputs from the function in the same way as for inputs.
The same principles are applied as for input by specifying valid and invalid
equivalence classes for each output. Inputs are then devised that will produce these
defined outputs:-
• location
    - valid 0..numitems
    - invalid < 0
    - invalid > numitems
• valid return values (there are no invalid return values)
    - non-zero (true)
    - zero (false)

= 2 combinations of valid, and two combinations of invalid equivalence classes.


10.2 Boundary Value Analysis

This complements equivalence partitioning and, in practice, is used at the same time
as equivalence partitioning to determine the test data required for testing a
component.
Boundary values are those
• directly on
• just below
• just above
the boundaries of the equivalence classes.

It is an observed fact that a greater number of errors occur at the boundaries of the
input domain than in the centre.

Examples
• Range of values, e.g. 18..65
    - Test 17, 18, 65 and 66
• Discrete set of values, e.g. 2, 3, 5, 8, 13
    - Test 1, 2, 13, 14
• Data structure (e.g. array) has 1..100 elements
    - Test 0, 1, 100, 101
• Loop iterations
    - Test none, 1, 2, max, max + 1
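For the range 18..65 the four boundary tests might be coded as in this sketch. The function inAgeRange is a hypothetical function under test, assumed to accept exactly the values 18..65.

```cpp
#include <cassert>

// Hypothetical function under test: accepts values in the range 18..65
bool inAgeRange( int age )
{
    return age >= 18 && age <= 65;
}

// Tests the values just below, on and just above each boundary of the range
bool testAgeBoundaries()
{
    return !inAgeRange( 17 )       // just below the lower boundary
        &&  inAgeRange( 18 )       // on the lower boundary
        &&  inAgeRange( 65 )       // on the upper boundary
        && !inAgeRange( 66 );      // just above the upper boundary
}
```

A typical coding slip, such as writing age > 18 instead of age >= 18, would pass a test using a mid-range value but would be caught by the test of 18.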

It is also necessary to identify the boundaries of the output equivalence classes.

Boundary Analysis of Search Procedure


Previously identified valid equivalence classes:-
• Array
    - has 1 item (numitems = 1)
    - has even number of items (numitems = e.g. 6)
    - has odd number of items (numitems = e.g. 7)
• Target
    - is present
    - is not present
= 6 combinations of valid equivalence classes

Experience shows that programmers often make errors in an algorithm due to a
misunderstanding of its behaviour at the boundaries of its input domain. In the case
of the binary search algorithm, these errors might occur when the target (if present)
is located in the first element of the array, or in the last element. Obviously it is
necessary also to test the normal case when the target is in neither of these locations.


Thus the following further test cases are added to those above:-
• Target is in the first element of the array
• Target is in the last element of the array
• Target is in neither the first nor the last element


When the equivalence classes already developed are combined with these boundary
values, the following 10 test cases arise:-
• numitems = 1, target is present
• numitems = 1, target is not present
• numitems is even, target is in the first element
• numitems is even, target is in the last element
• numitems is even, target is present and in neither the first nor the last element
• numitems is even, target is not present
• numitems is odd, target is in the first element
• numitems is odd, target is in the last element
• numitems is odd, target is present and in neither the first nor the last element
• numitems is odd, target is not present.
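The 10 test cases could be run by a small test driver such as the sketch below. The binsearch function is repeated from 10.1 (with the statement terminator after the assignment to found included and the final if/else condensed) so that the example stands alone. The helper names check and runTenCases and the array contents are assumptions made for illustration.

```cpp
#include <cassert>

// binsearch as in 10.1, repeated here so that the driver is self-contained
bool binsearch( int array[], int numitems, int target, int& location )
{
    int low = 0, high = numitems - 1, mid;
    do
    {
        mid = (low + high) / 2;
        if ( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while ( target != array[ mid ] && low <= high );
    bool found = ( target == array[ mid ] );
    location = found ? mid : low;
    return found;
}

// Hypothetical helper: runs one test case and checks both outputs
bool check( int array[], int n, int target, bool expFound, int expLoc )
{
    int loc = -1;
    return binsearch( array, n, target, loc ) == expFound && loc == expLoc;
}

// Runs the 10 test cases; returns true only if every case passes
bool runTenCases()
{
    int one[]  = { 5 };
    int even[] = { 2, 4, 6, 8, 10, 12 };
    int odd[]  = { 2, 4, 6, 8, 10, 12, 14 };
    return check( one,  1, 5,  true,  0 )     // numitems = 1, present
        && check( one,  1, 3,  false, 0 )     // numitems = 1, not present
        && check( even, 6, 2,  true,  0 )     // even, first element
        && check( even, 6, 12, true,  5 )     // even, last element
        && check( even, 6, 8,  true,  3 )     // even, neither first nor last
        && check( even, 6, 7,  false, 3 )     // even, not present
        && check( odd,  7, 2,  true,  0 )     // odd, first element
        && check( odd,  7, 14, true,  6 )     // odd, last element
        && check( odd,  7, 6,  true,  2 )     // odd, neither first nor last
        && check( odd,  7, 9,  false, 4 );    // odd, not present
}
```

Each call records the expected return value and the expected location, so the driver also covers the output equivalence classes for location and for the return value.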

11. White box testing - Introduction


Test data is derived from the actual source code of the component instead of from its
specification. Ideally tests should exercise all possible sets of paths through the code.

[Figure: flow chart of a loop executed twice; on each iteration a branching decision selects
one of three statement blocks A, B or C]

How many different sets of paths exist for this simple piece of code?

Set of paths        1  2  3  4  5  6  7  8  9
First iteration     A  A  A  B  B  B  C  C  C
Second iteration    A  B  C  B  A  C  C  A  B


The answer is 9, i.e. 3 paths raised to the power of the number of loop iterations (3^2).
And this?

[Figure: flow chart of a more complex piece of code containing a loop executed 20 times]

The answer is 95,367,431,640,625 = 5^20 different sets of paths.
Evaluating every possible set of paths at 1 test/millisecond would take 3,022 years. So
exhaustive testing is not possible.
In practice, tests should guarantee that
• Each path (not necessarily all sets of paths) has been exercised.
• All logical branches have both values tested (true and false).
• All loops are exercised at their boundaries and within their bounds.
• All internal data structures have been exercised to ensure their validity.
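The path counts quoted above can be checked with a few lines of code. This is a sketch only: the function names are hypothetical, and a year is assumed to be 365.25 days.

```cpp
#include <cassert>

// Number of sets of paths = (paths per iteration) ^ (loop iterations)
unsigned long long pathSets( unsigned paths, unsigned iterations )
{
    unsigned long long total = 1;
    for ( unsigned i = 0; i < iterations; i++ )
        total *= paths;
    return total;
}

// Whole years needed at one test per millisecond
unsigned long long yearsAtOneTestPerMs( unsigned long long tests )
{
    const unsigned long long msPerYear = 31557600000ULL;  // 365.25 * 24 * 3600 * 1000
    return tests / msPerYear;
}
```

pathSets( 3, 2 ) gives 9 and pathSets( 5, 20 ) gives 95,367,431,640,625, for which yearsAtOneTestPerMs returns 3022.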

But why do we need to go to all this trouble? Wouldn't we spend our time better simply
ensuring that the function/module/program requirements have been met? In other words
why don't we confine our tests to black box testing?

Because
• Logic errors and incorrect assumptions tend to occur in inverse proportion to the
  probability that a path will be executed.
  Normal processing tends to be well understood and scrutinised, but special cases
  tend to fall down the cracks.
• We often believe that a path is unlikely to be executed when, in fact, it may be
  executed regularly.
• Typing errors are usually picked up by the compiler. But those that are not detected
  are just as likely to occur on an obscure logical path as on a mainstream path.

12. White box testing


12.1 Techniques
• Statement coverage
• Condition coverage
    - Branch testing
    - Domain testing
• Loop coverage

12.2 Statement coverage


Every statement should be executed at least once. See Sommerville Ch 22.2.1 on
path testing & cyclomatic complexity. Also Pressman Ch 18.2..18.4.


12.3 Path testing


A technique for finding the number of unique paths through a program thus
providing the number of test cases.
Uses flow graphs derived from the program code or from the PDL (program
description language) for the routine + metrics for calculating the cyclomatic
complexity.

12.4 Path testing - Flow Graph Constructs

[Figure: flow graph constructs for Sequence, If, While, Repeat and Case]

12.5 Cyclomatic Complexity


The cyclomatic complexity is a measure of the logical complexity of the code. A flow
graph is drawn from the flow chart of the component. The C.C. may be calculated in
any one of three ways (see the flow graph for binary search below).
• Number of regions (including the one outside the graph)
• Number of edges - number of nodes + 2
• Number of predicate nodes + 1.
  (Predicates are simple 2 branch constructs. Each diamond in the flow chart is a
  predicate).

Each of these three methods produces the same cyclomatic complexity metric (i.e.
the number of independent paths through the code). In this example = 5

The number of independent paths also provides the number of different test cases
required to ensure that all statements are exercised.

[Figure: Flow chart for binary search, drawn from the loop, if/else and return statements
of the binsearch function]
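The two arithmetic methods can be checked against the counts given in these notes for the binary search flow graph (14 edges, 11 nodes, 4 predicates). The function names below are hypothetical; this is a sketch of the arithmetic only, not a tool that derives the flow graph.

```cpp
#include <cassert>

// Cyclomatic complexity from the edge and node counts of a flow graph
int ccFromEdgesAndNodes( int edges, int nodes )
{
    return edges - nodes + 2;
}

// Cyclomatic complexity from the number of predicate nodes
int ccFromPredicates( int predicates )
{
    return predicates + 1;
}
```

Both ccFromEdgesAndNodes( 14, 11 ) and ccFromPredicates( 4 ) give 5, which agrees with the number of regions.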


12.6 Condition Testing

Conditions are made up of:-
• Arithmetic & character expressions involving arithmetic and character variables
  and constants
• Relational expressions - logical expressions involving arithmetic and character
  expressions and relational operators. They have the value of either TRUE or FALSE.
• Boolean variables - Values Non-zero (TRUE), zero (FALSE).
• Boolean operators (&&, ||, !) joining one or more logical expressions.
• Parentheses surrounding simple or compound conditions

Condition testing
Focuses on testing each condition in the component (including each of the simple
conditions making up a compound condition).

[Figure: Flow graph for binary search, showing regions R1..R5.
Number of edges = 14, Number of nodes = 11, Number of regions = 5,
Number of predicates = 4]

Condition testing strategies


• Branch testing
• Domain testing
The advantages of condition testing are i) it is easy to generate test cases and ii) it is
likely to reveal other errors in the program.

12.7 Branch testing


Test data is constructed so that the TRUE and FALSE branches of compound
conditions and the TRUE and FALSE values of every simple condition within the
compound conditions are tested.
To find all possible combinations of the TRUE and FALSE branches of all
conditions, it is necessary to construct a truth table.


Example
    if ( A > 1 && B == 0 )
        X /= A;

        A > 1                   B == 0              A > 1 && B == 0
TRUE / FALSE   Value    TRUE / FALSE   Value        TRUE / FALSE
     T           3           T           0               T
     T           3           F           1               F
     F           1           T           0               F
     F           1           F           1               F

For the above 2 conditions there are 4 test cases i.e. 2^2. For 3 conditions, there are
2^3 = 8 possible combinations etc. This technique is therefore only practicable for small
numbers of conditions.
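The four rows of the truth table can be turned directly into test cases, as in the sketch below. The wrapper divideIfAllowed and the starting value X = 12 are assumptions added so that the effect of each branch can be observed.

```cpp
#include <cassert>

// Hypothetical wrapper around the example statement so that it can be tested
int divideIfAllowed( int X, int A, int B )
{
    if ( A > 1 && B == 0 )
        X /= A;
    return X;
}

// One test per row of the truth table, using the values of A and B given there
bool testTruthTable()
{
    return divideIfAllowed( 12, 3, 0 ) == 4      // T, T -> T: division performed
        && divideIfAllowed( 12, 3, 1 ) == 12     // T, F -> F: X unchanged
        && divideIfAllowed( 12, 1, 0 ) == 12     // F, T -> F: X unchanged
        && divideIfAllowed( 12, 1, 1 ) == 12;    // F, F -> F: X unchanged
}
```

Note that C++ evaluates && left to right and stops as soon as the result is known, so in the last two rows B == 0 is never actually evaluated.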

12.8 Domain testing


Domain testing of relational expressions requires that 3 values be considered for
each variable component of a relational expression : less than, equal to and greater
than. For the above example, this gives rise to the following test cases for each
variable A and B.

Variable        =                   >                   <
A           A == 1  (A = 1)     A > 1  (A = 2)      A < 1  (A = 0)
B           B == 0  (B = 0)     B > 0  (B = 1)      B < 0  (B = -1)

There are therefore 3 test cases for each of the two variables in the example
compound condition, leading to 3^2 = 9 test cases. Again, the number of test cases
rises rapidly as the number of variables involved in a relational expression
increases.
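The 9 domain test cases can be generated by combining the 3 values of A with the 3 values of B, as in this sketch. The function names condition and countTrueCases are hypothetical.

```cpp
#include <cassert>

// The compound condition from the example
bool condition( int A, int B )
{
    return A > 1 && B == 0;
}

// Runs the 3 x 3 domain test cases and counts how many make the condition TRUE
int countTrueCases()
{
    const int aValues[] = { 1, 2, 0 };     // A == 1, A > 1, A < 1
    const int bValues[] = { 0, 1, -1 };    // B == 0, B > 0, B < 0
    int trueCount = 0;
    for ( int i = 0; i < 3; i++ )
        for ( int j = 0; j < 3; j++ )
            if ( condition( aValues[ i ], bValues[ j ] ) )
                trueCount++;
    return trueCount;
}
```

Only the combination A = 2, B = 0 should make the condition TRUE, so countTrueCases() is expected to return 1. A mistyped operator such as A >= 1 would change this count.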

12.9 Loop coverage


The vast majority of algorithms in software employ loops. Loop testing focuses entirely on
the validity of loops which are classified as follows
• Simple loops
• Nested loops
• Concatenated loops

[Figure: flow graphs of Simple loops, Nested loops and Concatenated loops]


Simple loops
The following tests should be applied to simple loops, where n is the maximum
number of allowable iterations of the loop:-
• Skip (loop is not entered)
• One pass
• 2 passes
• m passes (m < n)
• n - 1, n, n + 1 passes
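The simple loop tests might be applied as in the following sketch. The function sumFirst, the limit MAX_ITEMS (playing the part of n) and the choice m = 5 are all assumptions made for illustration.

```cpp
#include <cassert>

const int MAX_ITEMS = 10;          // n, the assumed maximum number of passes

// Hypothetical function under test: sums the first 'count' items.
// Returns -1 if count is outside the range 0..MAX_ITEMS.
int sumFirst( const int items[], int count )
{
    if ( count < 0 || count > MAX_ITEMS )
        return -1;
    int total = 0;
    for ( int i = 0; i < count; i++ )
        total += items[ i ];
    return total;
}

// Applies the simple loop tests: skip, 1, 2, m, n - 1, n and n + 1 passes
bool testSimpleLoop()
{
    int ones[ MAX_ITEMS ];
    for ( int i = 0; i < MAX_ITEMS; i++ )
        ones[ i ] = 1;
    return sumFirst( ones, 0 ) == 0                   // skip
        && sumFirst( ones, 1 ) == 1                   // one pass
        && sumFirst( ones, 2 ) == 2                   // 2 passes
        && sumFirst( ones, 5 ) == 5                   // m passes (m < n)
        && sumFirst( ones, MAX_ITEMS - 1 ) == 9       // n - 1 passes
        && sumFirst( ones, MAX_ITEMS ) == 10          // n passes
        && sumFirst( ones, MAX_ITEMS + 1 ) == -1;     // n + 1 is rejected
}
```

Filling the array with ones makes the expected sum equal to the number of passes, so each test confirms how many times the loop body was executed.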

Nested loops
The number of times that statements within the inner loop are executed is the
product of the number of iterations of all nested loops within which it appears. Thus
a triply nested loop, where each loop iterates 10 times, will cause statements in the
inner loop to be executed 1,000 times. The number of test cases grows
geometrically and full testing may be impracticable. The suggested solution is:-
a) Start with the innermost loop, setting all outer loop control variables to their
minimum.
b) Test the inner loop as Simple above.
c) Work outwards to next innermost etc. keeping outer loop control variables at
their minimums, and the inner at typical values.
d) Continue until all nested loops have been tested.
Concatenated loops
Where the concatenated loops are independent of each other, treat each as a simple
loop.
Where the second loop has the same control variable as the first and starts with its
value unchanged, treat the two loops as nested.

13. Automated Testing


Testing often accounts for as much as 40% of the total time spent on software
development. Automated testing tools are therefore an important ingredient in the software
developer's armoury. The following categories have been identified:-
• Static Analysers
    - Carry out a static analysis of the program's structure and format.
• Code auditors
    - Special purpose filters that check the quality of software to ensure it meets
      minimum coding standards.
• Assertion processors


    - The programmer writes assertions about the state of the program. The assertion
      processor tests whether they are true or false. C incorporates a simple form of
      assertion testing:-
        #include <assert.h>
        int main ( void )
        {
            int i = 0;
            for( ; i <= 10; i++ );      // empty loop body: i is 11 on exit
            assert( i == 10 );          // so this assertion fails
            return 0;
        }
        /* Assertion failed: i == 10, file ASSERT.CPP, line 7
           Abnormal program termination */

      C++ provides exception handling which gives greater flexibility and permits an
      exception handler to attempt recovery from an error.
• Test file & Test data generators
• Test verifiers - measure and report on internal test coverage
• Test harnesses - Allow the program to be installed in a test environment, and fed
  input data. The behaviour of subordinate modules is simulated by stubs.
• Output comparators - compare output from the current version of the program with that
  from an earlier version to determine any differences
This is an area of growing importance and descendants of the first generation testing
tools are expected to cause radical changes in the way software is tested.


Data Structure Metrics


1. Representing Abstract Structure

Assume we wish to store a linear list of names in random access memory. There are
several ways this could be done.

Scheme 1
Names are stored in successive memory locations (each name is assumed to occupy only
8 bytes).

        Address   Name
        1000      Milton
        1008      Dickens
        1016      Eliot
        1024      Arnold
        1032      Conrad
             Scheme 1

Given the start address of the list (1000), we can find the ith name by going to address
Start + (i - 1) * 8.
We can find the address of the next name by adding 8 to the address of the current element.
Thus, Scheme 1 implements the logical structure of the data by locating its elements in
physically adjacent memory locations.
But if we wish to retrieve a name (in order to access some other data associated with it),
then we would have to scan the list from the start, looking for the name to be retrieved.

Scheme 2
Each name is positioned in memory according to the value of its first letter. The address
for a particular name is found by
        1000 + 8 * (int(firstletter) - int('A'))

        Address   Name
        1000      Arnold
        1008      -
        1016      Conrad
        1024      Dickens
        1032      Eliot
        ..        ..
        1096      Milton
             Scheme 2

In this case there is no way of finding the logical successor of a record. We are prevented
from operating on the data using its logical structure. But if we wished to retrieve a
particular name, we could do so very quickly by calculating the address directly from the
name.

Scheme 3
Each element contains both a name and an address pointing to the element's logical
successor. Given the address of any element, we can find its successor by simply going to
the address contained in that element.

        Address   Name      Successor Address
        992                 1024
        1000      Milton    0
        1008      Dickens   1016
        1016      Eliot     1000
        1024      Arnold    1032
        1032      Conrad    1008
             Scheme 3

Scheme 3 implements the logical order by linking the elements together in the proper
sequence which is not the same as the physical sequence. Address 992 is used to hold the
address of the first name in the list. Milton has a null (0) successor address field
indicating that this is the last name in the list.

As with Scheme 1 we cannot find a given name other than by starting at the beginning of
the list and comparing each successive name with the target. These three schemes illustrate
the three fundamental methods of implementing abstract list data types - by an array, a
hash table and a linked list.
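The address arithmetic of Schemes 1 and 2 and the linking of Scheme 3 can be sketched in C++ as follows. The function and type names are hypothetical, and the addresses are treated as plain integers purely for illustration.

```cpp
#include <cassert>

// Scheme 1 (array): address of the ith name, with i counted from 1
int arrayAddress( int start, int i )
{
    return start + (i - 1) * 8;
}

// Scheme 2 (hash table): address calculated from the first letter of the name
int hashAddress( int start, char firstLetter )
{
    return start + 8 * ( int( firstLetter ) - int( 'A' ) );
}

// Scheme 3 (linked list): each element holds a link to its logical successor
struct Node
{
    const char* name;
    Node* next;                    // null pointer (0) marks the last name
};

// Follows the links from the head of the list, counting the elements
int listLength( const Node* head )
{
    int n = 0;
    for ( ; head != 0; head = head->next )
        n++;
    return n;
}
```

With a start address of 1000, arrayAddress( 1000, 3 ) gives 1016 (Eliot in Scheme 1) and hashAddress( 1000, 'M' ) gives 1096 (Milton in Scheme 2).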


2. Implementing Data Structures


The implementor of a data structure must design the black box so that memory space is not
wasted and the operations are performed efficiently.
If the user knows in advance how many data elements the structure is required to handle,
then certain efficiencies can be gained. If not, then the structure must be made flexible in
order to accommodate an unknown number of items, or considerable space may be wasted.
If the length of the elements is fixed and known, again efficiencies can be obtained
compared with the case where elements are of unknown length.
The implementor can make certain operations efficient at the expense of others, and he
will need to know for which operations maximum efficiency is important to the user.
There is almost always a trade-off available between space and time. Greater speed can be
obtained at the expense of more memory space and, conversely, a saving in space will
usually incur a time penalty. Which is the more important to the user, space or time?
All of these considerations must be taken into account in the implementation of a data
structure.

3. Metrics
One way of implementing a list is to use an array. It is true that arrays are relatively
unsuitable for this purpose because of their inflexibility and because of the need to shuffle
array elements down to fill the hole left by a deletion, but they have the advantage of
requiring no overhead in terms of space. Linked lists, of course, carry an overhead in the
form of the links (pointers) that connect the nodes.
Envisage then a list implemented as an array as in Scheme 1 above and assume that we
wish to find the name Eliot in the list.
3.1 Number of Comparisons
We simply start at the first name in the list (Milton) and search through the list,
comparing each name encountered with Eliot. One measure of the time required to
find this name is the number of comparisons made of each name with the target.
Unless the list is very short, the time required to initialise and finalise the search
will be relatively unimportant when set against the number of comparisons. It is
generally true that the number of comparisons made when searching a data structure
will be one of the major factors in determining the speed of execution.

3.2 Number of Data Moves


This is the second most important operation determining the efficiency of
operations on a data structure. Suppose we wished to remove the name Eliot from
the list and did not wish to leave a gap. We could move each name below Eliot up
one location, thus reducing the length of the list by one element. No comparisons
are required for this operation, but many data moves. The speed of deletion will
therefore be governed by the number of moves.
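A deletion of this kind, with the data moves counted, might be sketched as follows. The function removeAt is hypothetical, and std::string is used for the names rather than the fixed 8 byte fields assumed in Scheme 1.

```cpp
#include <cassert>
#include <string>

// Removes names[index] by moving each later name up one location.
// Returns the number of data moves made; numitems is reduced by one.
int removeAt( std::string names[], int& numitems, int index )
{
    int moves = 0;
    for ( int i = index; i < numitems - 1; i++ )
    {
        names[ i ] = names[ i + 1 ];   // one data move
        moves++;
    }
    numitems--;
    return moves;
}
```

Removing Eliot (index 2) from the five names of Scheme 1 takes 2 moves, since Arnold and Conrad each move up one location. Removing the first name would take 4 moves and removing the last would take none.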


3.3 Algorithm Complexity


The measurement of the complexity of an algorithm is important because of the
effort required to
• implement the algorithm
• understand it
• debug it
• modify it
• maintain it

4. Mathematical Notations
One way of ascertaining the efficiency of algorithms used in operations on data structures
is to write a program which tests the algorithm on a large number of different types and
sizes of data. This approach is useful in trying to understand an algorithm and the factors
which affect its efficiency, but the problem is that:-
a) The data would only be valid for the computer, operating system and language we
have employed and the nature of the data stored in the data structure.
b) We could not possibly examine exhaustively all possible combinations of data
(there are over 358,000 different arrangements of just four distinct letters, ignoring
case).
c) We would finish up with a mass of results which would be difficult to understand
and distil into a general indication of the efficiency of the algorithm under
consideration.
We require a crude indicator of the time complexity of an algorithm that relates the time
taken to the number of elements held in the data structure. We are not particularly
concerned with the absolute amount of time, which, for one algorithm, will depend on the
factors mentioned in a) above.
Looking at the search example above, how many comparisons, on average will be required
to find a name in the list? Let n denote the number of names in the list:-

Element Number    Number of Comparisons
1                 1
2                 2
3                 3
..                ..
..                ..
n                 n

To find the average number of comparisons necessary to locate a name present in the list,
we first find the total required to find each of the names, and then divide by n. Thus, n
comparisons would be needed to find the last name, n - 1 to find the last but one ... through
to just one comparison to find the first. We can calculate the average number of
comparisons for n items without needing to know the value of n:-


Total number of comparisons  =  n     + (n-1) + (n-2) + .. + 1
Reverse the sequence         =  1     + 2     + 3     + .. + n
Add the two rows             =  (n+1) + (n+1) + (n+1) + .. + (n+1)

Since there are n items in the sequence, the total of the third row is n(n + 1). This is twice
the total number of comparisons, because we added the 2 sequences together. To find the
average number of comparisons for any one name, we therefore divide by 2 and also by n:

Average number of comparisons  =  n(n+1) / 2n  =  ½(n + 1)
Thus the average number of comparisons required to find a name in the list is about half n
whatever the value of n. Since we have seen that the number of comparisons is a major
determinant of the time required, we can say that the time taken for this search is
proportional to ½(n + 1). Since the constant ½ is not significant in relation to other
possible factors of n, we can say that the order of magnitude of the efficiency of the search
is n, and we write this as O(n). This is sometimes referred to as the Big O notation.
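
The result ½(n + 1) can be checked with a short sketch of the sequential search. The function names below are illustrative assumptions, and the names in the list are assumed to be distinct.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sequential search: returns the number of comparisons needed to find
// target, or 0 if target is not in the list.
std::size_t comparisonsToFind(const std::vector<std::string>& names,
                              const std::string& target)
{
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == target)
            return i + 1;              // element i is found on comparison i + 1
    return 0;
}

// Average number of comparisons taken over every name present in the list.
double averageComparisons(const std::vector<std::string>& names)
{
    if (names.empty())
        return 0.0;
    std::size_t total = 0;
    for (std::size_t i = 0; i < names.size(); ++i)
        total += comparisonsToFind(names, names[i]);
    return static_cast<double>(total) / names.size();
}
```

For a list of 5 names the total is 1 + 2 + 3 + 4 + 5 = 15, giving an average of 3, which agrees with ½(5 + 1).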
Only the dominant term is chosen to represent a crude notion of the order of magnitude of
the entire expression, eg

n(n+1)                           is O(n²)
15n log n + 0.1n² + 5            is O(n²)
(6 log n + 3n + 7) / (2n - 5)    is O(1)

Why is the second item above classified as O(n²) when this term appears to form only a small
part of the expression? Table 1 shows the value of this function for various values of n.
The last column shows the value of the expression divided by 0.1n². Note that the value in
this last column falls steadily as n increases and, from about n = 65,536, settles down to
about 1.0, indicating the overwhelming importance of the 0.1n² component for large n.

n            15n          log2n   0.1n²             15n.log2n + 0.1n² + 5   Expression / 0.1n²
8            120          3       6                 371                     58.03
32           480          5       102               2,507                   24.49
128          1,920        7       1,638             15,083                  9.21
512          7,680        9       26,214            95,339                  3.64
4,096        61,440       12      1,677,722         2,415,007               1.44
65,536       983,040      16      429,496,730       445,225,375             1.04
1,048,576    15,728,640   20      109,951,162,778   110,265,735,583         1.00

TABLE 1 Value of expression 15n log2n + 0.1n² + 5 for various values of n
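
The last column of Table 1 can be reproduced with a few lines of code. This is a minimal sketch: the function names are assumptions, and std::log2 requires a C++11 (or later) compiler.

```cpp
#include <cmath>

// Value of the expression 15n.log2(n) + 0.1n^2 + 5
double expressionValue(double n)
{
    return 15.0 * n * std::log2(n) + 0.1 * n * n + 5.0;
}

// The expression divided by its dominant term 0.1n^2. The closer this is
// to 1.0, the more completely the n^2 term accounts for the total.
double ratioToDominantTerm(double n)
{
    return expressionValue(n) / (0.1 * n * n);
}
```

Calling ratioToDominantTerm with successive values of n from the table shows the ratio falling towards 1.0.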

Table 2 gives some idea of the values of several different functions of n.


Some simple sorting methods (e.g. Exchange or Bubble sort) operate in a time which is
O(n²) whereas other, more complex algorithms (e.g. Quicksort, in the average case) operate
in a time which is O(n log2n). Shell sort lies between the two, with a bound that depends on
the sequence of gaps used. If there were 1024 items to sort, the simple method would take
approx 1,000,000 units of time. Compare this with a time of only approx 10,000 for the
O(n log2n) sort. However, it is not true to say that the complex sort is 100 times faster than
the simple sort.
Because of its complexity, the more powerful O(n log2n) sort will carry an overhead which
results in constants which are present in the true value of the function but which are
ignored in arriving at the crude order of magnitude value. For this reason also, the
complex sort may not be as fast as the simple sort for small values of n.

n            8     128      1,024       1,048,576

O(n)         8     128      1,024       1,048,576
O(n²)        64    16,384   1,048,576   1,099,511,627,776
O(n½)        3     11       32          1,024
O(log2n)     3     7        10          20
O(n.log2n)   24    896      10,240      20,971,520

TABLE 2 Values for various functions of n

In some sources you may find logarithms specified without the base, eg O(n log n). Does it
matter which logarithm base is used in these order of magnitude expressions? The answer
is no. Logarithms to different bases differ only by a constant factor (for example,
log2n = log10n / log10 2), so although the absolute values of the expressions will differ
according to the base used, the rate of increase of the function for increasing values of n
will remain the same for all logarithm bases.
Table 3 illustrates this by showing the values of the expression n log n for logarithms
base 2, e and 10 and for values of n which double in each row. Note that the rate of
increase is exactly the same for all three bases, and is approximately 2.2 times for each
doubling of n.

n        n.log2n   rate of    n.ln n    rate of    n.log10n   rate of
                   increase             increase              increase
128      896                  621                  270
256      2,048     2.29       1,420     2.29       617        2.29
512      4,608     2.25       3,194     2.25       1,387      2.25
1,024    10,240    2.22       7,098     2.22       3,083      2.22
2,048    22,528    2.20       15,615    2.20       6,782      2.20
4,096    49,152    2.18       34,070    2.18       14,796     2.18
8,192    106,496   2.17       73,817    2.17       32,058     2.17
16,384   229,376   2.15       158,991   2.15       69,049     2.15

TABLE 3 Values for various logarithm bases
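
The rate of increase shown in Table 3 can be computed for any base, which makes it easy to confirm that the base has no effect. The sketch below uses the identity log_b(n) = ln(n) / ln(b). The function names are assumptions made for the example.

```cpp
#include <cmath>

// n.log(n) with the logarithm taken to an arbitrary base
double nLogN(double n, double base)
{
    return n * std::log(n) / std::log(base);
}

// Factor by which n.log(n) grows when n is doubled. The constant 1/ln(base)
// appears in both numerator and denominator and so cancels out.
double growthOnDoubling(double n, double base)
{
    return nLogN(2.0 * n, base) / nLogN(n, base);
}
```

For n = 128 the factor is 2,048 / 896, or about 2.29, whichever base is supplied.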


Trees
1. Applications
• Trees are hierarchical structures and can be used in any application that models a
  hierarchical structure, e.g. disk directory and file structure.
• In some forms they can provide rapid searching and lookup.
• They can maintain their data ordered (usually on a unique key that is associated
  with their data).

2. Implementation
Trees cannot normally be based on a fixed size structure such as an array. They are
normally implemented using dynamically allocated nodes linked by pointers.

3. Variations
• Binary Search Trees
• Expression Trees
• Balanced Trees
• N-ary Trees
• B Trees

4. Example Declaration
struct DataItem
{
    int key;               // key to search on
    anytype value;         // depends on the application
};

struct Node
{
    DataItem data;         // struct as above
    Node *left, *right;    // pointers to left and right child nodes
};

struct BinaryTree
{
    int count;             // number of nodes
    Node* root;            // single entry point into the tree
};


5. Expression Trees
Assume the expression ( 3 + 4 ) * ( 6 - 4 ) is to be evaluated. Parsing and evaluating an
infix expression of this sort in a single pass is very difficult because the string has to be
searched back and forth to recognise and allow for the modifying effect that the
parentheses have on the meaning of the expression.
A tree of nodes representing operators ( +, -, *, / ) and values (or variables) can be built to
represent the semantics of the expression without the parentheses:

            *
          /   \
         +     -
        / \   / \
       3   4 6   4

The tree can then be traversed to retrieve the symbols and values in an appropriate order
for evaluation - see Tree Traversal below.

6. Tree Traversal
There are several possible ways in which the tree can be traversed, the most common are
known as inorder, postorder and preorder:-

Inorder      <left tree> Node <right tree>     (3 + 4) * (6 - 4)
PostOrder    <left tree> <right tree> Node     3 4 + 6 4 - *
PreOrder     Node <left tree> <right tree>     * + 3 4 - 6 4

The post order traversal would produce the nodes in an order suitable for evaluating the
resultant postfix expression using a stack.
The algorithm for binary tree traversal is one of the most elegant in computer science. It is
recursive:-
void inorderTraverse( Node* p )
{
    if ( p != 0 )
    {
        inorderTraverse( p->left );
        Process( p->data );
        inorderTraverse( p->right );
    }
}

Process( p-> data ) is the operation that is to be carried out on each node. Note that this
algorithm effectively maintains its own stack of nodes visited but not yet processed. This
is represented by the series of stack frames that is pushed onto the system stack for each
call to the function. A non-recursive version of this algorithm requires an explicit stack of
nodes to be maintained and is quite inelegant when compared to the above.
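
As an illustration of how a post order traversal feeds a stack-based evaluation, the sketch below traverses an expression tree and evaluates the resulting postfix string. The names ExprNode, postorder and evaluatePostfix are assumptions made for the example. Operands are restricted to single digits and the postfix string is assumed to be well formed.

```cpp
#include <stack>
#include <string>

struct ExprNode
{
    char symbol;          // an operator, or a single digit for a value
    ExprNode* left;
    ExprNode* right;
};

// Post order traversal: both subtrees first, then the node itself.
void postorder(const ExprNode* p, std::string& out)
{
    if (p != nullptr)
    {
        postorder(p->left, out);
        postorder(p->right, out);
        out += p->symbol;
    }
}

// Evaluate a postfix string of single-digit operands using a stack.
int evaluatePostfix(const std::string& postfix)
{
    std::stack<int> values;
    for (char c : postfix)
    {
        if (c >= '0' && c <= '9')
        {
            values.push(c - '0');          // operand: push its value
            continue;
        }
        int rhs = values.top(); values.pop();   // right operand is on top
        int lhs = values.top(); values.pop();
        switch (c)
        {
            case '+': values.push(lhs + rhs); break;
            case '-': values.push(lhs - rhs); break;
            case '*': values.push(lhs * rhs); break;
            case '/': values.push(lhs / rhs); break;
        }
    }
    return values.top();
}
```

For the tree representing ( 3 + 4 ) * ( 6 - 4 ), postorder produces the string 34+64-* and evaluatePostfix returns 14.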


7. Parse Trees
Sentence = Subject Verb Object
Subject = Noun | Noun Phrase
Object = Noun | Noun Phrase
Noun = Cat | Mat | Dog
Verb = sat | ate | chased

                       Sentence
            /             |             \
        Subject          Verb          Object
       /   OR   \                     /   OR   \
    Noun      Noun Phrase          Noun      Noun Phrase

Parse trees such as the very simple example above may be used in natural language
recognition and language translation software.

8. Binary Search Trees


The (recursive) definition of a binary tree is:-
A binary tree is
• either empty
• or consists of a node with left and right binary trees
Binary search trees are ordered on a unique key field. The first data item to arrive causes a
new node to be allocated which becomes the root node. Access to the tree is always via the
root. For subsequent additions, the tree is traversed, looking for an empty left or right child
node starting at the root. If the key of the data to be added is less than that of the current
node, then the left child of the current node is visited. If the key of the data to be inserted
is greater than that of the current node, then the right child is visited. If the two keys are
equal, then the data cannot be added since binary search trees rely on the keys being unique.
Eventually, an empty left link or right link is encountered. A new node is allocated and
linked in to the tree as the left, or right child of the node currently being visited. All
additions therefore take place at the lower levels of the tree - as leaf nodes.

            8
          /   \
         4     12
        / \   /  \
       2   6 10   14

• Searching for 6
  Left, Right, found
• Searching for 11
  Right, Left, Right, not found
• Inserting 13
  Right, Right, Left, not found so Insert as Left child of 14
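
The insertion and search rules described above can be sketched as follows. This is a simplified illustration that stores only an integer key in each node. The names BstNode, insert, contains and destroyTree are assumptions made for the example and are not the course's own declarations.

```cpp
struct BstNode
{
    int key;
    BstNode* left;
    BstNode* right;
};

// Insert key as a new leaf. Returns false, leaving the tree unchanged,
// if the key is already present (keys must be unique).
bool insert(BstNode*& root, int key)
{
    if (root == nullptr)
    {
        root = new BstNode{key, nullptr, nullptr};   // empty link found
        return true;
    }
    if (key < root->key)
        return insert(root->left, key);              // go Left
    if (key > root->key)
        return insert(root->right, key);             // go Right
    return false;                                    // duplicate key
}

// Search from the root, going Left or Right at each node visited.
bool contains(const BstNode* root, int key)
{
    while (root != nullptr && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root != nullptr;
}

// Release every node (post order, so children are deleted before parents).
void destroyTree(BstNode* root)
{
    if (root != nullptr)
    {
        destroyTree(root->left);
        destroyTree(root->right);
        delete root;
    }
}
```

Inserting 8, 4, 12, 2, 6, 10, 14 in that order builds the tree shown above, and a subsequent insert of 13 attaches it as the left child of 14.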

The total number of nodes in a perfectly balanced binary search tree is 2^Level - 1, where
Level is the number of levels in the tree. Thus, for 20 levels, the total number of nodes
would be 1,048,575.
The efficiency of a perfectly balanced tree is measured by the average number of
comparisons required to find a key that is present in the tree. It requires one comparison
to find a key at the root node, two comparisons to find a key in one of its child nodes etc.,
so the maximum number of comparisons is the number of levels. Since the number of nodes
doubles at each level, about half of all the nodes lie on the lowest level, and the average
number of comparisons for a large, perfectly balanced tree is approximately the number of
levels - 1. Thus for a perfectly balanced tree of 1,048,575 nodes, the average number of
comparisons is approximately Number of Levels - 1 = 19.
This makes binary search trees a suitable structure for fast retrieval of data by reference to
a key and, for this reason, implementations of the C++ Standard Template Library typically
use balanced binary search trees to implement searchable structures such as map and set.

9. Importance of Balance
This tree was generated by inserting the data in numeric order - 2, 4, 6 .. 16. If, as in this
case, the tree is not balanced, search efficiency degrades towards a simple sequential
search, i.e. from an average number of comparisons of about Level - 1 to ½(n + 1). There is
little difference between the two in this small example but, for large numbers of items, the
difference in searching efficiency is extremely large.

    2
     \
      4
       \
        6
         \
          8
           \
            10
             \
              12
               \
                14
                 \
                  16

Degenerate Binary Search Tree

AVL Trees (from Adelson-Velskii & Landis) employ a balancing algorithm on every insertion
and deletion which ensures that the tree maintains an adequate (although not perfect)
balance. Another balancing scheme is the red/black tree, which is commonly used in
implementations of the Standard Template Library.
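
The effect of insertion order on the shape of an unbalanced tree can be demonstrated by counting levels. The sketch below is an illustration only: all names are assumptions, and no balancing is attempted.

```cpp
struct ChainNode
{
    int key;
    ChainNode* left;
    ChainNode* right;
};

// Plain (unbalanced) binary search tree insertion; duplicates are ignored.
void insertKey(ChainNode*& root, int key)
{
    if (root == nullptr)
        root = new ChainNode{key, nullptr, nullptr};
    else if (key < root->key)
        insertKey(root->left, key);
    else if (key > root->key)
        insertKey(root->right, key);
}

// Number of levels in the tree (0 for an empty tree).
int levels(const ChainNode* root)
{
    if (root == nullptr)
        return 0;
    int l = levels(root->left);
    int r = levels(root->right);
    return 1 + (l > r ? l : r);
}

void destroy(ChainNode* root)
{
    if (root != nullptr)
    {
        destroy(root->left);
        destroy(root->right);
        delete root;
    }
}

// Build a tree from the keys in the order given and report its levels.
int levelsForInsertionOrder(const int* keys, int count)
{
    ChainNode* root = nullptr;
    for (int i = 0; i < count; ++i)
        insertKey(root, keys[i]);
    int result = levels(root);
    destroy(root);
    return result;
}
```

The eight keys 2, 4 .. 16 inserted in numeric order produce 8 levels, whereas the same keys inserted in the order 8, 4, 12, 2, 6, 10, 14, 16 produce only 4.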

10. Other types of tree


An important tree structure is the B Tree (not to be confused with binary tree). B Trees are
used extensively in database software for file indexing, i.e. the storage (possibly in a
different file) of pairs consisting of a key and a record number at which the data associated
with the key may be found in a data file. They differ from binary trees in that each node
contains not one key, but an array of ordered keys. They have the attribute that they are
always balanced: a new node is created by splitting a full node in two, and the tree gains a
level only when the root node itself is split.
Arrays can be searched efficiently by a binary search, provided that the array is sorted. The
searched-for key is compared with the middle element of the array. If it is smaller, then the
'top' half of the array can be discarded, and a binary search carried out on the lower half.
The converse applies, of course, where the key is larger than the middle element of the
array. The efficiency of a binary search is the same as that of a balanced binary search
tree, because each comparison halves the number of items which remain to be searched.
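
A binary search of a sorted array can be sketched as follows. The function name and the comparison counter are assumptions added for illustration, and the array must already be in ascending order.

```cpp
#include <vector>

// Binary search of a sorted vector. Returns the index of key, or -1 if it
// is absent. comparisons receives the number of elements examined.
int binarySearch(const std::vector<int>& a, int key, int& comparisons)
{
    comparisons = 0;
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;   // middle of the remaining range
        ++comparisons;
        if (a[mid] == key)
            return mid;
        if (key < a[mid])
            high = mid - 1;                 // discard the upper half
        else
            low = mid + 1;                  // discard the lower half
    }
    return -1;
}
```

Searching the seven values 2, 4 .. 14 for 11 examines 8, 12 and 10 before giving up, the same three steps as the search for 11 in the binary search tree example.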


Level   Nodes in   2^level   Total Nodes      No. of comparisons for level   Total   Ave comps
        Level                (2^level - 1)    (= level * nodes in level)     comps   per node
1 1 2 1 1 1 1.000
2 2 4 3 4 5 1.667
3 4 8 7 12 17 2.429
4 8 16 15 32 49 3.267
5 16 32 31 80 129 4.161
6 32 64 63 192 321 5.095
7 64 128 127 448 769 6.055
8 128 256 255 1,024 1,793 7.031
9 256 512 511 2,304 4,097 8.018
10 512 1,024 1,023 5,120 9,217 9.010
11 1,024 2,048 2,047 11,264 20,481 10.005
12 2,048 4,096 4,095 24,576 45,057 11.003
13 4,096 8,192 8,191 53,248 98,305 12.002
14 8,192 16,384 16,383 114,688 212,993 13.001
15 16,384 32,768 32,767 245,760 458,753 14.000
16 32,768 65,536 65,535 524,288 983,041 15.000
17 65,536 131,072 131,071 1,114,112 2,097,153 16.000
18 131,072 262,144 262,143 2,359,296 4,456,449 17.000
19 262,144 524,288 524,287 4,980,736 9,437,185 18.000
20 524,288 1,048,576 1,048,575 10,485,760 19,922,945 19.000

Table 1 Metrics for Binary Trees


Hash Tables
1. Applications
• Compilers (see later under perfect hashing functions)
• Basis for other Abstract Data Types, e.g. Set, Dictionary
• Very efficient retrieval

2. Operations
• Insert
• Remove
• Find (Lookup)

3. Efficiency
The measure of efficiency of searching and sorting is given using the big O notation (see
Data Structure Metrics on page 99). This is a very crude measure of the relationship
between time and the number of items being dealt with. The important factor is the rate at
which time increases as the number of items increases. Hash tables are unusual among data
structures in that, provided the table is not allowed to become too full and the hashing
function spreads the keys evenly, their average efficiency is not dependent on the number
of items stored. Their efficiency is therefore given as O(1).

4. Problem
The penalty paid for this exceptional measure of efficiency is that hashing destroys the
lexical order of keys, so that they cannot subsequently be retrieved in their lexical order.

5. Hashing
Data is stored in a Hash Table that is based on the fundamental array structure provided by
the language. The size of the table is normally chosen to be a prime number. Insertion (and
searching) is performed by applying some function to the key which converts it into an
integer in the range 0 .. table_size - 1. The modulus operation is used to achieve
wrap-around. In this example the column headed ASC represents the sum of the ASCII codes
of the first 3 characters of the name. This is then taken modulo 11 (the table size) to
produce the table index.

Name          Key   ASC   Table Index
SHELLEY       SHE   224   4
WORDSWORTH    WOR   248   6
KEATS         KEA   209   0
BYRON         BYR   237   6
BLAKE         BLA   207   9
BETJEMAN      BET   219   10

The insertion of the first three items is shown in the hash table under Collision Resolution
below. The fourth key BYR produces the same index as that of WORDSWORTH - a collision.
This is not surprising since we are trying to insert a very large domain of values into a
table with only 11 locations.
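
The ASC and Table Index columns can be reproduced with the sketch below. The function names and the constant TABLE_SIZE are assumptions made for the example. Names shorter than three characters simply contribute fewer codes to the sum.

```cpp
#include <cstddef>
#include <string>

const int TABLE_SIZE = 11;     // table size used in the example above

// Sum of the character codes of the first three characters (column ASC).
int asciiSum3(const std::string& name)
{
    int sum = 0;
    for (std::size_t i = 0; i < name.size() && i < 3; ++i)
        sum += static_cast<unsigned char>(name[i]);
    return sum;
}

// Table index: the sum taken modulo the table size.
int tableIndex(const std::string& name)
{
    return asciiSum3(name) % TABLE_SIZE;
}
```

Both tableIndex("WORDSWORTH") and tableIndex("BYRON") give 6, which is the collision referred to above.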


6. Collision Resolution
There are two strategies for resolving collisions:-
• Open Addressing
  A second hashing function is used to give a new table location and a further attempt is
  made to enter the key into the table. The simplest function to produce a new location
  after a collision is to successively add 1 to the result of hashing the key. But this can
  cause clustering where the relative density of certain areas of the table is higher than
  average. This can give rise to a higher than necessary number of collisions. An improved
  second hashing function is:-
      hashvalue = ( hashvalue + step ) % table size
      where step = hashvalue % ( table size - 2 ) + 1
  step is computed only once, from the unreduced hash value, before the loop is entered.
  Probing continues until an empty slot is found or, after a certain number of tries, the
  table is deemed to be full.

      Index   Key   Data
      0       KEA   KEATS
      1
      2
      3
      4       SHE   SHELLEY
      5
      6       WOR   WORDSWORTH
      7
      8
      9
      10

• Chaining
  The Table entry contains a data entry and a pointer to the head of a list of data items
  that collided with the first or, more simply, just a pointer to the head of a list.

      Index   Key   Data
      0       KEA   KEATS
      1
      2
      3
      4       SHE   SHELLEY
      5
      6       WOR   WORDSWORTH  ->  BYR  BYRON
      7
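
Open addressing with the improved second hashing function can be sketched as follows, using the sum of the first three character codes as the hash value as in the previous section. The names Slot, keyValue and insertOpenAddressing are assumptions made for the example, and the table is assumed to have more than two slots.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Slot
{
    std::string name;
    bool occupied = false;
};

// Hash value: sum of the character codes of the first three characters.
int keyValue(const std::string& name)
{
    int sum = 0;
    for (std::size_t i = 0; i < name.size() && i < 3; ++i)
        sum += static_cast<unsigned char>(name[i]);
    return sum;
}

// Insert name using open addressing. Returns the slot used, or -1 if no
// free slot was found after as many probes as there are slots.
int insertOpenAddressing(std::vector<Slot>& table, const std::string& name)
{
    const int size = static_cast<int>(table.size());
    int hash = keyValue(name);
    const int step = hash % (size - 2) + 1;    // computed once, before probing
    hash %= size;
    for (int probes = 0; probes < size; ++probes)
    {
        if (!table[hash].occupied)
        {
            table[hash].name = name;
            table[hash].occupied = true;
            return hash;
        }
        hash = (hash + step) % size;           // collision: try the next location
    }
    return -1;
}
```

With a table of 11 slots and the six names inserted in the order listed earlier, BYRON collides at slot 6 and is placed in slot 10 (step 4). BETJEMAN then collides at slot 10 and is placed in slot 3.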

7. Hash Table example


This is a simple skeleton for a hash table that holds a String pair - a key and its associated
string data. Several functions are not shown, e.g. resize, search. The search function
closely matches the add function except that resizing is not needed.

#include "strng.h"                  // String class; must provide hashvalue()
#include <assert.h>

struct Item                         // component type of the table
{
    String Key, Data;
    bool occupied;
};

const int TABLESIZE = 167;          // 167 is prime
Item tabl[TABLESIZE];               // Hash table is an array of Item
int theSize;                        // current size of the table
int itemcount;                      // number of items stored

void resize( );                     // not shown

void init( void )
{
    for ( int i = 0; i < TABLESIZE; i++ )
        tabl[i].occupied = false;
    theSize = TABLESIZE;
    itemcount = 0;
}

void add( const String& key, const String& data )
{
    // for best efficiency, the number of occupied slots should be <=
    // 80% of table size
    if ( itemcount > theSize * 8 / 10 )
        { resize( ); }
    int hash = key.hashvalue();             // key must support a hashvalue function
    int step = hash % (theSize - 2) + 1;    // step size for collision resolution
    hash %= theSize;                        // hash mod table size
    int numprobes = 1;                      // to count the number of probes
    // look for an unoccupied slot
    bool foundslot = ( !tabl[hash].occupied );
    // loop not entered if unoccupied slot found first time
    while( !foundslot && (numprobes < theSize) )   // second cond is belt & braces
    {
        hash = ( hash + step ) % theSize;
        foundslot = ( !tabl[hash].occupied );
        numprobes++;
    }
    assert( foundslot );                    // should always be true
    tabl[hash].Key = key;                   // store the key
    tabl[hash].Data = data;                 // and the associated data
    tabl[hash].occupied = true;             // slot is now occupied
    itemcount++;                            // increment count of items
}

8. Perfect Hashing Functions


The special properties of hash tables have led to extensive research to exploit their
efficiency. One such area is the speed at which compilers can parse the source code of a
program. If a function can be found that is guaranteed to find a unique location in a fixed
size hash table for all the reserved words of a programming language, then a significant
speed improvement could be gained. It is not easy to find such a function other than
empirically. But it may be worth a considerable amount of effort to find it bearing in mind
the commercial advantage to be obtained for a fast compiler.
A perfect hashing function for Pascal reserved words that does not result in any collisions
is:-

H(key) = L + g(key[1]) + g(key[L])


where L = the length of the reserved word and g = a function associating a letter with an
integer. This gives the fastest retrieval possible.

[2] do           [11] while    [20] record       [29] array
[3] end          [12] const    [21] packed       [30] if
[4] else         [13] div      [22] not          [31] nil
[5] case         [14] and      [23] then         [32] for
[6] downto       [15] set      [24] procedure    [33] begin
[7] goto         [16] or       [25] with         [34] until
[8] to           [17] of       [26] repeat       [35] label
[9] otherwise    [18] mod      [27] var          [36] function
[10] type        [19] file     [28] in           [37] program


Libraries
1. The ctype library
This is a 'C' library of functions that operate on characters. They include functions to test
whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion.
The functions available from ctype.h are:-

int isalnum(int c);


int isalpha(int c);
int isascii(int c);
int toascii(int c);
int iscntrl(int c);
int isdigit(int c);
int isgraph(int c);
int islower(int c);
int isprint(int c);
int ispunct(int c);
int isspace(int c);
int isupper(int c);
int isxdigit(int c);
int tolower(int c);
int toupper(int c);

The use of int instead of char in the return and argument types is historical. For the is..
functions, the return type can be understood to be boolean. In all cases the argument type
can be read as type char.
Help on each of these functions is provided from the RHIDE menu Help.libc
reference.functional categories.ctype.
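
A short sketch of typical use follows. The two helper functions are assumptions made for the example. Note the cast to unsigned char before each call: the ctype functions expect a value that is representable as an unsigned char (or EOF), so a char holding a negative value should not be passed directly.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return a copy of text with every letter converted to upper case.
std::string toUpperCase(std::string text)
{
    for (std::size_t i = 0; i < text.size(); ++i)
        text[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(text[i])));
    return text;
}

// Count the decimal digits in text.
int countDigits(const std::string& text)
{
    int count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (std::isdigit(static_cast<unsigned char>(text[i])))
            ++count;
    return count;
}
```

Characters that are not lower case letters are returned unchanged by toupper, so digits, spaces and punctuation pass through toUpperCase untouched.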


2. The maths library


These are to be found in math.h. To use them you need to
#include <cmath> or
#include <math.h>
The functions and constants to be found are:
double acos(double x);
double asin(double x);
double atan(double x);
double atan2(double y, double x);
double ceil(double x);
double cos(double x);
double cosh(double x);
double exp(double x);
double fabs(double x);
double floor(double x);
double fmod(double x, double y);
double frexp(double x, int *pexp);
double ldexp(double x, int _exp);
double log(double y);
double log10(double x);
double modf(double x, double *pint);
double pow(double x, double y);
double sin(double x);
double sinh(double x);
double sqrt(double x);
double tan(double x);
double tanh(double x);
double acosh(double a);
double asinh(double a);
double atanh(double a);
double hypot(double x, double y);
double log2(double x);
long double modfl(long double x, long double *pint);
double pow10(double x);
double pow2(double x);
#define M_E 2.7182818284590452354
#define M_LOG2E 1.4426950408889634074
#define M_LOG10E 0.43429448190325182765
#define M_LN2 0.69314718055994530942
#define M_LN10 2.30258509299404568402
#define M_PI 3.14159265358979323846
#define M_PI_2 1.57079632679489661923
#define M_PI_4 0.78539816339744830962
#define M_1_PI 0.31830988618379067154
#define M_2_PI 0.63661977236758134308
#define M_2_SQRTPI 1.12837916709551257390
#define M_SQRT2 1.41421356237309504880
#define M_SQRT1_2 0.70710678118654752440
#define PI M_PI
#define PI2 M_PI_2

The usage of any of these functions can be found by running the info program from the
DOS command line. Move the cursor to
* libc.a: (libc.inf). The Standard C Library Reference
press Enter and choose menu options Functional Categories and math functions.
Press Q to exit the info program.
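
As a brief illustration, the sketch below uses sqrt, pow, acos and atan2. The helper names are assumptions made for the example. The value of pi is obtained from acos(-1.0) rather than from M_PI, since the M_ constants listed above are not guaranteed to be defined by every compiler.

```cpp
#include <cmath>

// Length of the hypotenuse of a right-angled triangle with sides a and b.
double hypotenuse(double a, double b)
{
    return std::sqrt(std::pow(a, 2.0) + std::pow(b, 2.0));
}

// Convert an angle in radians to degrees.
double toDegrees(double radians)
{
    const double pi = std::acos(-1.0);
    return radians * 180.0 / pi;
}

// Angle in degrees between the positive x axis and the point (x, y).
double angleOf(double y, double x)
{
    return toDegrees(std::atan2(y, x));
}
```

Note that atan2 takes its arguments in the order y, x, matching the prototype in the list above.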


3. The standard library


This requires the inclusion of cstdlib or stdlib.h. It is a miscellaneous collection of
functions for such operations as converting strings to numeric types, sorting and searching,
exiting or aborting a program, and executing DOS commands.

void abort(void);
int abs(int _i);
int atexit(void (*_func)(void));
double atof(const char *_s);
int atoi(const char *_s);
long atol(const char *_s);
void * bsearch(const void *_key, const void *_base, size_t _nelem,
size_t _size, int (*_cmp)(const void *_ck, const void *_ce));
div_t div(int _numer, int _denom);
void exit(int _status) __attribute__((noreturn));
char * getenv(const char *_name);
long labs(long _i);
ldiv_t ldiv(long _numer, long _denom);
void qsort(void *_base, size_t _nelem, size_t _size,
int (*_cmp)(const void *_e1, const void *_e2));
int rand(void);
void srand(unsigned _seed);
double strtod(const char *_s, char **_endptr);
long strtol(const char *_s, char **_endptr, int _base);
unsigned long strtoul(const char *_s, char **_endptr, int _base);
int system(const char *_s);

Some functions in the standard library have been omitted from the above list, because they
are either 'C' functions that have a better counterpart in C++ or because they refer to the
wide char type that is not covered on this course.
Help on these functions can be obtained from within RHIDE by selecting Help.libc
reference.alphabetical list or by entering info at a DOS prompt, moving the cursor to
* libc.a: (libc).
The Standard C Library Reference
and pressing Enter, then Alphabetical list.
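
The sketch below illustrates three of the functions listed above: qsort, bsearch and strtol. The helper names are assumptions made for the example. qsort and bsearch both rely on a comparison function that returns a negative, zero or positive value, and bsearch requires the array to be sorted already.

```cpp
#include <cstddef>
#include <cstdlib>

// Comparison function in the form required by qsort and bsearch.
int compareInts(const void* a, const void* b)
{
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);          // -1, 0 or 1 without risk of overflow
}

// Sort an array of ints into ascending order.
void sortInts(int* values, std::size_t count)
{
    std::qsort(values, count, sizeof(int), compareInts);
}

// Look for key in an array that is already sorted.
bool findInt(const int* sorted, std::size_t count, int key)
{
    return std::bsearch(&key, sorted, count, sizeof(int), compareInts) != nullptr;
}

// Convert text to a long. ok is set to false if no digits were found or
// if unconverted characters follow the number.
long parseLong(const char* text, bool& ok)
{
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    ok = (end != text) && (*end == '\0');
    return value;
}
```

Unlike atoi and atol, strtol reports where conversion stopped through its second argument, which is what allows parseLong to reject input such as 42abc.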
