Software Engineering
using C++
Lecture Notes
September 1999
Table of Contents
BASIC C++
1. A First C++ Program
2. Data Types
3. String Constants
4. Variables and Constants
5. Arithmetic Operators
6. Type conversions
7. Assignment operator
8. The compound assignment operators
9. The increment & decrement operators
10. Iostream library
11. Command line redirection
12. Streams
13. Output manipulators
14. Relational operators and expressions
15. FALSE and TRUE
16. Logical operators and expressions
17. Short-circuit evaluation
18. The while statement
19. The if statement
20. Style for logical expressions
21. The ctype library
FUNCTIONS
1. Introduction
2. Input and output in functions
3. Multi-function programs
4. Stepwise Refinement (or Top-down design)
5. Automatic variables
6. Function values
7. Function arguments
8. Function argument agreement & conversion
9. Overloaded function names
10. Reference Arguments
11. Function comments
12. Summary
ARRAYS
1. Introduction
2. Defining and referencing arrays
3. Array initialisation
4. Multi-dimensional arrays
5. Arrays as function arguments
6. Pointers and arrays
7. Character strings and variable pointers
8. Character string input/output
9. Arrays of pointers and pointers to pointers
10. Command line arguments
11. Initialising pointer arrays
12. Review
13. Summary
14. An array application - Stack of char
SORTING
1. Introduction
2. Components of Sorting
3. Sorting Files
4. Why sort?
5. Does it pay to sort?
6. What is the best sort?
7. Sorting efficiency
8. Simple Array Sort - Exchange (Bubble)
9. Insertion Sort
10. Simple Sort performance
11. Conclusions
12. Complex sorts
13. QuickSort
14. Efficiency of Quicksort
15. C++ code for function Quicksort (see Wirth)
16. Comparison of complex sorting algorithms
17. Further Reading
TESTING
1. The context for testing - Verification and Validation
2. The objectives of testing
3. Testing & Debugging
4. Two different testing strategies
5. Categories of Testing
6. Test Planning
7. How much testing?
8. Test Data v Test Cases
9. Black box v White box testing
10. Black box testing
11. White box testing - Introduction
12. White box testing
13. Automated Testing
LIBRARIES
1. The ctype library
2. The maths library
3. The standard library
BIBLIOGRAPHY
Basic C++
1. A First C++ Program
// first.cpp
// My first C++ program
// A. Student
// 27/09/99
#include <iostream>
using namespace std;    // cout and endl live in namespace std

int main( void )
{
    cout << "Hello World" << endl;
    return 0;
}
The lines starting with // are comments. These are for human consumption - the compiler
ignores them. The symbol // causes all text to its right on the current line to be a
comment. An alternative form of comment is the pair:-
/* this is a comment */
These do not need repeating on every line and therefore a number of lines can be enclosed
within one pair.
Since the program is going to display output, it is necessary to make available the
input/output library iostream. This is done by issuing a compiler directive (#include) that
the text of the header file iostream should be included in the compilation. Older compilers
name this file iostream.h. The compiler knows where to find this file. The word cout
represents the standard output stream and the symbol << causes what follows it to be placed
on that stream. By default, the standard output stream is displayed on the terminal.
Every C++ program must have one, and only one, function main. This is where program
execution always commences. This, and all other functions, have a return type (in this case
int), an argument list (in this case empty, indicated by void) and a body that is delimited
by open and close braces { }.
The first line of function main outputs the message “Hello World” to the terminal followed
by a new line. The program then terminates, returning the value 0 to the operating system.
By convention, a return value of 0 indicates success. This program deals only with two
values - a constant string literal (a sequence of characters) containing the words
“Hello World” and an integer constant 0. It does not require the use of any variables.
Most programs require the use of
variables, i.e. storage locations in memory that contain values during program execution.
Variables may be of different types.
2. Data Types
There are a number of basic data types built in to all programming languages. A data type
consists of a name and a specification of :-
! the range of values that a variable of that type can hold - its domain. This range is
often limited due to the amount of storage that is used by such items.
! the operations that may be carried out on values of that type
In C++, the most common data type is int - whole numbers that may be positive or
negative, i.e. integers. The amount of storage allocated to variables of type int depends
on the compiler: it is 2 bytes on older 16-bit compilers and 4 bytes on most others. This
allows a range of values from
• -32768 to 32767 in the case of 2 bytes and
• -2,147,483,648 to 2,147,483,647 where 4 bytes are employed.
These peculiar ranges arise from use of the binary system: n bits give 2 to the power n
distinct values, which are shared between the negative and the non-negative numbers.
The fundamental native (built-in) data types and their storage size in GNU C++ are:-
Type            Range of values                                    Bytes
char            Character codes 0 to 127                           1
unsigned char   Unsigned character codes 0 to 255                  1
short int       Signed integer -32768 to 32767                     2
int             Signed integer -2,147,483,648 to 2,147,483,647     4
unsigned int    Unsigned integer 0 to 4,294,967,295                4
long int        Signed integer -2,147,483,648 to 2,147,483,647     4
float           1.17549e-38 to 3.40282e+38                         4
double          2.22507e-308 to 1.79769e+308                       8
Note that, unlike some compilers, GNU C++ uses 4 bytes for type int thus providing the
same range of values as type long int (or just long). Unsigned integers can hold positive
values about twice as large as signed integers of the same size because the bit that would
otherwise record the sign is available for the value.
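The ranges in the table follow directly from the number of bits used. As a rough sketch only (the function names signed_min, signed_max and unsigned_max are invented for this illustration, and bits is assumed to lie between 1 and 32), the limits could be computed like this:

```cpp
// Illustrative helpers only; these names are not part of any library.
// Assumes 1 <= bits <= 32 so that every result fits in a long long.

// Smallest value of a signed integer stored in 'bits' bits: -2^(bits-1)
long long signed_min(int bits)
{
    return -(1LL << (bits - 1));
}

// Largest value of a signed integer stored in 'bits' bits: 2^(bits-1) - 1
long long signed_max(int bits)
{
    return (1LL << (bits - 1)) - 1;
}

// Largest value of an unsigned integer stored in 'bits' bits: 2^bits - 1
long long unsigned_max(int bits)
{
    return (1LL << bits) - 1;
}
```

For a 2-byte short int, signed_min(16) and signed_max(16) give the limits shown in the table, and unsigned_max(32) gives the upper limit of a 4-byte unsigned int.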
3. String Constants
A string constant is a sequence of characters enclosed in double quotes, e.g. "MSc
Information Technology". The sequence may be empty, e.g. "".
If the string is to include certain characters, e.g. double quotes and the backslash, then
these must be escaped with the '\' backslash character, e.g.
"She said \"I have lost my file mydir\\myprog.cpp\"". When output, this would display:
She said "I have lost my file mydir\myprog.cpp"
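The escaped string above can be checked with a small sketch. The function name lost_file_message is hypothetical and std::string is used here only as a convenient way to hand the text back for inspection:

```cpp
#include <string>

// Hypothetical helper, used only for this illustration.
// The string constant contains two escaped double quotes (\")
// and one escaped backslash (\\).
std::string lost_file_message()
{
    return "She said \"I have lost my file mydir\\myprog.cpp\"";
}
```

Writing cout << lost_file_message() would display the text with the plain double quotes and a single backslash, as shown above.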
Other special characters may be included, e.g.
\n    newline       \?    question mark
\t    tab           \'    single quote
\f    formfeed      \a    alarm bell
A string constant can extend over 2 or more lines by placing a backslash at the end of an
uncompleted line.
Two adjacent strings are concatenated to form a single string e.g.
"This string " "is concatenated with this one"
There is no native data type string in C++. Instead, strings are implemented as
an array of characters terminated by the special character '\0' (ASCII NUL),
i.e. the unprintable character which has the ASCII code 0. We will cover arrays
later - they are a very important compound data type holding a sequence of
data items in a contiguous area of memory. For example, the string "Hello" is stored as:
index       0   1   2   3   4   5
character   H   e   l   l   o   \0
Strings and characters are not the same. A string containing only a single character, e.g.
"W" actually occupies 2 bytes of storage one for the 'W' and one for the ASCII NUL. A
character variable can hold only one single character, e.g. 'W', normally occupying only
one byte.
To declare a variable of type string and give it a value immediately:-
char myname[] = "Terry Chapman";
If the string is not intended to be changed, it should be declared as a constant:-
const char myname[] = "Terry Chapman";
The empty brackets signify an array whose size is determined automatically by the
compiler which also reserves space for the terminating ASCII NUL. The variable or
constant can be output in the usual way, i.e.
cout << myname;
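The storage figures quoted above can be confirmed with sizeof and strlen. This is a minimal sketch; the function names are made up for the purpose, and strlen comes from the standard C string library:

```cpp
#include <cstddef>
#include <cstring>

const char myname[] = "Terry Chapman";   // array size worked out by the compiler

// Bytes reserved for the array, including the terminating NUL.
std::size_t storage_bytes()
{
    return sizeof(myname);
}

// Characters before the terminating NUL.
std::size_t visible_length()
{
    return std::strlen(myname);
}

// A string holding one character still needs room for the NUL.
std::size_t bytes_in_W_string()
{
    return sizeof("W");
}
```

The name has 13 characters, so the compiler reserves 14 bytes for it, and the one-character string "W" takes 2 bytes whereas the character 'W' takes 1.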
Identifiers must start with a letter or an underscore. After this, they may contain any
number of letters, digits or the underscore character. They must not include spaces.
int this_is_a_very_long_identifier_with_99 = 99; // valid
float The Average; // invalid - contains a space
char 2good; // invalid, starts with digit
You must use meaningful identifiers. They are part of the program’s documentation and
should be expressive of the purpose for which the identifier is required. An exception to
this is loop control variables that have no other purpose than to access elements of an
array. These are commonly a single character e.g. i, j.
Constants are named items that cannot change. These are used for values in the program
that will remain constant throughout the program’s execution. They must be initialised
with a value.
Examples:-
const double pi = 3.14159265359;
const int numitems = 350;
5. Arithmetic Operators
+ unary plus or addition
- unary minus or subtraction
* multiplication
/ division
% modulus
Note that there is no exponentiation operator that raises a number to a power. There are
library routines that accomplish this.
The above operators apply to all numeric types (except %). Modulus produces the
remainder after integer division and applies only to integral types-
5 % 2 = 1, 11 % 3 = 2, 19 % 5 = 4.
You should find a table of operator precedence in your textbook. 2 + 3 * 4 means "add 2
to the product of 3 and 4". If you want it to mean "add 2 to 3 and then multiply by 4" you
must change the precedence with parentheses (2 + 3) * 4.
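The precedence and modulus examples above can be tried out directly. The functions below are a sketch with invented names; each simply returns the value of one expression:

```cpp
// Illustrative functions; the names are invented for this sketch.

int without_parentheses()      { return 2 + 3 * 4; }    // multiplication is done first
int with_parentheses()         { return (2 + 3) * 4; }  // parentheses force the addition first
int remainder_of(int a, int b) { return a % b; }        // remainder after integer division
int quotient_of(int a, int b)  { return a / b; }        // integer division discards the fraction
```

Here without_parentheses() yields 14 and with_parentheses() yields 20, while remainder_of(19, 5) and quotient_of(19, 5) give the remainder 4 and the quotient 3.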
A combination of arithmetic operators and arithmetic constants or variables is known as an
arithmetic expression. An expression has a value, thus 10 * 3 has the value 30.
A statement on the other hand is a command to carry out processing, e.g.
x = 10 * 3; is a statement that means assign to the variable x the value of the expression
10 * 3.
You might have rationalised the difference between a statement and an expression by
thinking to yourself that an expression has a value whereas a statement does not. You
would be correct if you were talking about most conventional programming languages like
Pascal, Modula-2 and BASIC, where assignment is purely a statement. But in C and C++
assignment is itself an expression and therefore has a value - in the above example, the
assignment x = 10 * 3 has the value 30, and it only becomes a statement when the semicolon
is added. This value can be used for further operations, e.g. for assignment to another
variable:-
y = x = 10 * 3; // both x and y now have the value 30
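A small sketch of the chained assignment follows. The function name is hypothetical, and the variables are given starting values of 0 only so that the effect of the assignment is visible:

```cpp
// Sketch only: the function name is hypothetical.
// Returns x + y after the chained assignment y = x = 10 * 3.
int chained_assignment_sum()
{
    int x = 0;
    int y = 0;
    y = x = 10 * 3;     // x = 10 * 3 is carried out first and has the value 30,
                        // which is then assigned to y
    return x + y;       // both x and y are now 30
}
```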
6. Type conversions
There are two aspects:-
• Automatic conversions carried out by the compiler
  These are discussed below (para 7)
• Type conversion operators
  These use the name of a type as a function in order to force an expression into a
  particular type e.g. int(99.21) will yield 99.
7. Assignment operator
C++ carries out automatic type conversion so that the result of an expression on the right
hand side of the assignment symbol is automatically converted (if possible) into the type
of the variable on the left hand side. This is convenient in many ways, but there are
occasions when you need to know what the exact effect is. It is like letting a futuristic
washing machine automatically decide what program to use according to the clothes you put in.
What program does the machine decide to use when you wash a silk shirt and a very dirty
towel? Do you get a grubby towel or a ruined silk shirt? Ultimately you will need to know
what the conversion rules are, but do not worry about them at present. In any case it is
desirable not to make a habit of mixing your washing since you may get a result you did
not expect.
Briefly, fractional values (types float and double) are truncated when assigned to integral
variables (int, unsigned int, long int). Large values that exceed the capacity of the integral
variable to which they are assigned will cause overflow and the result will be meaningless.
No overflow warning is issued and care should be taken when writing expressions with
integral values to ensure that overflow does not occur.
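Truncation on assignment, and the explicit conversion from para 6, can be sketched as follows. The function names are invented for illustration, and only values that fit comfortably in an int are assumed:

```cpp
// Hypothetical helpers illustrating conversion from floating point to int.

// Automatic conversion on assignment: the fractional part is discarded.
int assign_to_int(double value)
{
    int n = value;      // implicit conversion, truncates towards zero
    return n;
}

// Explicit conversion using the name of the type as a function.
int convert_to_int(double value)
{
    return int(value);
}
```

Note that truncation is not rounding: assign_to_int(3.99) gives 3 and assign_to_int(-3.99) gives -3.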
Note that the last may only be used with integers; all others may be used with any
arithmetic type. Note also the effect of sum /= 3 + 7. The expression 3 + 7 is evaluated
first, so sum is divided by 10.
similarly with --
We will return to these when we look at processing arrays using loops.
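The note on sum /= 3 + 7 can be checked with a short sketch. Both function names are invented here; the second shows the reading that a newcomer might mistakenly expect:

```cpp
// Sketch only; the function names are invented for this example.

// sum /= 3 + 7 divides sum by 10, because 3 + 7 is evaluated first.
int divide_by_sum(int sum)
{
    sum /= 3 + 7;       // same as sum = sum / (3 + 7)
    return sum;
}

// The mistaken reading: (sum / 3) + 7.
int mistaken_reading(int sum)
{
    return sum / 3 + 7;
}
```

With sum equal to 100, divide_by_sum returns 10 whereas mistaken_reading returns 40.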
12. Streams
Access to istream (input stream) and ostream (output stream) operators is obtained by
putting the preprocessor directive #include<iostream> at the top of each program file that
needs to carry out standard input and/or standard output. This has the effect of including
the header file iostream (a text file, named iostream.h on older compilers) in the
compilation.
<< and >>       are known as the insertion and extraction operators. The
                unusual notation arises from the object-oriented aspects of the
                language. Just take it for granted at present.
endl            causes subsequent output to be displayed on the next line of the
                display.
cin.get(ch)     gets a single character from standard input and returns the state
                of the standard input stream
cout.put(ch)    puts a single character to standard output and returns the state of
                the standard output stream
cout.good()     return true if there has been no error from the last output (input)
cin.good()      operation
cout.bad()      the opposite of good()
cin.bad()
cin.eof()       returns true if end of input encountered, false otherwise. When
                entering from the keyboard, end of input is indicated with Ctrl Z.
All of the fundamental types supported by C++ (including strings) may be input
using cin >> and output using cout <<.
The items starting with ios:: within the parentheses after setiosflags are constants that are
defined in the iostream library. Their names are self-explanatory and you do not need to
know their values. The meaning of ios:: will only be explained in a subsequent module
unless you read up on it yourself.
A program basiccpp.cpp is provided in the lab that shows the effect of setw(n) and some
of the flags that can be set using setiosflags(), including display of integer in octal and
hexadecimal.
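The lab program basiccpp.cpp is not reproduced here. The following is an independent sketch of the same ideas with invented function names. It writes to a string stream instead of cout so that the formatted text can be examined, and for brevity it selects the number base with the standard manipulators hex and oct rather than with setiosflags:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative helpers (names invented here).

// Right-justifies n in a field 'width' characters wide (the default).
std::string right_justified(int n, int width)
{
    std::ostringstream out;
    out << std::setw(width) << n;
    return out.str();
}

// Left-justifies n by setting the ios::left flag.
std::string left_justified(int n, int width)
{
    std::ostringstream out;
    out << std::setiosflags(std::ios::left) << std::setw(width) << n;
    return out.str();
}

// Displays n in hexadecimal.
std::string in_hex(int n)
{
    std::ostringstream out;
    out << std::hex << n;
    return out.str();
}

// Displays n in octal.
std::string in_octal(int n)
{
    std::ostringstream out;
    out << std::oct << n;
    return out.str();
}
```

Note that setw(n) applies only to the next item output, whereas the flags remain set until they are changed.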
Take care not to use = as the equality operator. This is a common programming
error.
Beware of testing two floating point variables for equality. Their binary internal
representation means that many fractional values cannot be expressed exactly.
Instead, test whether the absolute value of their difference is smaller than some
small tolerance. The function fabs(<float>) can be used to find the absolute value
of a float or double. To use it you need to #include<math.h>
double f1 = 12.34574, f2 = 12.34578;
const double delta = 0.00005;
if ( fabs( f1 - f2 ) < delta )
… // consider them equal
else
… // consider them unequal
See also Skansholm p52.
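The same test can be packaged as a function. This is a minimal sketch; the name nearly_equal is an assumption and the choice of a suitable delta is left to the caller:

```cpp
#include <cmath>

// Hypothetical helper: true if a and b differ by less than delta.
bool nearly_equal(double a, double b, double delta)
{
    return std::fabs(a - b) < delta;
}
```

With the values used above, nearly_equal(12.34574, 12.34578, 0.00005) is true because the two numbers differ by only about 0.00004.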
!5                          false
!0                          true
ch = 'a';                   assign to ch the letter 'a', i.e. NOT the test for equality
ch == '\0'                  false
!(ch == '\0')               true
(ch)                        true (the character 'a' is converted to an integer and is
                            tested for non-zero)
(!ch)                       false
(ch && ch != '\n')          true
(ch == 'a' || ch == 'A')    true
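The entries in the table can be confirmed by wrapping each expression in a small function and calling it with the character 'a'. The function names below are invented for this sketch:

```cpp
// Sketch only: each function evaluates one of the expressions above
// for a given character.

bool is_nonzero(char ch)   { return ch; }                      // (ch)
bool is_zero(char ch)      { return !ch; }                     // (!ch)
bool is_text_char(char ch) { return ch && ch != '\n'; }        // neither NUL nor newline
bool is_letter_a(char ch)  { return ch == 'a' || ch == 'A'; }  // the letter a in either case
```

Called with 'a', is_nonzero, is_text_char and is_letter_a return true and is_zero returns false, in agreement with the table.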
<statement>
Example
// show.cpp
// copies its input to its output
#include <iostream>
using namespace std;    // cin and cout live in namespace std

int main(void)
{
    char ch;
    cin.get(ch);            // get a character from the keyboard
    while ( cin.good() )    // Becomes false if end of file or other input problem
    {
        cout.put(ch);       // output the character to the display
        cin.get(ch);        // get the next char in preparation for the next loop iteration
    }
    return(0);
}
The while statement is preceded by a statement cin.get(ch) that sets up the value to be
tested by while. This is important because the termination condition may already exist in
which case the loop should not be entered. If the loop is entered, then cin.get(ch) is
repeated at the bottom of the loop to set up the condition again. This is invariably the way
that files are processed since they may be empty. It is a common error to forget to initialise
the test condition before entering the while loop.
This program can be used to display the contents of a text file if issued at the DOS
command line using redirection:-
show < show.cpp displays the source program file show.cpp at the terminal
The output can also be redirected, giving a file copy
show < show.cpp > showcpy.txt
// show2.cpp
// copies its input to its output
#include <iostream>
using namespace std;    // cin and cout live in namespace std

int main(void)
{
    char ch;
    while ( cin.get(ch) )   // Becomes false if end of file or other input problem
        cout.put(ch);       // output the character to the display
    return(0);
}
The get( ch ) function is called within the loop condition parentheses. The expression
cin.get(ch) does two things: a) it gets a character from standard input and passes it back
via its argument ch and b) it returns a reference to the standard input stream cin as its
function result. When a stream is tested as a condition it counts as false (0) if the last
operation failed, e.g. because there is no further input, and this is the condition being
tested by while. This does away with the need for the get prior to entry of
the loop, and also with the get at the bottom of the loop.
Functions
1. Introduction
You have already seen and used a function - the function main which every C++ program
must have. Until now it has been reasonable to write all of the code of your programs in
this function. However, as programs become larger, it is necessary to break them down
into collections of smaller and more manageable units. One such subdivision is the
function. Functions give us the ability to store a computation in a named block of code and
to carry out the computation simply by referring to its name i.e. by calling the function.
This facility for breaking programs down into simpler and more manageable units is a
major weapon in the fight to reduce the complexity of large programs and involves the
process of abstraction. Abstraction allows us to concentrate on the current task and to
ignore details that are not relevant. So when we call a function e.g. sqrt to find the square
root of a number, we are concerned only with how to make the call and not what steps the
function takes to achieve the computation. We do need to know the data type of the
number to be passed to sqrt, the data type of the value returned by it and what happens if
we pass a negative value etc. - these aspects are relevant to our making the call, but the
actual details of the computation are not relevant.
Of course, at different times we will have different levels and views of abstraction - if we
had been concerned with writing function sqrt then we would have been concentrating our
attention on expressing the algorithm to compute the square root of a number and would
have ignored unnecessary detail elsewhere (e.g. the other functions which make up the
library maths). A further advantage of storing code in functions is, of course, the ability to
re-use them in other programs.
This type of abstraction is called procedural abstraction after the procedures - the name
that most other languages use to refer to these named blocks of code. Technically a
function differs from a procedure in that it returns a value, whereas a procedure does not.
C++ does not have procedures, but it is possible to specify that a function does not return a
value. Functions in C, C++ and most other languages (except the functional languages) do
not conform closely to the mathematical concept of a function that accepts a single
argument and returns a single value. As we shall see, it is possible to pass more than one
value to a function and to get back more than one result.
The structure of a function is:-
type_specifier function_name(argument_list)
{
definition_and_statement_list
}
type_specifier The data type of the value that is returned by the function
Example:-
You are writing a program which needs to compute values raised to a power. There is no
exponentiation operator in C++, so you must develop one yourself. You want to be able to
write e.g.:-
result = power(12,3)
where result is an integral variable which is to be given the value 12 raised to the power 3
(i.e. 1728). On other occasions in the same program different numbers are to be raised to
different powers, e.g. the statement
cout << power(7,5) << endl;
outputs 7 raised to the power 5, i.e. 16,807.
So the function must be generalised to handle a range of different inputs for its single
result. This generalisation is provided by the argument list. In the call to the function, the
values passed to the function are known as the actual arguments i.e. 12 and 3 in the first
example above, and 7 and 5 in the second example. In the definition of the function they
are known as the formal arguments. It is important that you understand this distinction
because these two terms are used frequently when talking about functions.
Assuming that we want to be able to handle some large resulting values, the integral return
type should probably be of type long int. The type of the arguments can be left as plain
integer. The formal specification of function power is then:-
long int power(int a, int b) // long power (without the int) is also OK
{
definition_and_statement_list
return (<long_integer_expression>);
}
Where <long_integer_expression> is an expression of the result of raising a to the power
b. When the call power(12,3) is made the actual argument values 12 and 3 are copied into
their respective formal argument variables a and b. If the actual arguments had been
integer variables (as opposed to constants) with the same values (12 and 3), then the
values of the actual argument variables would have been copied into the formal argument
variables producing exactly the same effect.
In the function, the formal arguments a and b are effectively local variables of the
function. Any variable definitions made in the body of the function are also local
variables. This means that they are not accessible from outside the function. In fact,
normally, they only exist while the function is executing and are then removed from
memory. Inside the body of power there will be an appropriate computation that produces
a value representing a raised to the power b, and this value will be passed back by the
return statement. A function normally has a value (unless its return type is void) and can
therefore be used on the right hand side of an assignment or within a cout statement in just
the same way as a variable or an arithmetic expression. In fact a call to a function which
returns a value is an expression.
In the case of the statement result = power(12,3); the returned value is assigned to
result. The value returned by power can be used anywhere else that an
expression of long int type is required, e.g. in
cout << power(7,5); // 16,807
or even as the actual argument of a call to another function:
cout << power( power( 2, 3 ), 4 ); // 4,096
3. Multi-function programs
There must always be a function called main in any C++ program. There may be any
number of other functions in the same source program file (or indeed in other source
program files). The question then arises - where do you put these other functions? C++
does not allow functions to be nested within other functions (unlike Pascal and Modula-2).
So additional functions may appear textually either before function main, or after it. When
the compiler scans the source text of a program, it will flag an error if it finds a call to a
function for which it has not yet encountered either a definition or a declaration. So if a function is defined after main,
then a function declaration must appear before the point at which the call is made. This
declaration (also known as a function prototype) should normally be placed at the start of
main giving the compiler sufficient information to enable it to check that the function has
been called properly. This prototype will consist only of the return type, the function
name, and the types of its arguments.
// fun01.cpp
// illustrates the placing of functions in relation to main
// tdc 28/09/95
#include <iostream>
using namespace std;
int add(int a, int b ) // this placing is deprecated
{
    return(a + b);
}
int main(void)
{
    // int mult(int a, int b); // prototype commented out
    int x = 10, y = 3;
    cout << add( x, y ) << endl << mult(x,y) << endl;
    // Compiler error: Function 'mult' should have a prototype in function main()
    return(0);
}
int mult(int a, int b)
{
    return (a * b);
}
Function add has been placed before main contrary to the recommendation for best
practice above.
The prototype for function mult has been commented out, causing the compiler error.
Removing the comments allows the program to compile successfully.
Different organisations may set their own 'house' styles, but we will show the full
definition of functions after main with prototypes normally appearing as the first
definitions within the body of main.
Note that the identifiers a and b in the prototype for mult are not essential. The prototype
could have been
int mult(int, int); // prototype with argument identifiers omitted
But the argument identifiers may be included if they aid the understanding of their
purpose. The prototype must match the formal definition as regards its name, its number
and type of arguments, and its return type. In standard C++ a prototype whose argument
list differs from the definition declares a different (overloaded) function, so the mistake
typically surfaces as an unresolved reference when the program is linked. A difference in
return type alone is normally reported by the compiler when the prototype and the
definition appear in the same source file, but it may go undetected when they are in
separate files, in which case a run-time error is likely to result.
code. They may well contain constructs such as branching (if, else) and loops (while etc.)
within which the subsidiary functions are called.
5. Automatic variables
Variables declared within a function are called local variables and have the default storage
class automatic (auto is the key word). Since this is the default, the storage class does not
have to be given and it is normal to omit it. There are other storage classes that will be
dealt with later.
Scope is an important topic since the scope rules determine the visibility of objects. If an
object is not visible, it cannot be changed. Your Unix password is invisible to others
because, if others had access to your account, you do not know what they could do. They
might let you have useful comments about your work. On the other hand they might
change it, or delete it. The scope mechanism is employed to reduce the chance that some
other programmer (or even you yourself!) inadvertently corrupts the program by
changing an object to which he or she should not have
access. This is part of the concept of encapsulation which we shall cover in more detail in
the second Semester. For now, work on the principle that functions should not, as a rule,
use or modify global variables.
As an example, if function x requires a variable to control a loop, declare that variable
locally within the function. In that way, only errors within the function itself can cause the
loop to run incorrectly. If a global variable were used for this purpose, there is a possibility
of it being changed from outside the function while the loop is executing causing errors
that can be very difficult to identify and correct. Similarly, although there can be
exceptions, functions should not modify global variables directly. Instead this should be
done via arguments. More about how in a later lecture.
An obvious corollary to the lack of visibility of a local variable from outside the function
is that variable names may be duplicated within different functions without any clash.
6. Function values
A fairly obvious point - the value appearing after return should be of the same type as that
in the definition. Thus
int add(int a, int b)
should, in its return statement, return an integer value. You have been doing this for some
time in function main.
As mentioned earlier, it is possible for a function to accept no arguments, or to return no
value. In either of these cases, the reserved word void should be used, e.g.
void dosomething(void)
is a function which neither accepts arguments nor returns a value. In this case its return
statement, if it has one, must not carry a value (a plain return; is allowed), and a call to
it must be used differently to reflect the fact that no value is returned.
dosomething(); // i.e. a statement, not an expression
result = dosomething(); // wrong
7. Function arguments
These are a means of passing information to a called function. It is also possible for a
function to pass information back via its arguments and this will be dealt with later.
Arguments are a comma-separated list of type/identifier pairs appearing within the
parentheses after the function name, e.g. (int a, int b) as in function add above. Naturally,
the number and type of the actual arguments supplied in the call must match the number
and type of the formal arguments, with the exception of default arguments (see section 8,
Default arguments, in Pointers, References and Functions). The function may modify the values of its arguments, and this will
have no effect on the values of any actual argument variables used in the call.
Remember that the values of the actual arguments are copied into the formal argument
identifiers. This is the pass-by-value argument mechanism. The actual arguments may be
any expression of the correct type. This includes a literal constant, e.g. 9.0, a variable, e.g.
f, or even a call to another function which returns a value of the correct type, e.g.
cout << sqrt( sqrt(81.0) ); // outputs 3.
These pre and post conditions then form a contract between the caller and the function.
The caller guarantees to meet the pre-conditions and the function guarantees to satisfy the
post-conditions. If the caller fails to meet his side of the contract (i.e. he does not meet the
pre-conditions), then all bets are off, and the function is relieved from meeting the post-
conditions.
Some language designers consider that this concept is so important that it should not be
dealt with merely by comments. They have therefore incorporated pre and post conditions
into the language so that they can be checked at run-time, raising an exception if the
contract is broken. Eiffel is an example.
Large programs have to be broken down into smaller and more manageable components in
order to deal with their complexity and to allow teams of programmers to work on them.
The separate components can be tested individually with a range of inputs to ensure that
they behave as specified. But what happens when they are put back together again? Will
all these components work together? Or will there be discrepancies arising from a
misunderstanding on how the parts interrelate? The ability to check the interaction of
these components at run time can provide significant advantages in terms of quality and
reduction of debugging time.
12. Summary
We have looked at functions which may have formal arguments or should have the word
void in the formal argument list to indicate that no arguments are required. Functions
normally return a value via the return statement, and the type returned must agree with the
return type provided in the definition.
Functions are called by name, passing actual arguments whose values are copied into the
formal arguments. Since a call to a function that returns a value is an expression (i.e. it has
a value), a function call may be used in any case where an expression is expected.
It is recommended that function definitions appear after the function main. This requires
that function prototypes appear as the first lines of function main. Functions whose
prototypes are supplied in main are private to main, i.e. the prototypes serve the
requirements of main and no other functions. If there are other functions, defined after
main, and before the functions they wish to call, then they will not be able to do so. There
are two solutions:-
• Ensure that the definitions of the functions to be called appear before the definitions
of the functions that wish to call them.
• Provide prototype declarations for the called functions before main so that they have
file scope and can therefore be called from anywhere in that file.
Local variables of a function usually have the storage class auto and are not visible to code
outside the function. They cease to exist after the function terminates. The formal
arguments are also invisible from outside. Changes to formal arguments that are passed by
value and changes to local auto variables have no effect outside the function, and their
identifiers may duplicate identifiers appearing elsewhere in the program.
Functions are one of the weapons that C++ provides in the war against complexity and the
errors that this complexity may bring with it. They are an example of procedural
abstraction and allow a program to be designed as a hierarchy of functions that
progressively refine the problem by breaking it down into smaller problems. Large
programs must be designed on paper using this process of stepwise refinement before the
program is written. A suitable tool for this design process is a PDL (program description
language), one variant of this being known as Structured English. Libraries of frequently
used routines (functions) can be written and a very large number of libraries are provided
with all compilers, each library containing a number of functions.
Pre and post conditions provided as comments at the head of the function are an important
way of specifying what they do and how they are to be used. This helps to ensure that,
when a large number of tested functions is finally brought together to form a program, the
various parts work together as specified.
Ideally, input and output should be isolated in a limited number of functions designed for
that purpose and not scattered about over many functions whose primary purpose is not
I/O. Generally speaking, functions should not modify global variables and should never
use global variables for such local uses as loop control.
Flow of Control
1. The type cast operator
The typecast operator provides the possibility of forcing an expression into another data
type by using the name of the new type as though it were a function. For example, in a
program to calculate the statistics on a sequence of integers, the mean can be calculated
from the integer total of the numbers divided by the count of the number of items (where
mean is a float) by:-
mean = float(sum) / float(count);
The new C++ standard has introduced four new operators that carry out explicit
conversions from one type to another. Of these four, only static_cast is introduced here. It is
intended to be used for conversion between similar types, e.g. between char and int,
between int and enum, and between float and int. Example:-
mean = static_cast<float>(sum) / static_cast<float>(count);
Explicit type conversions are error-prone and a large proportion of program errors is due
to them (Stroustrup). The virtue of the new operators is that they are easy to search and
find in large program source files, whereas the earlier example float(sum) could be very
difficult to find.
Example 2
cout.setf( justify == 'L' ? ios::left : ios::right );
Equivalent to:-
if ( justify == 'L' )
    cout.setf( ios::left );
else
    cout.setf( ios::right );
4. The for statement
In most programming languages, the for iteration construct is suitable mainly for loops
whose number of iterations can be determined in advance. In C++, the for loop is much
more general and can, in fact, be employed for any loop including while and do. The
syntax is
Note that, in example b) above, bool forever in the first expression is the declaration
of a new boolean variable. It is convenient and makes programs easier to read if the
declaration of variables is as close as possible to the point where they are used. This
facility is one of the improvements over the ‘C’ language provided by C++.
Because of its versatility, there is a tendency for programmers to use the for loop
exclusively and to ignore the while loop. However, the latter is designed to deal explicitly
with cases where the loop should not be entered at all under certain conditions (e.g. when
processing a file which may be empty). Although this condition can be handled by for as
shown above, its primary purpose is for loops whose number of iterations can be
determined before it is entered e.g. when processing arrays (to be covered soon). The very
fact that a while loop is being used signals that it may never be entered whereas, in a for
loop this fact can only be determined by inspection of its expressions.
5. The do statement
In a limited number of cases, processing requires that the loop condition is tested at the
bottom rather than at the top of loop. In other words, the statement(s) in the loop body will
always be executed at least once. The format of the do loop is:-
do
statement_block;
while ( expression );
where expression is a logical expression yielding either true or false. As with all loop
statements, if statement_block comprises more than one statement, it must be enclosed in
braces:-
do
{
    statement_1;
    statement_2;
    ...
} while ( expression ); // Note: the test normally appears on the same line as the
                        // closing brace
6. Nested loops
Frequently, a loop is nested within another loop or loops. The reasons why this might be
necessary will become clearer when arrays are covered. Notice that the total number of
iterations of the inner loop is the product of the number of its iterations and those of any
surrounding loops. This number can escalate to very large values and can result in
programs that run slowly.
for ( int i = 0; i < 10; i++ )
    for ( int j = 0; j < 10; j++ )
        for ( int k = 0; k < 10; k++ )
            process( i, j, k ); // function process is called 1,000 times
Sometimes, it may not be obvious how many potential iterations of the inner statement
will occur because, for instance, the second and third lines above may consist of function
calls that, themselves, contain a loop. You should always be aware of the possibility of
introducing inefficiencies into a program in this way because it may result in unacceptable
performance.
Where:-
constant_expression1, 2, 3 ..
    are some possible values of expression, e.g. 'P', 'D', 'E'. They must be constants,
    either literal or symbolic - see the example below.
statement_block
    is a statement or sequence of statements which will normally end with the break
    statement. The effect of break is to cause control to jump out of the switch
    statement and not to execute any statements in the following cases. If break is not
    present, then the statements in subsequent cases are executed until either a break
    statement is encountered, or the end of the switch statement is met.
default
    if none of the cases is met, the statements in the default section are executed. It is
    wise always to include this so as to deal with all other possible values of expression.
Pointers, References and Functions
2. Reference Type
We introduce a new data type, the reference, whose value is not an integer, float, char etc.
but a reference to a variable which holds an integer, float, char etc. It is an alias for
another object. Alias means another name for.
Example:-
Assume the following declarations
int k = 5;
int& b = k;
[Figure: the addresses and contents of k and b]
3. Pointers v References
Pointers are carried over from C, and are, in part, superseded by the reference type.
However, many C libraries use pointers and the type has been retained for compatibility
purposes and for their importance in building dynamic data structures. Some books
describe references in abstract terms, and pointers in concrete terms. Pointers, they say, are
variables which hold the address of another variable. But, in fact, this is exactly what
references hold as their value. The differences are:-
• Abstraction
The fact that references hold an address does not need to be known in order to use
them, whereas you must take specific action in order to make a pointer point to
some other object and to obtain the value of the object pointed to (see Syntax
below).
• Syntax
Pointers require special symbols to be used by the programmer -
- to assign to a pointer the address of another object i.e. to make it point to it - use
the address operator &
- to yield the value of the object to which a pointer points, known as
dereferencing - use the indirection operator *
Reference variables, once declared, are treated as ordinary variables without the use
of special symbols. The necessary indirection is looked after by the compiler.
References are at a higher level of abstraction than pointers. A further difference is that
pointers can be reassigned at will to point to another variable and can be incremented to
step through memory. They are a much lower level tool than references as befits their
origin. References cannot be reassigned to point to a different object.
4. Enumeration Types
It is valuable as a documentation tool to use symbolic names for constant values in
programs. The classic case is pi which can be given a symbolic name by
const double pi = 3.14159265359;
If however you need to model a real world object that may take on any one of a set of
known values, then you can declare an enumeration type -
enum dow { SUN, MON, TUE, WED, THU, FRI, SAT };
dow day1, day2, day3;
This creates a new data type dow (day of week) and declares 3 variables (day1, day2 and
day3) of this type. Note that the enumerated values are not strings. They are simply
constant numeric values that commence with 0.
A further possible use for an enumeration type is to describe the different states that a
program may be in at any one time. In this type of program, the processing e.g. of input
will vary depending on the current state, and certain types of input will have the effect of
changing the state. An example of this type of processing is reading a data file that
consists of several lines, each containing a description and a number. The description may
include numeric digits, so it is enclosed in quotes:-
"3D Drawing Program" 12
"Sprocket Type 4S" 31
The states might be described using an enumeration as follows
enum State { IN_NAME, IN_NUMBER, BETWEEN };
State state = BETWEEN;
int a = 6, b = 199;
swap( a, b );
cout << setw(6) << a << setw(6) << b << endl;
199 6
This function does not need to have a return value, but it must return the changed values of
its two arguments. This is accomplished by using reference arguments-
void swap( int& x, int& y )
{
int temp;
temp = x;
x = y;
y = temp; // classic swap algorithm. Needs a temporary variable
}
So what is happening here?
The actual arguments in the call swap( a, b ) are the variables a and b. The formal
arguments are defined to be references to integer ( int& x, int& y ). When the function is
called, the compiler recognises that the function is expecting references to integers and not
integer values, so it copies into x and y, not the values 6 and 199, but references to the
variables a and b which hold these values. When the swapping is carried out in the body of
the function, the values that are swapped are those of the variables referenced by x and y,
namely a and b. This is because x and y are aliases for a and b, so anything done to x and y
is actually being done to a and b! For this reason, the function may only be called with
variables and not with literal constants, e.g. swap( 6, 199 ); would be an error.
This is the mechanism provided by C++ to allow a function to return values via its
arguments. Not all of the arguments need to be reference arguments. A function to convert
a time in seconds held as a long int (first argument) into hours, minutes and seconds (the
remaining 3 arguments) will have the first argument as a value parameter and the
remaining three as reference parameters.
Example 1
A typical example of C code:-
...
char name[] = "i am all in lower case";
makeupper(name);
cout << name << endl;
I AM ALL IN LOWER CASE
Note that, in 'C' and C++ an array passed as an argument to a function is always passed as
a pointer to the first element.
Example 2
The 'C' string library cstring or string.h contains a number of functions operating on 'C'
style strings which accept pointer arguments and some of which return pointer results,
typical ones are:-
char *strcat(char *dest, const char *src); // concatenates 2 strings returning a
// pointer to the result. dest has been
// modified
char *strcpy(char *dest, const char *src); // copies src into dest returning a
// pointer to dest as result
An example of the use of these two functions is:-
char source[25] = "GNU";
const char *blank = " ", *cplus = "C++"; // string literals should only be pointed to by const char *
char destination[25];
char *p = destination; // p points to the string destination
p = strcat(source, blank); // concatenate a blank onto source. p points to source
strcat(source, cplus); // concatenate "C++" onto source
strcpy(destination, p); // copy the result back into destination. p still points to
// source which has been changed.
cout << "destination = " << destination << endl;
destination = GNU C++
8. Default arguments
Sometimes we need to provide an argument that enables the caller to change the default
behaviour of the function. Where the default behaviour is not to be overridden, then there
should be no need to provide this argument. C++ permits a default argument value to be
specified in the function declaration and, if this argument is not supplied by the caller, then
the default value is used by the function. If the argument is supplied, then it overrides the
default. In the case of one default, it must be the last. In the case of two defaults, they must
be the last and last but one etc.
The default must be supplied only once - in the declaration (prototype), and should not be
repeated in the function definition.
Assume a function is to print to the stdout a number of lines of a file. The default is 4
lines, but this may be overridden by supplying an argument specifying a different number
of lines.
void printfile( const char filename[], int numlines = 4 ); // prototype
void printfile ( const char filename[], int numlines ) // definition
{
...
...
}
printfile( "fred.cpp", 10); // overrides default with 10
printfile( "jim.cpp"); // default of 4 is used
9. Inline functions
Calling a function has an overhead that costs time. The runtime system has to set up a
'stack frame' and allocate space for the arguments and local variables. On termination, the
stack frame has to be released and a jump made to the point immediately after the call.
Very small functions can be specified as 'inline' so that the compiler will substitute the
actual code of the function body for each occurrence of a call to the function. This will
improve speed at the expense of code size. In fact, the use of inline is a recommendation
only, and there is no guarantee that the compiler will honour it - this will depend on the
compiler and the size of the inline function.
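inline int square( int ); // prototype, placed at file scope because the standard
                          // does not permit inline on a prototype inside a function
int main ( void )
{
    ...
    z = square( x ); // compiler should substitute z = x * x
    ...
}
int square( int a )
{
    return ( a * a );
}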
A test of the above program was timed for 100 million calls to function square. The
elapsed time without inlining was approx 3.9 seconds and, with inlining, approx 3.05
seconds - an improvement of roughly 22%. The code size was increased by a very minor amount
because the call to function square occurs only once.
Note that the GNU compiler does not care whether the keyword inline occurs in the
prototype, in the function definition or in both places. To achieve inlining the compiler
optimisation switch -O has to be set. In RHIDE change the option
Options.Compilers.Optimizations -O to 1
Arrays
1. Introduction
Arrays are an aggregate type capable of holding a number of values all of the same type,
contiguously in memory. The components may be any one of the fundamental data types -
int, long, unsigned, float, char, enumerated, pointer or one of the aggregate types, i.e.
array, struct or class. The struct and class types have not yet been covered. The struct is
referred to in other languages as record and consists of one or more fields of (possibly)
different types (including arrays and records). The class data type will be covered in the
Object-Oriented Programming & Design module.
The advantage of the built-in array type is that a large number of data items can be held in
a single named array variable whose components can be accessed randomly as we shall
see later. The disadvantage is that its size is fixed at compile time and this cannot be varied
at run time to accommodate the fluctuating requirements of the application. Most of the
time, therefore, it is wasting space because it is not full and the type itself does not allow
resizing. The solution, as we shall see later, is dynamic memory allocation.
Example
Element:  0   1   2   3   4   5
Value:    9  14   7   5   1   3
The value held by table element 0 is 9, the value held by table element 1 is 14 etc. Access
to the elements (or components) is by subscripting the table name with the desired element
number. Thus table[0] is an integer with the value 9, table[1] contains 14 etc. Notice that,
since the numbering starts at 0, the last element always has an index one less than the
number of elements. The subscripted array can be used anywhere that an expression of the
component type is required:-
const int size = 6;
int table[ size ];
table[ 5 ] = 22;
table[ 1 ] = table[ 5 ]; // change the value of element 1 to that of element 5
cout << table[1]; // output the integer (22) contained in element 1
The subscript may be any expression with an integer value, thus:-
int i = 3;
table[ i ] = table[ size - 1 ]; // change the value of element 3 to that of element 5 (the last)
Since the array subscript can be a variable, we can process an array's elements by means of
a loop using as subscript a variable that increments for each iteration of the loop:-
3. Array initialisation
Arrays may be initialised on declaration by enclosing a list of values within braces,
separated by commas. If all elements of the array are given values in this way, the number
of elements need not be supplied between the brackets after the array name:-
int table[] = { 9, 14, 7, 5, 1, 3 };
Multi-dimensional arrays may be initialised by placing braces around each row, and
separating the rows with commas (see the definition of type Plane in section 4):-
Plane aPlane = {
{ 'X', ' ', 'X', 'X' }, // Row 1
{ ' ', 'X', ' ', 'X' }, // Row 2
.... // etc.
{ 'X', 'X', ' ', 'X' } // Row 12, no comma
};
Where some initialisers are omitted, the remaining elements are set to 0, whatever the
storage class of the array. It is only when an auto (local function) array is declared with
no initialiser list at all that the values of its elements are undefined.
The number of elements in an array can be found by means of the built-in sizeof operator:-
cout << "sizeof(table) = " << sizeof(table) << endl
<< "sizeof(int) = " << sizeof(int) << endl
<< "num elements = " << sizeof(table) / sizeof(table[0]) << endl;
sizeof(table) = 24
sizeof(int) = 4
num elements = 6
But note that sizeof cannot be used in a function to find the size of an array formal
argument since this is a pointer.
4. Multi-dimensional arrays
There is no theoretical limit to the number of dimensions an array may have, although the
number of elements increases rapidly with the number of dimensions as do the chances of
there being redundant elements. Two dimensional arrays are declared with 2 values, each
enclosed in brackets:-
// airplane reservation system
const int maxRows = 12,
seatsPerRow = 4;
typedef char Plane[maxRows][seatsPerRow]; // declares a new type based on a
// fundamental type
Plane aPlane; // aPlane is a variable of type Plane
void makeEmpty( Plane aPlane)
{
for( int row = 0; row < maxRows; row++ )
for( int seat = 0; seat < seatsPerRow; seat++ )
aPlane[ row ][ seat ] = ' '; // Space = empty
}
void showSeatingPlan( const Plane aPlane ) // aPlane is a constant and may not appear on the
                                           // LHS of an assignment within the function.
Unlike most other languages, C++ supports pointer arithmetic and, since the array name
table is converted to a pointer to its first element, a variable can be used to indicate an
offset from the beginning
for ( int i = size - 1; i > 0; i-- )
    *( table + i ) = *( table + i - 1 ); // shuffle contents one element to the right
or, using a supplementary pointer
for ( int* p = table + size - 1; p > table; p-- )
    *p = *( p - 1 );
Notes on the second loop:-
table + size - 1 is the address of table plus size (6) - 1 elements, i.e. the address of the
last element.
p > table holds while the address held by p is greater than the address of table.
The compiler knows the size of an int, so p-- results in p being adjusted by sizeof(int), i.e.
by 2 or 4 bytes on a PC (depending on the compiler), similarly with p - 1.
greeting = "william"; // do this instead, but note that, if the new string is longer, the
                      // extra chars are stored outside the array's allocated memory
                      // and may cause the program to crash
cout << "word[] = " << word << endl;
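const int linelen = 80;
char line[linelen+1]; // room for 80 chars plus the terminating '\0'
// getline stores at most linelen chars; a longer line puts the stream into a
// fail state, which ends the loop, as does reaching the end of the input
while ( cin.getline( line, linelen + 1 ) )
{
    cout << line << endl; // output the line
}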
This makes for efficient use of memory when storing large numbers of strings.
The 4 arrays of char are allocated contiguously in memory and the above could be viewed
as follows:-
o n e \0 t w o \0 t h r e e \0 f o u r \0
ptr[0] 36714
ptr[1] 36718
ptr[2] 36722
ptr[3] 36728
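The declaration that produces this layout is not reproduced in this extract. A minimal sketch, assuming an array of four pointers to string constants, is given below; the helper bytesUsed is hypothetical and simply adds up the storage shown in the diagram. Note that the language does not actually guarantee that the four strings are stored next to one another, although compilers commonly do so.

```cpp
#include <cstring>

// Assumed declaration: four pointers, each to the first char of a string constant.
const char* ptr[4] = { "one", "two", "three", "four" };

// Hypothetical helper: total bytes occupied by the strings themselves,
// including each terminating '\0'.
int bytesUsed()
{
    int total = 0;
    for ( int i = 0; i < 4; i++ )
        total += static_cast<int>( std::strlen( ptr[i] ) ) + 1;
    return total;
}
```

The strings need 19 bytes in all (4 + 4 + 6 + 5), whereas a two-dimensional array char[4][6], sized for the longest string, would need 24.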
4
The GNU C++ debugger built into RHIDE does not support inspect
Note the square brackets to indicate an optional argument. The program can then be
terminated with either:-
• return 1; when the error is detected in main, or
• exit(1); in other cases. exit is declared in cstdlib (or stdlib.h).
By convention, a non-zero value returned from main or as an argument to exit indicates an
error. In both cases, other non-zero values can be used to indicate different error
conditions.
12. Review
You will, by now, have seen that arrays and pointers to arrays in C++ are somewhat
complex and error-prone. This is because these facilities were designed over 20 years ago
for 'C' (a language that was originally designed for writing operating systems) and have
had to be retained in C++ for backward compatibility. In fact, the object-oriented facilities
provided by C++ allow these deficiencies to be hidden from the application programmer
who can use libraries of classes e.g. class string which hides the underlying shortcomings
of the built-in array of char type. In particular, the disadvantage of the fixed size of built-in
arrays and the absence of array bounds checking can be overcome in container classes
which are provided with most C++ implementations and are now standardised as the
Standard Template Library. However, we shall be concerned with how container classes
are designed and written and we therefore need to understand the base facilities on which
they are built.
You will be provided with a simple String data type that can be used for assignments. You
should read Skansholm pp 91-93 on the standard string type that is now part of the
Standard Template Library. If you wish, you can use this standard type wherever strings
are required.
13. Summary
• The array type allows a collection of items of the same type to be stored under a
single name. The array declaration specifies the type of its components and the
number of elements.
char top( void )
// pre - the stack is not empty
// post - the top of stack item has been returned. The state of the stack is unchanged
{ ... }
bool empty( void )
// post - if the stack is empty, true is returned, else false is returned
{ ... }
void makeempty( void )
// post - the stack is empty
// abracadabra reversed = arbadacarba
Note that the code in function main never accesses the array stack directly. All operations
are carried out only via the provided routines makeempty, push, pop, top, empty. This is an
example of data abstraction - the stack data structure is protected from corruption by
requiring all accesses to be made through these functions. In the example, this discipline is
not enforced - it is possible for the stack to be accessed directly since stack is a global
variable that has file scope. We shall see later how direct access can be prevented, and
how the stack can be encapsulated in a single entity that holds both the array and the
variable that records the top of stack.
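The bodies of the stack routines are elided above. The following sketch shows one way they might be written and how a client can reverse a word using only those routines. The capacity MAX, the variable topIndex and the function reversed are assumptions made for illustration, not the original listing.

```cpp
#include <string>

const int MAX = 80;          // assumed capacity
char stack[MAX];             // the array holding the stack (file scope, as in the text)
int topIndex = -1;           // assumed variable recording the top of stack; -1 = empty

void makeempty( void )       { topIndex = -1; }
bool empty( void )           { return topIndex < 0; }
void push( char item )       { stack[++topIndex] = item; }   // pre - the stack is not full
char top( void )             { return stack[topIndex]; }     // pre - the stack is not empty
void pop( void )             { --topIndex; }                 // pre - the stack is not empty

// Hypothetical client: reverses a word using only the stack operations.
std::string reversed( const std::string& word )
{
    makeempty();
    for ( int i = 0; i < MAX && i < static_cast<int>( word.size() ); i++ )
        push( word[i] );
    std::string result;
    while ( !empty() )
    {
        result += top();
        pop();
    }
    return result;
}
```

Because stack and topIndex are ordinary globals, nothing stops a client from writing to them directly, which is exactly the weakness described above.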
Program Files
1. Introduction
The unit of compilation in C++ is the file. A program can be built from several files. These
will comprise:-
• The main program file that includes a function main
• Zero or more ‘modules’ providing support functions, data types etc. comprising
  - A header file ( .h ) that contains prototype declarations for the functions
    provided by the module and possibly type and data declarations.
  - A source ( .cpp ) file containing the definition of the functions, types and
    variables provided by the module. This file may or may not be present.
  - The object file ( .obj ) created by compiling the .cpp file (see above) that
    provides the definition of the functions whose declarations appear in the header.
The main program file contains compiler directives to #include the header file(s) for the
supporting modules. This ensures that functions and variables, constants and types defined
in the supporting source files can be accessed by the main program. In other words, the
header files provide the prototypes for functions and referencing declarations for variables
etc. that allow the compiler to generate code for the main program without the source of
the supporting .cpp files themselves being present at compile time.
At link time, the programmer must indicate which supporting object ( .obj ) files he wants
to be linked with the object code of the main program. Within the GNU C++ IDE this is
done by creating a project which defines all the required source files for a particular
project and ensures that the object code of each is up to date before the linker links them
all in to produce the executable. The project definition itself is saved as a .gpr file which
can be opened and changed as required. By default, the name of the executable file will be
the name of the project file. Thus assign1.gpr (the project file) will cause the executable
resulting from linking all object files to be named assign1.exe regardless of the name of
the main source program file. The default can be changed by the menu item Project.main
targetname.
Take iostream as an example. You must include the compiler directive
#include<iostream> to ensure that the actual text of this header file is included in the
compilation of your main program. Without this, the compiler would not be able to make
sense of a call to e.g. cin.get(). You do not need the source of iostream (iostream.cpp) and
it is not even present on the machine. At link time, the linker finds that your object code
refers to functions and objects that it does not itself define, and resolves these references
by combining the object code for iostream with the object code generated from the source
of your main program in order to produce the executable.
The integrated environment allows the location of the object code of iostream to be
specified and the linker fetches it from that directory for inclusion.
Thus we have the concept of separate program modules that consist of two parts:-
• an interface part - the header file iostream
• an implementation part - the object file iostream.obj. (In fact, you will not find
iostream.obj in the directory because the code is included in the library files in the
lib directory).
The interface part defines the services provided by the module in terms of the functions,
variables, constants and types that are provided (exported) by the module. The
implementation part provides the actual implementation in the form of object code that is
needed at link time.
This is another example of abstraction. We need to know how to call the iostream
functions, and it is convenient that objects like cin and cout are pre-declared. For this
reason, prototypes of the functions and the declaration of the standard I/O streams are
made available to us in the header file iostream, but the implementation is hidden in the
library files since we need not be concerned with how the functions are implemented nor
how stream objects are represented. Consequently we can access the resources provided by
iostream only via the routines and declarations provided in the header file (the interface).
We cannot access the representation of streams because it is hidden and is therefore
protected from the possible corruption that might have occurred had we been allowed
direct access to it.
Note that the ANSI C++ standard specifies that system header files such as iostream,
string, vector etc. are named without a .h file name extension. All other modules
(including those that you write) should, by convention, keep the extension .h. The GNU C++
compiler meets this requirement of the standard, but other, older, compilers may not and,
in those cases, you will have to use the old name for such system headers, e.g. iostream.h.
• type
This is important because it determines the amount of memory that is allocated for
the representation of the object and also its bit pattern. Thus both the number of
bytes and the pattern of the bits stored in those bytes will be completely different
between e.g. an int and a float even if they appear to hold the same value.
• storage class
This is important because it determines the lifetime of the object, i.e. how long it
remains in existence occupying storage. Storage class has defaults which are
determined by the position in the source code of the object's declaration. This may
be varied by providing an explicit storage class on declaration. There are 3
categories of lifetime:-
  - local (auto): lifetime is transient and exists only for the lifetime of the
    enclosing block (usually a function, but see later).
  - static: lifetime exists for the duration of the program's execution.
  - dynamic: allocated dynamically during a program's execution. Lifetime is
    for the duration of the program, or until de-allocation, whichever
    is sooner. This will be dealt with later.
• scope
This is the portion of the source code within which the object is visible. Thus a
variable declared within a function is visible (in scope) only within the block of
statements that constitute the function body regardless of its storage class. See also
Skansholm Chapter 4.3 Declaration, scope and visibility.
There can be different combinations of scope and storage class, e.g. a function local
variable can be declared static. The effect is that its visibility (scope) remains limited to
the enclosing block (i.e. the function body) but its lifetime continues for the duration of the
program's execution.
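A short sketch of this combination follows. The function callCount and its variables are hypothetical; the point is only that the static local keeps its value from one call to the next while the auto local does not.

```cpp
// Hypothetical function: records how many times it has been called.
int callCount( void )
{
    static int calls = 0;    // static: initialised once, retains its value between calls
    int scratch = 0;         // auto: created afresh on every call
    calls++;
    scratch++;
    return calls * 10 + scratch;   // 11 on the first call, then 21, then 31
}
```

Neither calls nor scratch is visible outside callCount; only their lifetimes differ.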
4. Local duration
Unlike some programming languages (e.g. Pascal and Modula-2), the body of a function
may not include the definition of another function. In other words, functions may not be
nested in C++ and the only valid definitions appearing within a function are those for data
items. Variables defined in a function have the default storage class auto and the formal
arguments to the function are also treated as auto.
The body of a function is a sequence of declarations and statements surrounded by braces
{}. This construct is known as a compound statement or block. Within a function body,
any statement may itself be a block. It is logical therefore that such a block, nested within
a function body, should be allowed to contain data declarations, and that the scope of those
declarations should be the surrounding block as with function local variables. Therefore
the sequence of statements that depend on the truth or otherwise of the logical expression
in an if statement may be a block that contains declarations whose scope is limited to that
block. A block may even consist of just the braces surrounding one or more statements :-
Function swapifless above could have included a local variable definition int temp
(declared before the if statement). This outer temp would have been invisible within the if
block because the inner temp would have caused a 'hole' in its scope. This hole would
extend for the scope of the if block only.
A local variable can, of course, be initialised on definition. This initialisation can be by
any expression that is valid at that point, for instance by an expression that contains
reference to the formal arguments as above. In the absence of any initialisation, the value
of a local auto variable is undefined.
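The listing of swapifless is not reproduced in this extract. The sketch below is an assumed reconstruction that includes the outer temp discussed above; the signature, the value -1 and the return value are illustrative choices, the return value being there only so that the outer temp can be seen to be untouched.

```cpp
// Assumed reconstruction: exchanges a and b if a is less than b.
// Returns the outer temp, purely to show that the if block never changed it.
int swapifless( int& a, int& b )
{
    int temp = -1;           // outer temp: visible throughout the function body...
    if ( a < b )
    {
        int temp = a;        // ...except here, where the inner temp hides it;
        a = b;               // the inner temp is initialised from a formal argument
        b = temp;
    }
    return temp;             // the outer temp is visible again and still holds -1
}
```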
6. Static duration
An external referencing declaration for a function is no different in form from the
function prototypes with which you are already familiar. It informs the compiler that a
function is to be called from a separate file from that in which it is defined. An external
referencing declaration for a function is made in the source program file in which the call
to the function is to be made, i.e. in the file in which it is not defined. The format is as
follows: -
extern void print( void ); // declares a function that is defined in another file
// extern may be omitted
External referencing declarations are usually made by placing in the main program file a
compiler directive to #include a header file that provides the necessary external
referencing declarations as explained in paragraph 1.
Variables declared outside of any function, e.g. before function main, have file scope and
are referred to as global variables. The C++ compiler guarantees to initialise any global
variables to zero, but it is considered good practice to initialise them explicitly. As with
any declaration that re-uses an identifier already declared in a surrounding scope, a local
variable with the same name as a global variable causes a hole in the scope of the global
variable - see the example below:-
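#include <iostream>
using namespace std;             // needed for cout and endl with the ANSI header
int sum;                         // global variable, initialised to zero
int main( void )
{
    void subroutine( void );     // prototype declaration
    sum = 15;
    subroutine();
    cout << "Global sum is " << sum << endl;
    return 0;
}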
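The definition of subroutine is not shown in this extract. The sketch below assumes that it declares a local variable also called sum. Unlike the prototype above, this variant returns its local value so that the effect can be checked, and globalAfterCall is a hypothetical stand-in for the body of main.

```cpp
int sum;                     // global: file scope, static duration, initially 0

// Assumed definition: the local sum hides the global one inside this function.
int subroutine( void )
{
    int sum = 10;            // local: causes a hole in the scope of the global sum
    sum++;                   // changes the local only
    return sum;              // 11
}

// Hypothetical helper standing in for the body of main.
int globalAfterCall( void )
{
    sum = 15;                // refers to the global
    subroutine();
    return sum;              // still 15: the global was never touched by subroutine
}
```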
Statements1 and statements2 are actual C++ program statements. The sequence #if
DEBUG, #else, #endif can be scattered throughout the source code and will have the effect
of including statements1 into the compilation if DEBUG is true, and including statements2
if DEBUG is false.
In order to eliminate the debugging statements, it is only necessary to change the value of
DEBUG from true to false (0), and re-compile and link. The GNU C++ IDE allows macro
constant definitions to be changed via the menu item:-
Options.Compiler options
To define a macro named DEBUG, go to this menu item and enter -DDEBUG. To
undefine it, enter -UDEBUG.
A file macro.cpp is installed in the labs for you to try this out.
The conditional compilation facility may also be used to generate different versions of a
program for different platforms or conditions.
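A minimal sketch of the #if DEBUG, #else, #endif sequence is given below. The function describeBuild and the default value supplied for DEBUG are assumptions made for illustration; this is not the file macro.cpp mentioned above.

```cpp
#include <string>

#ifndef DEBUG
#define DEBUG 0              // assumed default; -DDEBUG on the command line gives 1
#endif

// Hypothetical function: its body depends on the value of DEBUG at compile time.
std::string describeBuild( void )
{
#if DEBUG
    return "debug build";    // statements1: compiled only if DEBUG is non-zero
#else
    return "release build";  // statements2: compiled only if DEBUG is zero
#endif
}
```

Only one of the two return statements ever reaches the compiler proper, so the unused branch costs nothing in the executable.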
The standard header files are stored in directories that are indicated either by operating
system path settings that are set up when the system starts, or by values that can be
configured from within the IDE.
When developing programs that consist of several modules (files) it is normal to supply a
header file for each module other than the main module. The main module then requires
compiler directives to #include these header files, using the form #include "filename.h". If
necessary, the header file may also be included in the compilation of the .cpp file for
which it is the header. In cases where header files themselves contain include directives,
there is the likelihood that some declarations will be included twice. In those cases, header
file inclusion may be made conditional on the existence or otherwise of a macro definition,
using #ifndef and #define.
Initially, you will not be writing programs whose complexity requires the use of #ifndef
and #define so do not worry about them unduly. When the compiler complains that you have
multiple definitions of a type, function or variable, you will know that you have hit the
problem. Then seek advice.
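For reference, the sketch below imitates within a single file what happens when a guarded header is included twice. The header name date.h, the macro DATE_H and the function yearOf are hypothetical.

```cpp
// --- first inclusion of a hypothetical header date.h ---
#ifndef DATE_H               // true: DATE_H is not yet defined
#define DATE_H
struct Date { int year, month, day; };
#endif

// --- second inclusion of the same header, e.g. via another header ---
#ifndef DATE_H               // false: everything up to #endif is skipped
#define DATE_H
struct Date { int year, month, day; };   // would be a redefinition error if compiled
#endif

// Hypothetical client code using the type, which was declared exactly once.
int yearOf( const Date& d ) { return d.year; }
```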
Data Structures
1. Data Types
Data types can be described in terms of the range of values they may hold and by the
operations provided for them, e.g. a 32-bit type int has a range of possible values from
-2,147,483,648 to 2,147,483,647, and the provided operations include +, -, *, /, %, ++, +=,
>, <=, ==, !=.
We have not dealt in any detail with the way in which type int is represented in memory
because we do not need to know this in order to use the type.
We defined a type Clock to have a range of values representing the times from midnight to
23:59 at intervals of 1 minute. We also provided a small set of operations - gettime, tick
and show.
We try to follow the principle that the definition of such data types provides all the
information another programmer needs in order to use them in his program, but that the
representation should be hidden so that it cannot be corrupted. Another reason for hiding
the implementation is that it should be possible to change it, e.g. to improve performance.
The client program will have to be re-linked with the object code of the new
implementation but, provided that the definition is unaltered, no change should be required
to the source code of the client program.
3. Classification
There are two main groups - single entities of which there may be many instances
e.g. Clock, and collections (or containers) of many objects of the same type e.g. Set, List
etc. The components of these collections may be of any type, but, within one collection,
must all be of the same type. Frequently, part of the definition of a collection is the
relationship between the members.
4. Categories of Collection
The broad categories are:-
Hierarchical (Tree)
Graph
5. Stacks
Definition
This is the simplest of the linear collection types since the number of operations is
typically small. As with all containers, the components may be of any type, but must be of
the same type within any one stack. Additions to, and removals from the stack are made at
one end only - the top. Access to components is limited to the item currently at the top.
The consequence of this relationship between members is that the first item to be added is
the last to be removed. This is known as a LIFO structure - last in, first out.
Stacks are very widely used in Computer Science. When a function is called, a stack frame
is built containing the address to which control must return when the function has finished
execution. In addition, space is reserved in the stack frame for any auto local variables and
for the values of any actual arguments passed to the function. This structure is pushed onto
the system stack. When the function terminates, the stack frame is popped from the stack,
causing the arguments and local variables to perish. Another application is recording the
path taken through a structure so that it can be retraced - the 'Hansel & Gretel' effect.
int funa ( int y )
{
    return ( y * 2 ) ;
}
int funb ( int z )
{
    return ( funa ( z ) / 2 );
}
int func( int a )
{
    return ( funb( a ) );
}
int main (void )
{
    int x = 4, y;
    y = func( x );
}
[Figure: the system stack at seven successive moments while the program above runs, listed
from the bottom of the stack upwards: main; main, func; main, func, funb; main, func,
funb, funa; main, func, funb; main, func; main]
Viewed as an abstract type, a stack cannot be full, but the actual implementation may have
to place a limit on the number of items that can be held on the stack. This gives rise to a
further operation
full test if the stack is full
Operations on abstract data types can typically be categorised into those that:-
Representation
The obvious first choice for representing a stack is an array, although this has the
disadvantage that an upper limit for the number of items to be stored must be chosen
before compiling, and this cannot be varied at run-time. This representation should be
hidden from a user of the stack by specifying the storage class static
// intstack.cpp
// representation and implementation of a stack of integers
#include "intstack.h"
const int MAX_STACK = 10; // the maximum number of items that can be stored
static int data[MAX_STACK]; // the container for the stack members
static int Top; // the index of the top item.
// Top will need to be initialised on startup, incremented
// before pushing a new member, and decremented after
// popping a member.
// When Top = MAX_STACK - 1, the stack is full
Implementation of the operations
This is left as an exercise. The full definition of the functions would be placed after the
global data definitions in intstack.cpp. Note that intstack.cpp contains an include compiler
directive for the header file. intstack.cpp would contain only the data declarations shown
above and the function definitions. There must be no function main.
Using the stack
A client program wishing to use the integer stack would import the definition (i.e. #include
"intstack.h") and then carry out operations on it as though it had been defined in the same
file. Because of the static qualifiers used for the array definition data and the integer
variable Top, the client program cannot access the representation directly even if extern
declarations are made for these two items in the client's source code. The constant
MAX_STACK also cannot be accessed, because a const object defined at file scope is private
to its own file unless it is explicitly declared extern.
#include <iostream>
#include "intstack.h"
int main( void )
{ // push some items
cout << endl << endl;
while( !full())
{
static int item = 0;
push( ++item );
cout << "pushing " << item << endl;
}
Now an attempt to access the stack variables directly - causes linker errors:-
Top = -1; // Linker error undefined symbol _Top - defined as static
cout << "MAX_STACK = " // Linker error undefined symbol
<< MAX_STACK << endl; // MAX_STACK is const in intstack.cpp
// pop them
while ( !empty() )
{
cout << "popping " << top() << endl; pop();
}
7. Queues
A queue follows closely the real-world example. Operations are permitted at both 'ends'
with additions (enqueue or append) being made at the tail and removals (serve or remove)
being taken from the head. Effectively, the elements are ordered physically according to
the time of their arrival. It is known as a FIFO structure - first in, first out. Typical
operations are:-
Implementation
Again, an array implementation is considered. We need two integers to indicate the head
and tail of the queue and possibly a further integer to record the size (although this can be
computed from head and tail).
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ]; // A queue of characters
static int head = 0, tail = -1, count = 0;
Initially, the indicator (technically cursor) tail is set to a special value to indicate the
empty state. The head of the queue can be viewed as being at the 'left hand' or 'bottom' of
the array, while the tail grows 'right' or 'up' the array as items are appended.
[Figure: the array, indices 0 to 9, after each of five operations. The head and tail values
are those implied by the declarations above.]
Operation          Elements 0, 1, 2 ...             head   tail
1. Empty           (none)                           0      -1
2. append('A')     A                                0      0
3. append('B')     A B                              0      1
4. ch = serve()    A B   ('A' has been served)      1      1
5. append('C')     A B C ('A' has been served)      1      2
The problem with this method of handling the array is that as items are appended and
served, the queue moves up the array, and will eventually bump up against the end when,
in fact, there may be space available lower down caused by elements being removed from
the head e.g. ‘A’ in this case. One solution is to slide all items in the queue down the array
once the tail has reached the top, but data moves are relatively expensive - particularly if
the queue elements are large.
A satisfactory solution is to view the array as circular so that the first element follows on
immediately after the last. Spare space in the array caused by removals will always be
available for use as long as the number of items in the queue is less than the size of the
array.
[Figure: the array drawn as a ring with indices 0 to 9, holding a queue with count = 6]
void enqueue( char element )
{
tail = (tail + 1) % MAX_QUEUE;
queueitems[tail] = element;
count++;
}
The simplest way of implementing the test for full and empty is to maintain the size of the
queue in a variable (e.g. count) within the queue module.
As with all data structures based on an array, the storage space is fixed at compile time and
the number of items that can therefore be stored is bounded. This inflexibility means that
arrays can only be used in cases where the maximum number of components can be
determined in advance.
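Putting the pieces of this section together, a circular queue of characters might be sketched as follows. The declarations and enqueue are those given above; serve, empty and full are assumed implementations that use count for the two tests.

```cpp
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ];       // A queue of characters
static int head = 0, tail = -1, count = 0;

bool empty( void ) { return count == 0; }
bool full( void )  { return count == MAX_QUEUE; }

void enqueue( char element )               // pre - the queue is not full
{
    tail = (tail + 1) % MAX_QUEUE;         // wraps round to 0 after the last index
    queueitems[tail] = element;
    count++;
}

char serve( void )                         // pre - the queue is not empty
{
    char element = queueitems[head];
    head = (head + 1) % MAX_QUEUE;         // head wraps round in the same way
    count--;
    return element;
}
```

No element is ever moved: serving an item merely advances head, and the space it occupied is re-used when tail comes round again.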
8. Lists
Basically a list is a sequence of elements, each element other than the first and the last
having a predecessor and a successor. Another way of expressing this is that a list is
• either empty or
• consists of an element followed by a list.
This is known as a recursive definition.
The elements may be ordered:-
• by their time of arrival, i.e. each successive addition is placed after the previous last,
or
• inversely by their time of arrival - each element is inserted before the previous in a
similar way to a stack, although access may be allowed to any element.
• by some quality of the data e.g. a list of names ordered alphabetically.
• by requesting insertion at the 'current' position as indicated by some cursor.
Again, an array is considered as the method of representation. However, we find that there
is a high cost involved where insertion and deletion is permitted other than at the ends.
Each insertion within the list will require all elements following it to be moved ‘up’ the
array to make room, and, since there can be no ‘null’ elements, each deletion will require
all following elements to be moved down to close the gap. The time required to carry out
these moves makes this method of representation less than optimal. There are more
efficient and flexible ways of implementing lists in cases where insertions and deletions
are permitted within the list.
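The cost can be seen in the following sketch of insertion and deletion in an array-based list of integers. All the names (MAX_LIST, items, length, insertAt, removeAt) are hypothetical; each function returns the number of elements it had to move.

```cpp
const int MAX_LIST = 10;     // assumed capacity
int items[MAX_LIST];         // hypothetical list of integers held in an array
int length = 0;              // number of elements currently in the list

// Inserts value at index pos; returns the number of elements that had to be moved.
// pre - 0 <= pos <= length and length < MAX_LIST
int insertAt( int pos, int value )
{
    int moved = 0;
    for ( int i = length; i > pos; i-- )       // move followers 'up' to make room
    {
        items[i] = items[i - 1];
        moved++;
    }
    items[pos] = value;
    length++;
    return moved;
}

// Deletes the element at index pos; returns the number of elements moved.
// pre - 0 <= pos < length
int removeAt( int pos )
{
    int moved = 0;
    for ( int i = pos; i < length - 1; i++ )   // move followers 'down' to close the gap
    {
        items[i] = items[i + 1];
        moved++;
    }
    length--;
    return moved;
}
```

Inserting or deleting at the front of a list of n elements moves all n of them, whereas the same operation at the end moves none.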
9. Structs
Frequently there is a need to store information about an entity under a single name where
the information describing that entity involves different data types. The struct is an
aggregate type that provides this facility:-
struct student // student is a type, not a variable.
{
char name[30];
int age;
char coursecode[6];
}; // note the semi-colon
student courserep; // courserep is one student
Each separate data item within the structure is referred to as a data member. Once the new
type student has been declared, a collection with that component type can be defined.
student aclass[16]; // aclass is an array of 16 students
Access to the members of a struct is by dot notation:-
strcpy( courserep.name, "William Brown" ); // simple assignment not allowed
courserep.age = 21;
strcpy( courserep.coursecode, "mit96" );
cout << courserep.name << endl << courserep.age << endl
<< courserep.coursecode << endl;
A queue of students could be declared as:-
const int MAX_QUEUE = 16;
static student stuqueue[ MAX_QUEUE ]; // A queue of students
static int head = 0, tail = -1, count = 0;
10. Unions
This is similar to the struct in that it can hold one or more items of different types. It
differs from struct in that it can hold only one of its components at any one time. The
compiler allocates storage for the largest of the specified members and all members are
overlaid onto the same storage. In other programming languages this type is usually
known as a variant record. There are two main uses for unions.
• In cases where different instances of the same entity may have different
characteristics, i.e. they are described by a different set of variables. This might
arise in a collection of students where part-time students require a record of their
employer whereas full-time students do not.
• In low level programming when a location in memory may be viewed as two
different sets of data, e.g. either two separate integer values or a long integer.
Example:
typedef short TwoInts[2];
union cheat
{
    TwoInts twoints;
    long along;
};
cheat x;
x.twoints[0] = 255;
x.twoints[1] = 1;
cout << x.along << endl; // displays 65791 (255 + 1 * 65536) on a PC where short is
                         // 16 bits and long is 32 bits
Dynamic Data Structures
struct type-name
{
list-of-members
};
This is a type definition and does not allocate storage. It introduces a new type that can be
used subsequently in definitions of variables whose type is type-name.
Examples:-
These examples illustrate several things about the data type struct.
• The members (referred to as fields in other languages) may be of the same type, or
of different types.
• There is no limit to the number of members, but large records can be built up from
other struct types, for instance, type Person has a field birthdate which is itself a
struct type (Date).
• The members may be of any type, including arrays (and other structs)
• The type name can be used in declarations of arrays whose elements are of struct
type, e.g. mscit is an array of 40 elements, each of whose data type is Student. Each
Student has a data member called personaldata of type Person; a tutorGrp of type
char; and an array of 9 elements of type int called modulemarks.
• The type-name appearing after the reserved word struct is known as the structure
tag. It is desirable that this name (e.g. Date, Person, Student) be unique within its
own scope.
As you can see, structures can be used in combination with other structures and with
arrays to create arbitrarily complex types capable of modelling many real-world entities.
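The example declarations referred to above are not reproduced in this extract. The sketch below is consistent with the description, but it is a reconstruction: the array sizes, the month member and the exact form of address are assumptions.

```cpp
// Assumed declarations, reconstructed from the description in the text.
struct Date
{
    int year, month, day;
};

struct Person
{
    char name[30];               // assumed size
    Date birthdate;              // a member that is itself a struct
    char address[3][40];         // assumed: three lines of address
};

struct Student
{
    Person personaldata;
    char   tutorGrp;
    int    modulemarks[9];
};

Student mscit[40];               // an array of 40 Students
```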
• Access to components
Elements of an array can be accessed by subscripting the array name as in the
example above. The subscript can be a variable that is modified within a loop c.f.
the Plane example. This allows computed random access to any array component.
The members of a struct, on the other hand, are accessed using dot notation i.e. the
structure variable name followed by a dot followed by the member name. The dot is
known as the structure member operator. If the member name is itself a structure
and access is required to its members, then further dots are required to tunnel down
through the member hierarchy, viz.
Fred.name;
Fred.birthdate.day;
mscit[10].personaldata.birthdate.year;
mscit[20].personaldata.address[1];
mscit[30].modulemarks[2]; // the mark of student number 30 for the
// module at index 2
• Pointers to structures
If a structure is referenced by a pointer then the de-referencing operator applied to
the pointer provides the access:-
Date* dptr = &today; // dptr is a pointer to Date and points to the Date today
Date dt = *dptr; // dt is assigned the value of today by dereferencing the
// pointer dptr
However, the structure member operator (dot) has a higher precedence than the
dereferencing operator (*). So access to a member of today via the pointer dptr must
use parentheses to resolve the precedence:-
cout << (*dptr).year; // displays the year member of today via the
// pointer dptr
This type of access is frequently required and the syntax is rather clumsy. A new
operator is introduced for this purpose - the structure pointer operator ->. This does
two things - dereferences the pointer to access the whole structure, and then
accesses the member given after the operator (year in this example).
cout << dptr->year;
• Initialisation
As with arrays, structures may be initialised at the time they are defined, e.g.
Date his_birthday = { 1995, 11, 15 };
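The two forms of member access through a pointer can be compared in the sketch below. The layout of Date is assumed from the order of the initialiser above, and the helper functions yearOf and setDay are hypothetical.

```cpp
struct Date                      // assumed layout, matching the initialiser order
{
    int year, month, day;
};

// Hypothetical helper: reads a member through a pointer using ->
int yearOf( const Date* dptr )
{
    return dptr->year;           // equivalent to (*dptr).year
}

// Hypothetical helper: changes a member through a pointer.
void setDay( Date* dptr, int day )
{
    (*dptr).day = day;           // parentheses needed: . binds tighter than *
}
```

A caller passes the address of a structure, e.g. yearOf( &his_birthday ).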
3. Storage Management
So far we have only been able to use data items that have been defined at compile-time.
Thus, an array defined in the source code of a program as:-
int table[100];
will hold 100 integers and, if the requirements of the program exceed this number of
elements, then the excess cannot be handled. Clearly this is unsatisfactory. The
programmer cannot predict the demands that will be made on his program when it is being
used by a client. What may have seemed a generous estimate when the program was
written might soon turn out in practice to be a ludicrous under-estimate. What is more, if
the estimate is indeed generous, then a large amount of storage space remains unused and
therefore wasted because it cannot be used temporarily by other data items.
An example is a windowing system like MS Windows. The programmers of Windows
could not possibly have worked on the assumption that the number of open windows
should never exceed a certain fixed limit. Since that code was written, the memory
installed in the average PC has at least doubled, redoubled, and redoubled again. To have
fixed this limit 3 or 4 years ago would have put all users in a straitjacket which would
now appear intolerable.
So how can we create and delete data items dynamically at run-time in response to the
demands of the application program?
By using the memory allocation and de-allocation operators new and delete. The use of
these operators is closely bound up with pointers, and equivalent facilities are to be found
in most of the conventional programming languages such as Ada, Pascal, Modula-2 and C.
3.1 new
The syntax is new type-name [number-of-elements], where [number-of-elements] is
optional and is used when a dynamically allocated array is required.
Examples
int* intptr = new int;
char* chptr = new char[20];
The first statement allocates from the heap a chunk of memory sufficient to hold
one integer and sets the pointer to integer intptr to point to this memory location.
The heap, or free store, is the name given to that part of available random access
memory that is not currently occupied by program code and ordinary program
variables.
The second statement allocates sufficient memory from the heap to accommodate
an array of 20 characters and sets chptr to point to the first.
In the assignment and output statements, intptr needs de-referencing to produce the
value of the integer to which it points. chptr, on the other hand, does not require de-
referencing since we want the whole array to be assigned or output rather than just
the single character to which chptr points. This treatment is analogous to that of an
array name.
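The difference in treatment can be sketched as follows. This is only an illustration: the function name demoNew and the values stored are assumptions, not part of the original notes.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper: allocates, uses and releases the two items.
std::string demoNew()
{
    int*  intptr = new int;         // one integer on the heap
    char* chptr  = new char[20];    // array of 20 characters on the heap

    *intptr = 42;                   // de-reference to assign the integer
    std::strcpy( chptr, "hello" );  // no de-reference: the whole array is the target

    std::ostringstream out;
    out << *intptr << ' ' << chptr; // *intptr gives the value, chptr the whole string

    delete intptr;                  // single data item
    delete [] chptr;                // array
    return out.str();
}
```

Calling demoNew() builds the text that cout would display for the same two output operations.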
3.3 delete
The delete operator has two forms: without brackets for single data items, and with
brackets for arrays. Note that, whereas new requires the brackets to be placed after
the type name:-
char* chptr = new char[20];
delete requires empty brackets placed before the pointer name, as in delete [] chptr;
3.4 Lifetime
The lifetime of objects allocated by new is from allocation to the earlier of de-
allocation (via delete) or termination of the program.
Notice that lifetime may be different from scope. If a pointer providing access to a
dynamically allocated item goes out of scope (perhaps because it is a local function
variable and the function terminates) then the dynamic data item continues to exist,
but is inaccessible. This is known as memory leakage. If it happens often enough,
the program could run out of memory even though not all is being used. Local
function variables can be used for allocating dynamic data items, but it is necessary
to ensure that, before the function terminates, some other pointer that will continue
in scope is set to point to it.
int* makenewtable( int size )
{
int* intptr = new int[size];
return intptr;
}
Since the function returns a pointer to integer, the result of the function call will be
assigned to some other pointer to integer and access will not be lost by the demise
of intptr:-
int* newtable;
newtable = makenewtable( 20 );
If there is insufficient memory available on the heap when new is called, pre-standard
compilers make new return the special pointer value 0. In standard C++, plain new
throws std::bad_alloc instead, and only the new (std::nothrow) form returns 0. A
pointer value of 0 means that the pointer does not point to anything and that, in
this case, the allocation has failed. When building dynamic data structures, 0 is
frequently used as a pointer value to indicate that no link exists between components
of the structure.
int* intptr = new (std::nothrow) int[ size ];   // std::nothrow is declared in <new>
if ( intptr == 0 )
{
cout << "Error, insufficient memory " << endl;
exit(1);
}
The size argument to new permits the size of a dynamically allocated array to be
determined at runtime. This can be used to get over the fixed size problem of arrays.
The array is allocated on start-up with, say, 10 elements. When it becomes full,
makenewtable is called with an argument of, say, double this (i.e. 20). The contents
of the original array are copied into the newly allocated one, and the old array then
deleted. Next time the array becomes full, makenewtable is called with an argument
of 40, and the copying done again. In this way, the effect of a dynamically
resizeable array can be obtained. However, during this doubling process, there is a
temporary requirement for additional memory that might cause memory exhaustion.
Also, the requirement that the old data be copied into the newly allocated table is
relatively costly in terms of time, and it is therefore advisable to minimise the
number of resizing operations wherever possible - this is the reason for doubling the
size on each resize.
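The doubling scheme described above might be sketched like this. The names growtable and append are hypothetical, and the sketch assumes plain new, which in standard C++ throws on failure rather than returning 0.

```cpp
#include <cstddef>

// Hypothetical helper: returns a new table of newSize elements holding the
// first oldSize elements of oldTable, and releases oldTable.
int* growtable( int* oldTable, std::size_t oldSize, std::size_t newSize )
{
    int* newTable = new int[newSize];        // both tables exist at this point
    for ( std::size_t i = 0; i < oldSize; i++ )
        newTable[i] = oldTable[i];           // copying cost is proportional to oldSize
    delete [] oldTable;                      // the old array is no longer needed
    return newTable;
}

// Appends item, doubling the capacity first if the table is full.
void append( int*& table, std::size_t& used, std::size_t& capacity, int item )
{
    if ( used == capacity )
    {
        table = growtable( table, used, capacity * 2 );
        capacity *= 2;
    }
    table[used++] = item;
}
```

Starting from a capacity of 10, appending 45 items triggers only three resizes (to 20, 40 and 80), which is the benefit of doubling rather than growing by a fixed amount.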
Each node therefore consists of a data field (in this case an integer) and a pointer to the
next node. The list itself can be implemented as a structure containing links to the first and
last nodes in the list, and a count of the number of nodes. These links are, again, of type
pointer to node. If the list is empty, then the links to the first and last nodes are given the
special value 0 referred to above. The same principle will be applied to the link member of
the last node in the list since it will have no successor:-
struct LinkList
{
int count;
Node* first, * last;
};
The operations for a list are much less closely prescribed than those for stacks and queues
since it is a more general structure and access may be provided at any point. There are also
several possibilities for the ordering of the nodes. For simplicity therefore, the example
shown below will add new items to the end of the list, and remove items from the front.
This is therefore, in effect, a queue.
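A minimal sketch of such a list follows. The Node layout is taken from the description above (an integer data field and a link to the next node); the function names initialise, add and removeFront are assumptions made for illustration.

```cpp
struct Node
{
    int   data;
    Node* link;              // next node, or 0 for the last node
};

struct LinkList
{
    int   count;
    Node* first, * last;
};

void initialise( LinkList& t ) { t.count = 0; t.first = t.last = 0; }

// Adds item at the end of the list.
void add( LinkList& t, int item )
{
    Node* n = new Node;
    n->data = item;
    n->link = 0;
    if ( t.last == 0 ) t.first = n;     // the list was empty
    else               t.last->link = n;
    t.last = n;
    t.count++;
}

// Removes the front item into item; returns false if the list is empty.
bool removeFront( LinkList& t, int& item )
{
    if ( t.first == 0 ) return false;
    Node* tempnode = t.first;
    item = tempnode->data;
    t.first = tempnode->link;
    if ( t.first == 0 ) t.last = 0;     // the list is now empty
    t.count--;
    delete tempnode;
    return true;
}
```

Items come out in the order in which they were added, which is the queue behaviour noted above.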
[Figure: adding a third node to the end of a LinkList that already holds nodes 1 and 2 on the heap]
n->data = item;
n->link = 0;
a) t.last->link = n;
b) t.last = n;
c) t.count++;
[Figure: removing the first node from a LinkList of three nodes via the temporary pointer tempnode, leaving nodes 2 and 3]
d) t.count--;
e) delete tempnode;
Sorting
1. Introduction
There are two main types of sorting - sorting arrays held in random access memory, and
sorting files. In the early period of computing, file sorting tended to be dominant because
RAM was very expensive and mass storage was held on magnetic tape, access to which is
sequential. In contrast, magnetic disk storage provides the possibility of accessing file
records by reference to their position in the file.
2. Components of Sorting
Sorting involves rearranging the elements so that they are in order. This, in turn, consists of
two operations:-
! Comparing elements - usually by reference to a key field
! Moving elements - usually by swapping pairs of elements
There are normally many more comparisons than moves and the number of comparisons
will be the most significant operation in terms of time, and therefore the prime indicator of
the efficiency of a sorting algorithm.
3. Sorting Files
Database systems are now universal, and file sorting has become less important. Instead, a
number of different indexes are held - either within the data file, or as separate files - that
allow the data file to be read (and output) in different orderings.
If the amount of RAM permits it, and indexes are not supported, then the fastest way of
sorting a file is to read it into an array, sort the array and write the data back out to file. If
the file is too big, then it can be broken up into chunks, each of which is sorted in an array
and written out to a separate file. Then the several ordered files are merged back into a
single file.
The traditional file merge requires only 2 elements of the file to be in memory at any one
time and works as follows:-
! split the original file into two new files writing 1 item to each new file alternately.
Then merge back into the original file in pairs, creating n/2 runs of 2 items per run
! split the original file into 2 writing 2 items to each file alternately. Then merge back
into the original file in quadruples creating n/4 runs of 4 items per run
! split the original file into 2 writing 4 items to each file alternately. Then merge back
into the original file in octuples creating n/8 runs of 8 items per run
! etc.
The sort has finished when the original file contains 1 run of n items. The following is a
simplified example based on a file of 8 items. The principle is exactly the same for any
number of items.
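Pass  Step                                     File contents
      Original file                            5 8 3 6 7 2 4 1
1     Split, 1 item to each file alternately   5 3 7 4   and   8 6 2 1
      Merge into pairs                         5 8 3 6 2 7 1 4
2     Split, 2 items to each file alternately  5 8 2 7   and   3 6 1 4
      Merge into quadruples                    3 5 6 8 1 2 4 7
3     Split, 4 items to each file alternately  3 5 6 8   and   1 2 4 7
      Merge into one run of 8                  1 2 3 4 5 6 7 8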
Note that:-
! There are only 2 elements from the file present in memory at any one time
! The process is dominated by I/O time
! The number of passes required to sort the original file is log2n
n Passes
8 3
64 6
512 9
4,096 12
32,768 15
262,144 18
2,097,152 21
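The split and merge passes can be simulated as below. This is a sketch under a stated assumption: std::vector stands in for the sequential files, so it does not reproduce the two-elements-in-memory property of a real file merge. The function name fileMergeSort is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts 'file' by repeated split and merge; returns the number of passes.
int fileMergeSort( std::vector<int>& file )
{
    int passes = 0;
    for ( std::size_t run = 1; run < file.size(); run *= 2, passes++ )
    {
        // Split: write 'run' items to each of two files alternately.
        std::vector<int> a, b;
        for ( std::size_t i = 0; i < file.size(); i++ )
            ( ( i / run ) % 2 == 0 ? a : b ).push_back( file[i] );

        // Merge: combine corresponding runs from a and b back into file.
        file.clear();
        std::size_t ia = 0, ib = 0;
        while ( ia < a.size() || ib < b.size() )
        {
            std::size_t enda = std::min( ia + run, a.size() );
            std::size_t endb = std::min( ib + run, b.size() );
            while ( ia < enda || ib < endb )
            {
                if ( ib >= endb || ( ia < enda && a[ia] <= b[ib] ) )
                    file.push_back( a[ia++] );
                else
                    file.push_back( b[ib++] );
            }
        }
    }
    return passes;
}
```

For the 8-item file used in the example, the function performs 3 passes, in line with the log2n figure in the table.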
4. Why sort?
! Sorting is used to optimise searching for and retrieving data either by humans or by
the computer
! To produce a report which, because it is sorted, simplifies the manual retrieval of
information
! To make more efficient searches for items held in either main memory or external
storage
7. Sorting efficiency
We are not usually concerned with the absolute amount of time required for a sort. But we
are concerned with how the time t taken for a sort varies with the number of items n
required to be sorted.
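If there is a linear relationship, then t will vary directly with n, i.e. it will be O(n). But no
general-purpose sort that works by comparing keys can be O(n): at least of the order of
n.log2n comparisons are needed in the worst case.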
If t varies as a function of n² then an increase in n by a factor of, say, 10 will increase t 100
times and increasing n by 100 will increase t 10,000 times.
The simple sorting algorithms are all O(n²).
k=n
While k > 1 Do
For each element i from 1 to k - 1 Do
If element i > element i +1 then
Swap element i with element i + 1
Endif
EndFor
Decrement k
EndWhile
Pass   Initial    1    2    3    4    5    6    7
K                 8    7    6    5    4    3    2
          44     44   12   12   12   12    6    6
          55     12   42   42   18    6   12   12
          12     42   44   18    6   18   18   18
          42     55   18    6   42   42   42   42
          94     18    6   44   44   44   44   44
          18      6   55   55   55   55   55   55
           6     67   67   67   67   67   67   67
          67     94   94   94   94   94   94   94
Notice that after each pass, the heaviest element in the unsorted part of the array has
settled to the bottom, increasing the sorted portion by one and decreasing the unsorted
portion by one. The indicators of the efficiency of this algorithm are:-
Comparisons = (n-1) + (n-2) + ... + 1 = ½(n² - n) = 28 for n = 8
Max moves = 3/2 (n² - n) = 84 for n = 8
Ave moves = 3/4 (n² - n) = 42 for n = 8
This algorithm can be improved by employing a flag that is set when no exchanges take
place on a pass. In this case the array is sorted and no further passes are required. Even
so, this remains an O(n²) algorithm. It is rarely used in real applications because it is
among the least efficient of all sorting algorithms. It is introduced here because it is
relatively easy to understand and so that you will know never to use it!
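A possible C++ rendering of the exchange sort with the flag improvement is sketched below. The function name exchangeSort and the returned comparison count are assumptions added for illustration; the flag here records that an exchange did take place, which is the same test expressed the other way round.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Exchange (bubble) sort with an early-exit flag; returns comparisons made.
long exchangeSort( std::vector<int>& a )
{
    long comparisons = 0;
    bool swapped = true;                  // forces the first pass
    for ( std::size_t k = a.size(); k > 1 && swapped; k-- )
    {
        swapped = false;
        for ( std::size_t i = 0; i + 1 < k; i++ )
        {
            comparisons++;
            if ( a[i] > a[i + 1] )
            {
                std::swap( a[i], a[i + 1] );
                swapped = true;           // another pass may be needed
            }
        }
    }
    return comparisons;
}
```

On the 8 keys used in the table the flag saves nothing (every pass but the last makes an exchange), but on data that is already ordered a single pass of n - 1 comparisons suffices.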
9. Insertion Sort
This works in a similar way to the sorting of a hand of cards
Pick up the last but one element and place it in the correct order in the last 2
Pick up the last but 2 and place in the correct order in the last 3 etc.
Pass      Initial    1    2    3    4    5    6    7
K                    7    6    5    4    3    2    1
k'th key             6   18   94   42   12   55   44
             44     44   44   44   44   44   44    6
             55     55   55   55   55   55    6   12
             12     12   12   12   12    6   12   18
             42     42   42   42    6   12   18   42
             94     94   94    6   18   18   42   44
             18     18    6   18   42   42   55   55
              6      6   18   67   67   67   67   67
             67     67   67   94   94   94   94   94
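The passes in the table can be reproduced with a sketch such as the following, which grows the sorted portion from the end of the array as described above. The function name insertionSort is an assumption, and the index k is counted from 0 rather than from 1 as in the table.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that grows the sorted portion from the end of the array.
void insertionSort( std::vector<int>& a )
{
    if ( a.size() < 2 ) return;
    for ( std::size_t k = a.size() - 1; k-- > 0; )   // k = size-2 down to 0
    {
        int key = a[k];                              // the key being inserted
        std::size_t i = k + 1;
        while ( i < a.size() && a[i] < key )         // move smaller keys up one place
        {
            a[i - 1] = a[i];
            i++;
        }
        a[i - 1] = key;                              // drop the key into the gap
    }
}
```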
[Figure: comparison of the Insertion, Selection and Exchange sorts]
11. Conclusions
11.1 Insertion sort is better for small data items and large keys. It also gives good
performance when the data is already ordered (or nearly so). For this reason it is often
used in conjunction with advanced sorting algorithms, e.g. Quicksort
11.2 Exchange sort is the slowest sorting algorithm and is only used in teaching or trivial
applications because it is the simplest to code
11.3 Selection sort (not shown) is better for large data items with small keys. It has
shown slightly better performance than Insertion on inversely ordered data
13. QuickSort
This was invented by C.A.R. Hoare, a famous Oxford professor of computing, and is an
advanced algorithm, based on the exchange sort, that normally employs recursion. It is the
most efficient of the advanced sorts although it becomes inefficient under certain very
exceptional conditions. The more data items, the less likely these conditions are to arise.
Insertion sort is often used in conjunction with Quicksort to sort small partitions.
The technique is to split the array into two partitions and then to sort the first partition
followed by the second partition:-
The 'partition' portion of the algorithm is where all the work is done. The second and third
statements are simply recursive calls to the function itself.
The partitioning process ensures that all items in the first partition have values that are <=
all items in the second partition - although neither partition is necessarily sorted.
One of the keys in the partition currently under consideration is selected as the pivot (the
central element in this example)
The items in the current partition are scanned
! first from left to right looking for an element >= pivot
! then from right to left looking for an element <= pivot
! when each scan has stopped, and provided the scan indexes have not crossed over,
the two items are swapped.
Pivot = 42 (the central element)
44 55 12 42 94 6 18 67    scans stop at 44 and 18, which are swapped
18 55 12 42 94 6 44 67    scans stop at 55 and 6, which are swapped
18 6 12 42 94 55 44 67
Scanning continues until the 2 pointers cross over. The pivot is now in its correct position
in the array and is no longer involved in the partitioning. It may have been moved from its
original position.
Quicksort is called recursively to partition the lower and upper partitions, provided there
are at least 2 elements in them
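One common way of coding the scheme described above is sketched here. It is not necessarily the version used in the original notes: the function name quickSort and the use of inclusive int bounds are assumptions.

```cpp
#include <utility>
#include <vector>

// Sorts a[left..right] (inclusive bounds) using the central element as pivot.
void quickSort( std::vector<int>& a, int left, int right )
{
    if ( left >= right ) return;              // fewer than 2 elements
    int i = left, j = right;
    int pivot = a[( left + right ) / 2];
    while ( i <= j )                          // partition
    {
        while ( a[i] < pivot ) i++;           // left-to-right scan stops at >= pivot
        while ( a[j] > pivot ) j--;           // right-to-left scan stops at <= pivot
        if ( i <= j )                         // scan indexes have not crossed over
        {
            std::swap( a[i], a[j] );
            i++;
            j--;
        }
    }
    quickSort( a, left, j );                  // sort the lower partition
    quickSort( a, i, right );                 // sort the upper partition
}
```

A whole vector v is sorted with quickSort( v, 0, int( v.size() ) - 1 ).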
14.3 Average
Averaged over all possible orderings of the keys, the number of comparisons is about
1.39n.log2n. Mathematicians can see the proof
in Algorithms - see para 17. below.
16.2 Heapsort - a refinement of selection sort. It seems to like sequences which are
initially in inverse order. The second fastest of the advanced sorts. Shell sort is
faster only if the data is already ordered.
16.3 Quicksort - is significantly faster than either of the above whatever the initial
ordering of the data.
[Figure: sort times for Ordered, Random and Inverse initial orderings of the data]
Testing
1. The context for testing - Verification and Validation
Verification and Validation is a generic term for all processes which ensure that the
software meets its requirements, and that the specification meets the needs of the client. In
other words,
Verification means - Are we building the product right?
This involves checking that the software product conforms to its
specification
Techniques required
! Static - Analysis of the design and program listing.
Includes Walkthroughs, Inspections, Formal verification
! Dynamic - Exercising the program using test data similar to real data, i.e.
testing
Testing cannot prove the absence of defects, only their presence. A successful test is one
that discovers defects.
It is much more economical to discover errors at the design stage than after the program
has been coded because this avoids the correction process i.e. it avoids the need to debug
and re-test.
[Figure: stages of testing: Unit Testing, Module Testing, Sub-System Testing, System Testing and Acceptance Testing, grouped under Component Testing, Integration Testing and User Testing]
Finally, all modules are combined to produce the program - system testing.
After this, the user carries out acceptance testing. For bespoke systems developed
for a single user, this is sometimes referred to as alpha testing. For marketable
software products beta testing may be used where a number of users agree to use
the system and to report on any problems. In exchange for this they may get the
software either free or at a preferential rate.
Advantages and Disadvantages of Bottom-up Testing
! Advantage
It is easier to create test conditions. The functionality is there - it just needs
code to test it.
! Disadvantages
" If combined with top-down development, all system components must
be available before testing can start because the last items to be
completed under this development strategy are the lowest level
components - the first to be tested.
" If top-down development is not employed, then special test drivers
have to be written for each component. Eventually these are replaced
by the actual higher level components when they are implemented.
4.3 Conclusion
The top-down approach is generally considered preferable for most systems today -
Yourdon. But, in practice, it will always be necessary to include a certain amount of
bottom up testing of low level components.
5. Categories of Testing
5.1 Functional testing
The most common form. Its purpose is to ensure that the program performs its
normal functions correctly - see above.
6. Test Planning
The planning of tests should be carried out during the Specification and Design phases of
the software project:-
[Figure: test sequence: Sub-System Integration test, System Integration test, Acceptance test, Service]
Example
A function requires an argument Age which is an integer. The allowable range of
values for Age accepted by the function is 18..65.
From a study of the specification of the function or other program documentation
the following 3 equivalence classes can be identified:-
! Valid class any value in range 18..65
! Invalid class any value in range MIN(int)..17
! Invalid class any value in range 66..MAX(int)
Test cases can then be designed for each valid equivalence class and for each
invalid equivalence class - a total of 3 tests in this simple case.
If there is more than one argument, the test cases should cover the invalid classes
for only one argument at a time because one erroneous argument may mask the
effect of another erroneous argument.
This complements equivalence partitioning and, in practice, is used at the same time
as equivalence partitioning to determine the test data required for testing a
component.
Boundary values are those
! directly on
! just below
! just above
the boundaries of the equivalence classes
It is an observed fact that a greater number of errors occur at the boundaries of the
input domain than in the centre.
Examples
! Range of values, e.g. 18..65
! Test 17,18,65 and 66
! Discrete set of values, e.g. 2, 3, 5, 8, 13
! Test 1, 2, 13, 14
! Data structure (e.g. array) has 1..100 elements
! Test 0, 1, 100, 101
! Loop iterations, none, 1, 2, max, max + 1
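For the range 18..65, the boundary values together with one mid-range representative could be exercised as in the sketch below. The function acceptAge is a hypothetical stand-in for the function under test, and the table-driven layout is just one way of organising the cases.

```cpp
// Hypothetical function under test: accepts Age only in the range 18..65.
bool acceptAge( int age )
{
    return age >= 18 && age <= 65;
}

// Returns the number of test values for which acceptAge gives the wrong answer.
int failedBoundaryTests()
{
    const int  values[]   = { 17,    18,   40,   65,   66    };
    const bool expected[] = { false, true, true, true, false };
    int failures = 0;
    for ( int i = 0; i < 5; i++ )
        if ( acceptAge( values[i] ) != expected[i] )
            failures++;
    return failures;
}
```

A typical slip such as writing age > 18 instead of age >= 18 passes the mid-range case but is caught by the value 18.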
[Figure: flow graph in which a branching decision selects one of the statement blocks A, B or C, inside a loop that is executed twice]
How many different sets of paths exist for this simple piece of code?
1 2 3 4 5 6 7 8 9
First iteration A A A B B B C C C
Second iteration A B C B A C C A B
But why do we need to go to all this trouble? Wouldn't we spend our time better simply
ensuring that the function/module/program requirements have been met? In other words
why don't we confine our tests to black box testing?
Because
! Logic errors and incorrect assumptions tend to occur in inverse proportion to the
probability that a path will be executed.
Normal processing tends to be well understood and scrutinised, but special cases
tend to fall down the cracks.
! We often believe that a path is unlikely to be executed when, in fact, it may be
executed regularly.
! Typing errors are usually picked up by the compiler. But those that are not detected
are just as likely to occur on an obscure logical path as on a mainstream path.
[Figure: Flow chart for binary search]
Example
if ( A > 1 && B == 0 )
X /= A;
For the above 2 conditions there are 4 test cases, i.e. 2². For 3 conditions, there are 2³
= 8 possible combinations etc. This technique is therefore only practicable for small
numbers of conditions.
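The four combinations for the example can be enumerated mechanically, as in this sketch. The wrapper function condition, the counting function and the particular values chosen for A and B are assumptions made for illustration.

```cpp
// The compound condition from the example, wrapped in a function for testing.
bool condition( int A, int B )
{
    return A > 1 && B == 0;
}

// Tries all 2 x 2 combinations of the two sub-conditions and counts how many
// make the compound condition true.
int countTrueCombinations()
{
    const int aValues[2] = { 2, 1 };   // A > 1 is true, then false
    const int bValues[2] = { 0, 1 };   // B == 0 is true, then false
    int trueCount = 0;
    for ( int i = 0; i < 2; i++ )
        for ( int j = 0; j < 2; j++ )
            if ( condition( aValues[i], bValues[j] ) )
                trueCount++;
    return trueCount;
}
```

Only one of the four cases takes the true branch, so a test set that omitted the other three would leave most of the combinations unexercised.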
             =          >          <
Condition    A == 1     A > 1      A < 1
Value of A   1          2          0
Condition    B == 0     B > 0      B < 0
Value of B   0          1          -1
There are therefore 3 test cases for each of the two variables in the example
compound condition, leading to 3² = 9 test cases. Again, the number of test cases
rises rapidly as the number of variables involved in a relational expression
increases.
Simple loops
The following tests should be applied to simple loops, where n is the maximum
number of allowable iterations of the loop:-
! Skip (loop is not entered)
! One pass
! 2 passes
! m passes (m < n)
! n - 1, n, n + 1 passes
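As an illustration of the list above, consider a hypothetical function sumFirst whose loop may iterate at most n = 10 times. Both the function and the value of MAX_ITEMS are assumptions invented for this sketch.

```cpp
#include <vector>

const int MAX_ITEMS = 10;   // n: maximum allowable iterations (assumed value)

// Hypothetical function under test: sums the first 'count' items, looping at
// most MAX_ITEMS times; returns -1 if count is out of range.
int sumFirst( const std::vector<int>& items, int count )
{
    if ( count < 0 || count > MAX_ITEMS || count > static_cast<int>( items.size() ) )
        return -1;
    int total = 0;
    for ( int i = 0; i < count; i++ )
        total += items[i];
    return total;
}
```

The loop tests then call sumFirst with count set to 0 (skip), 1, 2, a typical m such as 5, and finally 9, 10 and 11, the last of which should be rejected.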
Nested loops
The number of times that statements within the inner loop are executed is the
product of the number of iterations of all nested loops within which it appears. Thus
a triply nested loop, where each loop iterates 10 times, will cause statements in the
inner loop to be executed 1,000 times. The number of test cases grows
geometrically and full testing may be impracticable. The suggested solution is:-
a) Start with the innermost loop, setting all outer loop control variables to their
minimum.
b) Test the inner loop as Simple above.
c) Work outwards to next innermost etc. keeping outer loop control variables at
their minimums, and the inner at typical values.
d) Continue until all nested loops have been tested.
Concatenated loops
Where the concatenated loops are independent of each other, treat each as a simple
loop.
Where the second loop has the same control variable as the first and starts with its
value unchanged, treat the two loops as nested.
! The programmer writes assertions about the state of the program. The assertion
processor tests whether they are true or false. C incorporates a simple form of
assertion testing:-
#include <assert.h>
int main ( void )
{
int i = 0;
for( ; i <= 10; i++ );
assert( i == 10 );
return 0;
}
/* Assertion failed: i == 10, file ASSERT.CPP, line 7
Abnormal program termination */
C++ provides exception handling which gives greater flexibility and permits an
exception handler to attempt recovery from an error.
! Test file & Test data generators
! Test verifiers - measure and report on internal test coverage
! Test harnesses - Allow the program to be installed in a test environment, and fed
input data. The behaviour of subordinate modules is simulated by stubs.
! Output comparators - compare output from the current version of program with that
from an earlier version to determine any differences
This is an area of growing importance and descendants of the first generation testing
tools are expected to cause radical changes in the way software is tested.
Data Structure Metrics
Assume we wish to store a linear list of names in random access memory. There are
several ways this could be done.
Scheme 1
Names are stored in successive memory locations (each name is assumed to occupy only 8
bytes). Given the start address of the list (1000), we can find the ith name by going to
address Start + (i - 1) * 8. We can find the address of the next name by adding 8 to the
address of the current element.
Address  Name
1000     Milton
1008     Dickens
1016     Eliot
1024     Arnold
1032     Conrad
Thus, Scheme 1 implements the logical structure of the data by locating its elements in
physically adjacent memory locations.
But if we wish to retrieve a name (in order to access some other data associated with it),
then we would have to scan the list from the start, looking for the name to be retrieved.
Scheme 2
Each name is positioned in memory according to the value of its first letter. The address
for a particular name is found by
1000 + 8 * (int(firstletter) - int('A'))
In this case there is no way of finding the logical successor of a record. We are prevented
from operating on the data using its logical structure. But if we wished to retrieve a
particular name, we could do so very quickly by calculating the address directly from the
name.
Address  Name
1000     Arnold
1008     -
1016     Conrad
1024     Dickens
1032     Eliot
..       ..
1096     Milton
Scheme 3
As with Scheme 1 we cannot find a given name other than by starting at the beginning of
the list and comparing each successive name with the target. These three schemes illustrate
the three fundamental methods of implementing abstract list data types - by an array, a
hash table and a linked list.
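The two address calculations for Schemes 1 and 2 can be written out as below. The function names are hypothetical, and the second function assumes a character set in which the capital letters have consecutive codes, as in ASCII.

```cpp
// Scheme 1: address of the i'th name (i counted from 1) in a contiguous list
// of 8-byte names starting at address 'start'.
int scheme1Address( int start, int i )
{
    return start + ( i - 1 ) * 8;
}

// Scheme 2: address computed directly from the first letter of the name.
int scheme2Address( const char* name )
{
    return 1000 + 8 * ( int( name[0] ) - int( 'A' ) );
}
```

Scheme 1 needs the position of the name, which a search must first discover; Scheme 2 needs only the name itself.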
3. Metrics
One way of implementing a list is to use an array. It is true that arrays are relatively
unsuitable for this purpose because of their inflexibility and because of the need to shuffle
array elements down to fill the hole left by a deletion, but they have the advantage of
requiring no overhead in terms of space. Linked lists, of course, carry an overhead in the
form of the links (pointers) that connect the nodes.
Envisage then a list implemented as an array as in Scheme 1 above and assume that we
wish to find the name Eliot in the list.
3.1 Number of Comparisons
We simply start at the first name in the list (Milton) and search through the list,
comparing each name encountered with Eliot. One measure of the time required to
find this name is the number of comparisons made of each name with the target.
Unless the list is very short, the time required to initialise and finalise the search
will be relatively unimportant when set against the number of comparisons. It is
generally true that the number of comparisons made when searching a data structure
will be one of the major factors in determining the speed of execution.
4. Mathematical Notations
One way of ascertaining the efficiency of algorithms used in operations on data structures
is to write a program which tests the algorithm on a large number of different types and
sizes of data. This approach is useful in trying to understand an algorithm and the factors
which affect its efficiency, but the problem is that:-
a) The data would only be valid for the computer, operating system and language we
have employed and the nature of the data stored in the data structure.
b) We could not possibly examine exhaustively all possible combinations of data
(there are over 358,000 different combinations of just four characters, ignoring
case).
c) We would finish up with a mass of results which would be difficult to understand
and distil into a general indication of the efficiency of the algorithm under
consideration.
We require a crude indicator of the time complexity of an algorithm that relates the time
taken to the number of elements held in the data structure. We are not particularly
concerned with the absolute amount of time, which, for one algorithm, will depend on the
factors mentioned in a) above.
Looking at the search example above, how many comparisons, on average will be required
to find a name in the list? Let n denote the number of names in the list:-
To find the average number of comparisons necessary to locate a name present in the list,
we first find the total required to find each of the names, and then divide by n. Thus, n
comparisons would be needed to find the last name, n - 1 to find the last but one ... through
to just one comparison to find the first. We can calculate the average number of
comparisons for n items without needing to know the value of n:-
Since there are n items in the sequence, the total of the third row is n(n + 1). To find the
average number of comparisons, we need to divide by n and also by 2 since we added the
2 sequences together.
Divide by 2n to find the average for any one name: n(n + 1) / 2n, i.e. ½(n + 1)
Thus the average number of comparisons required to find a name in the list is about half n
whatever the value of n. Since we have seen that the number of comparisons is a major
determinant of the time required, we can say that the time taken for this search is
proportional to ½(n + 1). Since the constant ½ is not significant in relation to other
possible factors of n, we can say that the order of magnitude of the efficiency of the search
is n, and we write this as O(n). This is sometimes referred to as the Big O notation.
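The ½(n + 1) result can be checked empirically with a sketch like the following, which counts the comparisons made by a sequential search for every name in a list. The function names are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of comparisons a sequential search makes to find target
// (names.size() if it is absent).
std::size_t comparisonsToFind( const std::vector<std::string>& names,
                               const std::string& target )
{
    std::size_t comparisons = 0;
    for ( std::size_t i = 0; i < names.size(); i++ )
    {
        comparisons++;
        if ( names[i] == target ) break;
    }
    return comparisons;
}

// Average number of comparisons when each name in the list is searched for once.
double averageComparisons( const std::vector<std::string>& names )
{
    std::size_t total = 0;
    for ( std::size_t i = 0; i < names.size(); i++ )
        total += comparisonsToFind( names, names[i] );
    return double( total ) / double( names.size() );
}
```

For the five names of Scheme 1 the total is 1 + 2 + 3 + 4 + 5 = 15 comparisons, an average of 3, which is ½(5 + 1).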
Only the dominant term is chosen to represent a crude notion of the order of magnitude of
the entire expression, eg
n(n + 1) is O(n²)
15n.logn + 0.1n² + 5 is O(n²)
(6 logn + 3n + 7) / (2n - 5) is O(1)
Why is the second item above classified as O(n²) when this appears to form only a small
part of the expression? Table 1 shows the value of this function for various values of n.
The last column shows the value of the expression divided by 0.1n². Note that as n
becomes large the value in this last column settles down to about 1.0 (with logarithms to
base 2 it is about 3.6 at n = 512 and about 1.04 at n = 65,536), indicating the
overwhelming importance of the 0.1n² component.
n    15n.log2n    0.1n²    (15n.log2n + 0.1n² + 5) / 0.1n²

n                8      128     1,024    1,048,576
O(n½)            3       11        32        1,024
O(log2n)         3        7        10           20
O(n.log2n)      24      896    10,240   20,971,520
In some sources you may find logarithms specified without the base, eg O(nlogn). Does it
matter which logarithm base is used in these order of magnitude expressions? The answer
is no, because, although the absolute values of the expressions will differ according to the
base used, the rate of increase of the function for increasing values of n will remain the
same for all logarithm bases.
Table 3 illustrates this by showing the values of the expression O(nlogn) for logarithms
base 2, e and 10 and for values of n which double in each row. Note that the rate of
increase is exactly the same for all three bases, and is approximately 2.2 times for each
doubling of n.
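The independence from the logarithm base can be confirmed with a short calculation, sketched here with hypothetical function names. Changing the base only multiplies n.logn by a constant, which cancels when the ratio of two values is taken.

```cpp
#include <cmath>

// n.log(n) with the logarithm taken to the given base.
double nLogN( double n, double base )
{
    return n * std::log( n ) / std::log( base );
}

// Factor by which n.log(n) grows when n is doubled.
double growthOnDoubling( double n, double base )
{
    return nLogN( 2 * n, base ) / nLogN( n, base );
}
```

For n = 1,024 the factor is 2 x 11 / 10 = 2.2 whichever base is used; the factor itself drifts slowly towards 2 as n increases.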
Trees
1. Applications
! Trees are hierarchical structures and can be used in any application that models a
hierarchical structure, e.g. disk directory and file structure.
! In some forms they can provide rapid searching and lookup
! They can maintain their data ordered (usually on a unique key that is associated
with their data)
2. Implementation
Trees cannot normally be based on a fixed size structure such as an array. They are
normally implemented using dynamically allocated nodes linked by pointers.
3. Variations
! Binary Search trees
! Expression Trees
! Balanced Trees
! N'ary Trees
! B Trees
4. Example Declaration
struct DataItem
{
int key; // key to search on
anytype value; // depends on the application
};
struct Node
{
DataItem data; // struct as above
Node* left, *right; // pointers to left and right child nodes
};
struct BinaryTree
{
int count; // number of nodes
Node* root; // single entry point into the tree
};
5. Expression Trees
Assume the expression ( 3 + 4 ) * ( 6 - 4 ) is to be evaluated. Parsing and evaluating an
infix expression of this sort in a single pass is very difficult because the string has to be
searched back and forth to recognise and allow for the modifying effect that the
parentheses have on the meaning of the expression.
A tree of nodes representing operators ( +, -, *, / ) and values (or variables) can be built
to represent the semantics of the expression without the parentheses. The tree can then be
traversed to retrieve the symbols and values in an appropriate order for evaluation - see
Traversal below.
[Figure: expression tree with * at the root, + and - as its children, and leaves 3, 4 and 6, 4]
6. Tree Traversal
There are several possible ways in which the tree can be traversed, the most common are
known as inorder, postorder and preorder:-
The post order traversal would produce the nodes in an order suitable for evaluating the
resultant postfix expression using a stack.
The algorithm for binary tree traversal is one of the most elegant in computer science. It is
recursive:-
void inorderTraverse( Node* p )
{
if ( p != 0 )
{
inorderTraverse( p->left );
Process( p-> data );
inorderTraverse( p->right );
}
}
Process( p-> data ) is the operation that is to be carried out on each node. Note that this
algorithm effectively maintains its own stack of nodes visited but not yet processed. This
is represented by the series of stack frames that is pushed onto the system stack for each
call to the function. A non-recursive version of this algorithm requires an explicit stack of
nodes to be maintained and is quite inelegant when compared to the above.
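The same recursive pattern, applied in postorder, is enough to evaluate an expression tree directly. The sketch below uses a hypothetical node type ExprNode rather than the Node declared earlier, and handles integer operands only.

```cpp
// Hypothetical node type for an expression tree: a leaf holds a value,
// an interior node holds an operator and two sub-trees.
struct ExprNode
{
    char      op;       // '+', '-', '*', '/' or 0 for a leaf
    int       value;    // used only when op == 0
    ExprNode* left;
    ExprNode* right;
};

// Postorder evaluation: both sub-trees first, then the operator at this node.
int evaluate( const ExprNode* p )
{
    if ( p->op == 0 ) return p->value;
    int l = evaluate( p->left );
    int r = evaluate( p->right );
    switch ( p->op )
    {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;     // '/'
    }
}
```

For the tree representing ( 3 + 4 ) * ( 6 - 4 ), the two sub-trees evaluate to 7 and 2 before the root multiplies them to give 14.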
7. Parse Trees
Sentence = Subject Verb Object
Subject = Noun | Noun Phrase
Object = Noun | Noun Phrase
Noun = Cat | Mat | Dog
[Figure: parse tree with Sentence at the root and Subject, Verb and Object as its children]
The total number of nodes in a perfectly balanced binary search tree is 2^Level - 1. Thus, for
20 levels, the total number of nodes would be 1,048,575.
The efficiency of a perfectly balanced tree is measured by the average number of
comparisons required to find a key that is present in the tree. Since it requires one
comparison to visit the root node, two comparisons to examine the root node and one of its
child nodes etc. the maximum number of comparisons is the number of levels and, since
the number of nodes doubles at each level, the average number of comparisons for a
perfectly balanced tree is the number of levels - 1. Thus for a perfectly balanced tree of
1,048,000 nodes, the average number of comparisons is Number of Levels - 1 = 19.
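The figures quoted can be checked with a short calculation. The function names below are assumptions; the second function computes the exact average by weighting each level by the number of nodes on it, which shows that "levels - 1" is a close approximation rather than an exact value.

```cpp
// Total nodes in a perfectly balanced binary tree with the given number of levels.
long nodesInTree( int levels )
{
    return ( 1L << levels ) - 1;
}

// Exact average comparisons to find a key present in such a tree:
// 2^(l-1) nodes sit on level l and each costs l comparisons.
double averageTreeComparisons( int levels )
{
    double total = 0;
    for ( int l = 1; l <= levels; l++ )
        total += double( l ) * double( 1L << ( l - 1 ) );
    return total / double( nodesInTree( levels ) );
}
```

For 20 levels the average works out at a little over 19, against roughly 524,000 comparisons for a sequential search of the same number of items.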
This makes binary search trees a suitable structure for fast retrieval of data by reference to
a key and, for this reason, the C++ Standard Template Library uses balanced binary search
trees to implement searchable structures such as map and set.
9. Importance of Balance
This tree was generated by inserting the data in numeric order - 2, 4, 6 .. 16. If, as in this
case, the tree is not balanced, search efficiency degrades towards a simple sequential search,
i.e. from an average number of comparisons = Level - 1 to ½(n + 1). There is little difference
between the two in this small example but, for large numbers of items, the difference in
searching efficiency is extremely large.
[Figure: Degenerate Binary Search Tree - the keys 2, 4, 6 .. 16 form a single chain, each node
being the right child of the one before]
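The effect of insertion order can be measured directly. The following sketch uses a plain, unbalanced insertion routine and counts the comparisons needed to find each key. All names are illustrative, and the nodes are never freed in this short example.

```cpp
#include <cassert>

struct TreeNode
{
    int key;
    TreeNode* left;
    TreeNode* right;
};

// Plain (unbalanced) binary search tree insertion.
TreeNode* insertKey( TreeNode* root, int key )
{
    if ( root == 0 )
    {
        TreeNode* n = new TreeNode;
        n->key = key;
        n->left = 0;
        n->right = 0;
        return n;
    }
    if ( key < root->key )
        root->left = insertKey( root->left, key );
    else
        root->right = insertKey( root->right, key );
    return root;
}

// Number of nodes examined in order to find 'key'; 0 if it is absent.
int comparisonsToFind( const TreeNode* root, int key )
{
    int count = 0;
    while ( root != 0 )
    {
        count++;
        if ( key == root->key )
            return count;
        root = ( key < root->key ) ? root->left : root->right;
    }
    return 0;
}

// Average comparisons over the keys 2, 4 .. 2n, all assumed to be in the tree.
double averageSearchCost( const TreeNode* root, int n )
{
    assert( n > 0 );
    int total = 0;
    for ( int k = 1; k <= n; k++ )
        total += comparisonsToFind( root, 2 * k );
    return static_cast<double>( total ) / n;
}
```

With the eight keys 2, 4 .. 16 inserted in numeric order the average is 4.5, i.e. ½(8 + 1). Inserting the same keys in the order 8, 4, 12, 2, 6, 10, 14, 16 gives 2.625.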
AVL Trees (named after Adelson-Velskii & Landis) employ a balancing algorithm on every
insertion and deletion which ensures that the tree maintains an adequate (although not
perfect) balance. An alternative is the red/black tree, which is the structure commonly
used by implementations of the Standard Template Library.
Hash Tables
1. Applications
• Compilers (see later under perfect hashing functions)
• Basis for other Abstract Data Types, e.g. Set, Dictionary
• Very efficient retrieval
2. Operations
• Insert
• Remove
• Find (Lookup)
3. Efficiency
The efficiency of searching and sorting is expressed using big O notation (see
Data Structure Metrics on page 99). This is a very crude measure of the relationship
between time and the number of items being dealt with. The important factor is the rate at
which time increases as the number of items increases. Hash tables are unusual among data
structures in that, provided the table is not allowed to become too full, the average time
for an operation does not depend on the number of items stored. Their efficiency is
therefore given as O(1).
4. Problem
The penalty paid for this exceptional measure of efficiency is that hashing destroys the
lexical order of keys, so that they cannot subsequently be retrieved in their lexical order.
5. Hashing
Data is stored in a Hash Table that is based on the fundamental array structure provided by
the language. The size of the table is always a prime number. Insertion (and searching) is
performed by applying some function to the key which converts it into an integer in the
range 0 .. table_size - 1. The modulus operation is used to achieve wrap-around. In this
example the column headed ASC represents the sum of the ASCII codes of the first 3
characters of the name. This is then taken modulo 11 (the table size) to produce the table
index. The insertion of the first three items is shown in the hash table (second of the two
tables). The fourth key BYR produces the same index as that of WORDSWORTH - a collision.
This is not surprising since we are trying to insert a very large domain of values into a table
with only 11 locations.
Name        Key  ASC  Table Index
SHELLEY     SHE  224  4
WORDSWORTH  WOR  248  6
KEATS       KEA  209  0
BYRON       BYR  237  6
BLAKE       BLA  207  9
BETJEMAN    BET  219  10
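The ASC and Table Index columns can be reproduced with a few lines of code. This is only a sketch of the hashing function described above: the function names are illustrative and an ASCII character set is assumed.

```cpp
#include <cassert>
#include <string>

// Sum of the character codes of the first three characters of 'name'
// (the ASC column), assuming an ASCII character set.
int asciiSum3( const std::string& name )
{
    int sum = 0;
    for ( std::string::size_type i = 0; i < name.size() && i < 3; i++ )
        sum += static_cast<unsigned char>( name[i] );
    return sum;
}

// Table index: the sum taken modulo the table size.
int tableIndex( const std::string& name, int tableSize )
{
    assert( tableSize > 0 );
    return asciiSum3( name ) % tableSize;
}
```

For example, tableIndex( "SHELLEY", 11 ) is 4, while both tableIndex( "WORDSWORTH", 11 ) and tableIndex( "BYRON", 11 ) are 6, which is the collision noted above.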
6. Collision Resolution
There are two strategies for resolving collisions:-
• Open Addressing
A second hashing function is used to give a new table location and a further attempt is
made to enter the key into the table. The simplest function to produce a new location
after a collision is to successively add 1 to the result of hashing the key. But this can cause
clustering, where the relative density of certain areas of the table is higher than average.
This can give rise to a higher than necessary number of collisions. An improved second
hashing function is:-
hashvalue = ( hashvalue + step ) % table size
where step = hashvalue % ( table size - 2 ) + 1
step is computed only once, from the original hash value, before the loop is entered.
Probing continues until an empty slot is found or, after a certain number of tries, the
table is deemed to be full.
Index  Key  Data
0      KEA  KEATS
1
2
3
4      SHE  SHELLEY
5
6      WOR  WORDSWORTH
7
8
9
10
• Chaining
The Table entry contains a data entry and a pointer to the head of a list of data items that
collided with the first or, more simply, just a pointer to the head of a list.
Index  Key  Data
0      KEA  KEATS
1
2
3
4      SHE  SHELLEY
5
6      WOR  WORDSWORTH -> BYR BYRON
7
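Chaining in its simpler form, where each table entry is just the head of a list, might be sketched as follows. The class and its member names are hypothetical, standard library containers stand in for hand-written lists, and the hashing function is the three-character sum from the earlier example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table: each slot is just the head of a list.
class ChainedTable
{
public:
    explicit ChainedTable( std::size_t size ) : slots( size )
    {
        assert( size > 0 );
    }

    // Colliding keys simply join the list held in the same slot.
    void add( const std::string& key, const std::string& data )
    {
        slots[ index( key ) ].push_back( std::make_pair( key, data ) );
    }

    // Returns a pointer to the stored data, or 0 if the key is absent.
    const std::string* find( const std::string& key ) const
    {
        const Chain& chain = slots[ index( key ) ];
        for ( Chain::const_iterator it = chain.begin(); it != chain.end(); ++it )
            if ( it->first == key )
                return &it->second;
        return 0;
    }

    std::size_t chainLength( std::size_t slot ) const
    {
        return slots[slot].size();
    }

private:
    typedef std::list< std::pair<std::string, std::string> > Chain;

    // Sum of the first three character codes, modulo the table size.
    std::size_t index( const std::string& key ) const
    {
        std::size_t sum = 0;
        for ( std::string::size_type i = 0; i < key.size() && i < 3; i++ )
            sum += static_cast<unsigned char>( key[i] );
        return sum % slots.size();
    }

    std::vector<Chain> slots;
};
```

With a table of 11 slots, adding WOR and then BYR leaves a list of two items in slot 6, and find( "BYR" ) walks that list to reach BYRON.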
// Insertion into a hash table using open addressing
#include "strng.h"
#include <assert.h>
struct Item // component type of the table
{
    String Key, Data;
    bool occupied;
};
const int TABLESIZE = 167; // 167 is prime
Item tbl[TABLESIZE]; // Hash table is an array of Item
int theSize; // current size of the table
int itemcount; // number of items stored
void resize( ); // assumed to be defined elsewhere (not shown in these notes)
void init( void )
{
    for ( int i = 0; i < TABLESIZE; i++ )
        tbl[i].occupied = false;
    theSize = TABLESIZE;
    itemcount = 0;
}
void add( const String& key, const String& data )
{
    // for best efficiency, the number of occupied slots should be <=
    // 80% of table size
    if ( itemcount > theSize * 8 / 10 )
    { resize( ); }
    int hash = key.hashvalue(); // key must support a hashvalue function
    int step = hash % (theSize - 2) + 1; // step size for collision resolution
    hash %= theSize; // hash mod table size
    int numprobes = 1; // to count the number of probes
    // look for an unoccupied slot
    bool foundslot = ( !tbl[hash].occupied );
    // loop not entered if unoccupied slot found first time
    while( !foundslot && (numprobes < theSize) ) // second cond is belt & braces
    {
        hash = ( hash + step ) % theSize;
        foundslot = ( !tbl[hash].occupied );
        numprobes++;
    }
    assert( foundslot ); // should always be true
    tbl[hash].Key = key; // store the key
    tbl[hash].Data = data; // and the associated data
    tbl[hash].occupied = true; // slot is now occupied
    itemcount++; // increment count of items
}
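The Find operation has to follow the same probe sequence as add. The self-contained sketch below illustrates this. It is not the course implementation: std::string replaces the String class, the hashvalue function is a stand-in based on the three-character sum used earlier, the table has only 11 slots, and resizing is omitted.

```cpp
#include <cassert>
#include <string>

struct Slot
{
    std::string key, data;
    bool occupied;
};

const int SIZE = 11;   // small prime table size, for illustration only
Slot slots[SIZE];      // static storage, so 'occupied' starts as false

// Stand-in for String::hashvalue(): sum of the first three character codes.
int hashvalue( const std::string& key )
{
    int sum = 0;
    for ( std::string::size_type i = 0; i < key.size() && i < 3; i++ )
        sum += static_cast<unsigned char>( key[i] );
    return sum;
}

// Returns the index of the slot holding 'key', or -1 if it is not present.
// Probing stops at the key itself or at the first unoccupied slot.
int lookup( const std::string& key )
{
    int hash = hashvalue( key );
    int step = hash % ( SIZE - 2 ) + 1;
    hash %= SIZE;
    for ( int numprobes = 0; numprobes < SIZE; numprobes++ )
    {
        if ( !slots[hash].occupied )
            return -1;                   // an empty slot ends the search
        if ( slots[hash].key == key )
            return hash;
        hash = ( hash + step ) % SIZE;
    }
    return -1;                           // every slot has been probed
}

// Simplified add() using the same probing, without the resize step.
void add( const std::string& key, const std::string& data )
{
    int hash = hashvalue( key );
    int step = hash % ( SIZE - 2 ) + 1;
    hash %= SIZE;
    int numprobes = 1;
    while ( slots[hash].occupied && numprobes < SIZE )
    {
        hash = ( hash + step ) % SIZE;
        numprobes++;
    }
    assert( !slots[hash].occupied );     // the table must not be full
    slots[hash].key = key;
    slots[hash].data = data;
    slots[hash].occupied = true;
}
```

After SHE, WOR, KEA and BYR have been added, BYR is found in slot 10: its first probe at slot 6 collides with WOR, and its step of 237 % 9 + 1 = 4 leads to slot 10. Note that Remove cannot simply mark a slot as unoccupied, since that would cut short the probe sequence of any key stored beyond it; a separate "deleted" marker is one common remedy.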
Libraries
1. The ctype library
This is a 'C' library of functions that operate on characters. They include functions to test
whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion.
The functions available from ctype.h are:-
The use of int instead of char in the return and argument types is historical. For the is..
functions, the return type can be understood to be boolean. In all cases the argument type
can be read as type char.
Help on each of these functions is provided from the RHIDE menu Help.libc
reference.functional categories.ctype.
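A short sketch of typical usage follows; the three helper names are illustrative. Because the argument type is really int, a char value is converted to unsigned char before being passed, which keeps the call well defined on systems where char is signed and the character code is above 127.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert every letter in 's' to upper case.
std::string toUpper( std::string s )
{
    for ( std::string::size_type i = 0; i < s.size(); i++ )
        s[i] = static_cast<char>( std::toupper( static_cast<unsigned char>( s[i] ) ) );
    return s;
}

// Count the decimal digits in 's'.
int countDigits( const std::string& s )
{
    int count = 0;
    for ( std::string::size_type i = 0; i < s.size(); i++ )
        if ( std::isdigit( static_cast<unsigned char>( s[i] ) ) )
            count++;
    return count;
}

// Numeric value of a digit character; the argument must be a digit.
int digitValue( char c )
{
    assert( std::isdigit( static_cast<unsigned char>( c ) ) );
    return c - '0';
}
```

For example, toUpper( "Keats 1795" ) gives "KEATS 1795" and countDigits( "Keats 1795" ) gives 4.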
The usage of any of these functions can be found by running the info program from the
DOS command line. Move the cursor to
* libc.a: (libc.inf). The Standard C Library Reference
press Enter and choose the menu options Functional Categories and math functions.
Press Q to exit the info program.
void abort(void);
int abs(int _i);
int atexit(void (*_func)(void));
double atof(const char *_s);
int atoi(const char *_s);
long atol(const char *_s);
void * bsearch(const void *_key, const void *_base, size_t _nelem,
    size_t _size, int (*_cmp)(const void *_ck, const void *_ce));
div_t div(int _numer, int _denom);
void exit(int _status) __attribute__((noreturn));
char * getenv(const char *_name);
long labs(long _i);
ldiv_t ldiv(long _numer, long _denom);
void qsort(void *_base, size_t _nelem, size_t _size,
    int (*_cmp)(const void *_e1, const void *_e2));
int rand(void);
void srand(unsigned _seed);
double strtod(const char *_s, char **_endptr);
long strtol(const char *_s, char **_endptr, int _base);
unsigned long strtoul(const char *_s, char **_endptr, int _base);
int system(const char *_s);
Some functions in the standard library have been omitted from the above list, either because
they are 'C' functions that have a better counterpart in C++ or because they refer to the
wide char type that is not covered on this course.
Help on these functions can be obtained from within RHIDE by selecting Help.libc
reference.alphabetical list or by entering info at a DOS prompt, moving the cursor to
* libc.a: (libc). The Standard C Library Reference
and pressing Enter, then Alphabetical list.
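As an illustration of how some of these functions fit together, the sketch below sorts an array with qsort, searches it with bsearch and converts text to a number with strtol. The helper names compareInts, sortAndSearch and parseLong are illustrative only.

```cpp
#include <cassert>
#include <cstdlib>

// Comparison function in the form required by qsort and bsearch.
int compareInts( const void* a, const void* b )
{
    int x = *static_cast<const int*>( a );
    int y = *static_cast<const int*>( b );
    return ( x > y ) - ( x < y );   // avoids the overflow risk of x - y
}

// Sort 'values' in place, then report whether 'key' is present.
bool sortAndSearch( int values[], std::size_t n, int key )
{
    std::qsort( values, n, sizeof( int ), compareInts );
    return std::bsearch( &key, values, n, sizeof( int ), compareInts ) != 0;
}

// Convert text to a long; returns false if no digits were consumed.
bool parseLong( const char* text, long& result )
{
    assert( text != 0 );
    char* end = 0;
    result = std::strtol( text, &end, 10 );
    return end != text;
}
```

bsearch relies on the array already being sorted by the same comparison function, which is why the two calls are paired here. Unlike atoi, strtol reports where conversion stopped, so invalid input can be detected.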
Bibliography