
How to Get More Bang From Your Java

or

Gee, I Wish That For Loop Was Faster


By Luke Vorster (vorsterl@ukzn.ac.za)

Introduction
This study introduces the topic of using threads in the Java programming language and on the JVM
in general – it is of interest to Java programmers who want to learn how to develop software with
one or more of the following goals:
1. Programs that run much faster than serial counterparts. This can be achieved in two ways:
a) By interleaving CPU-bound and IO-bound subroutines to achieve both maximum
processor utilisation (speedup), and bandwidth utilisation (throughput).
b) By decomposing subroutines into atomic 'tasks' that run across a number of parallel
processing cores to achieve near-linear speedup (in some cases).
2. Programs that scale well – e.g. Google can run on your desktop and index your local system,
or it can run as a cloud-based application and index the public Internet. Adding more
processors and other resources only helps if a program is capable of using them.
3. Programs that have to synchronise cooperating tasks that are dependent on each other's
respective states. For example, a cruise-control system for a motor-vehicle requires a
number of realtime sensors such as speedometers, brakes, accelerators, torque, etc. to be
monitored and manipulated in parallel. It is crucial that only valid system events take place!
Threading is an integral part of using the Java language and platform – so much so, that much
threading support is built into the language itself. In addition, other features of a threaded system
are required, by the JVM, to be provided by the underlying operating system.
What are threads? Threads are supported by the processor architecture and the operating system, and
are exposed to applications via system libraries provided by the operating system vendor or a third party.
The word 'thread' is a shorthand term for the phrase thread of control, which is, at its simplest, a
section of code executed independently of other threads of control within a single process.
Threads exist within the scope of a single process, and so context-switching is much lighter than
process context-switching. From this perspective, one can think of threads as 'lightweight
processes'. Multiple threads running within a single process is not radically different to multiple
processes running within an operating system kernel. Sadly, however, the topic of threads is usually
considered a peripheral programming topic, one that's only needed in 'exotic' programming cases.
More and more each day, this notion proves to be a falsity.
Thread support can be accessed via most modern programming languages, and the resulting
functionality is identical regardless of language. However, a large problem arises when one
considers standardisation – very few languages provide multi-threaded support natively, so most
languages that do provide the support can only do so via additional non-standard libraries (APIs).
These APIs differ radically in terms of programming paradigm, robustness, and architecture. For
example, on UNIX, thread support can be provided by system calls written in C, which is very
low-level and difficult to read. Furthermore, due to the advent of threads occurring long after UNIX
was developed, there is no standard API, and so there is not much scope for portability. Object-
oriented languages make it easier to understand multi-threaded applications, but the lack of
standardisation is always a show-stopper. Many multi-threaded applications have more than one
implementation simply in order to allow them to run on different systems – needless to say, this is
a development nightmare. (How do you track a bug on all target platforms if there is a different
code-base for each platform?)
Java thread support is native to the language in a standardised manner, and it is therefore one of
the best languages in which to learn about threaded programming.
Things are different with Java: once a programmer learns how to use threads, she will never look
back because using threads is the only way to unleash the processing capabilities of modern
computer systems. Threads do, however, require a different way of thinking about programs, which
is why even experienced programmers find the initial learning curve to be so steep it appears to
be impassable. Hopefully these study notes will help the reader demystify the dark art of concurrent
programming.
In reality, it is not possible to write a Java program without being exposed to the concept of threads
in some way or another:
● The main method of a Java program entry-point is, in fact, one of a number of concurrent
threads running on the JVM.
● Graphical user interfaces rely on threads to render widgets, catch user interface events
(mouse, keys, etc.), and dispatch subroutines concurrently.
Can you imagine a system that stops everything the moment you move the mouse or type
something? Or a movie player application that renders the video from beginning to end, and
_then_ renders the audio? What about a system that freezes every time the desktop clock
has to reflect that yet another second has passed? (!)

Multi-threaded Programming in Java


A Java program can contain many threads, all of which can be created without the explicit
knowledge of the developer. The main concepts a developer needs to bear in mind when using the
Java language are minimal:
● There is an initial thread that begins its operation by executing the main method of your
application. i.e. The program starts with what one can consider as a single thread (though
the JVM is also running others such as GUI event queues, and garbage collection, etc.).
● If you want to perform I/O (particularly if the I/O might block), start a timer, or do any other
task in parallel with the initial thread, you should start a new thread to perform that task.
This situation is not unlike multi-processing UNIX applications, and the related system calls such as
fork, exec, semget, semop, semctl, shmget, shmat, shmctl, etc. The only difference is that threads
are not standardised for most languages. Within a single Java program (the JVM is a process),
multiple threads have the following properties:
● Each thread begins execution at a predefined, well-known location. e.g. main(), init(), run(),
etc.
● Each thread executes code from its starting location in an ordered, predefined (for a set of
inputs) sequence. Threads are single-minded in their purpose, always simply executing the
next statement in the sequence.
● Each thread executes its code independently of the other threads in the program. If the
threads choose to cooperate with each other, there are explicit mechanisms available.
● Each thread appears to have a certain degree of simultaneous execution. This depends on
several factors, the most important of which is the number of threads versus the number of
processing cores available.
● Each thread has access to certain types of data. This is where multi-threading is different to
multi-processing, where processes have to use shared memory to inter-communicate. Local
variables in the methods that the thread is executing are completely separate from other
threads, and are therefore private. All static variables are accessible to all threads. Any object
on the JVM heap can be shared between threads that hold a reference to it; object monitors
lock access to an object for a given thread, and the lock is released once that thread has
finished with it.
Objects that need to protect their state in a mutually exclusive way, among multiple threads that
need to interact with those objects, use the 'synchronized' keyword. This keyword can be applied
to a whole method, or to a block of code within a method. Synchronised code guarded by a given
object is guaranteed to be run by only one thread at a time. Waiting threads are suspended, and are
woken up to contend for the lock when the object becomes available again. This mechanism is called
an object monitor, or lock; it applies the idea of semaphore-style mutual exclusion to
object-oriented technology.
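As a minimal sketch of the 'synchronized' keyword at the method level, consider a shared counter that several threads increment. The class name SynchronizedCounter and the helper runDemo are hypothetical names chosen for illustration; the sketch also uses Thread.start() and Thread.join(), which are introduced in the examples later in these notes.

```java
// Hypothetical illustration: the names here are not part of the original notes.
class SynchronizedCounter {
    private int count = 0;

    // Only one thread at a time can hold this object's monitor and run this method.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Starts `threads` workers that each call increment() `perThread` times,
    // waits for all of them to finish, and returns the final count.
    static int runDemo(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // Four threads, 100000 increments each: prints "Final count: 400000".
        System.out.println("Final count: " + runDemo(4, 100_000));
    }
}
```

Without 'synchronized' on increment(), the read-modify-write in count++ could interleave between threads, and the final count would usually fall short of the expected total.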
A simple thread class:
public class HelloThread extends Thread {

    // Index used to identify this thread in its greeting.
    private final int index;

    public HelloThread(int index) {
        this.index = index;
    }

    @Override
    public void run() {
        System.out.println(
            "Hello Threaded World! I am thread number: " + index
        );
    }
}
To create a number of these threads to run concurrently on the same JVM, let's say four threads
because the machine has four cores:
public class RunThreads {
    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            Thread t = new HelloThread(i);
            t.start();
        }
        System.out.println("Done.");
    }
}
This program will output something similar to the following on any number of CPUs:
Hello Threaded World! I am thread number: 0
Hello Threaded World! I am thread number: 1
Hello Threaded World! I am thread number: 3
Done.
Hello Threaded World! I am thread number: 2

Calling start() makes the thread eligible for scheduling; when the thread is given processor time, its
run() method is invoked. The order in which the statements are printed is not fixed: it depends on
how the threads happen to be scheduled, and on the buffering used by the standard output of the
operating system, and it may vary from run to run. For example, the “Done.” statement might not even
be printed at the end!
To perform a task that depends on the completion of a group of threads (e.g. to guarantee that
“Done.” is printed last), a synchronisation mechanism is needed. This can be added by 'joining'
the threads to the main thread:
public class RunThreadsWithJoin {
    // join() can throw the checked InterruptedException, so main declares it.
    public static void main(String[] args) throws InterruptedException {
        Thread[] tasks = new Thread[4];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new HelloThread(i);
            tasks[i].start();
        }
        // Wait for every thread; join() returns at once if a thread has already finished.
        for (int i = 0; i < tasks.length; i++) {
            tasks[i].join();
        }
        System.out.println("Done.");
    }
}
Here, we create an array of threads, and, once they have all been started, we loop through all of
them and 'join' their execution to a single point in the parent thread (the caller of the join() method).
Each call to join() blocks until the corresponding thread has finished, and returns immediately if
that thread has already finished. An InterruptedException is thrown only if the waiting thread is
itself interrupted, so the caller must either catch it or declare it. The thread state-checking code has
been omitted for simplification purposes, but it should be noted that the program can interrogate
the runtime system to access the state of thread objects, and it can also respond to threading
related exceptions or errors.
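As a small sketch of what interrogating the state of a thread object can look like, the following uses Thread.getState() before start() and after join(). The class name ThreadStateDemo and the helper observeStates are hypothetical names used only for this illustration.

```java
// Hypothetical illustration: the names here are not part of the original notes.
class ThreadStateDemo {

    // Returns the state seen before start() and the state seen after join().
    static Thread.State[] observeStates() {
        Thread t = new Thread(() -> System.out.println("working..."));
        Thread.State before = t.getState(); // not yet started
        t.start();
        try {
            t.join(); // blocks until t has finished
            t.join(); // t is already finished, so this returns immediately
        } catch (InterruptedException e) {
            // Thrown only if the waiting (calling) thread is interrupted.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        Thread.State after = t.getState(); // finished
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) {
        Thread.State[] states = observeStates();
        System.out.println("before start(): " + states[0]); // NEW
        System.out.println("after join(): " + states[1]);   // TERMINATED
    }
}
```

Thread.isAlive() offers a simpler yes/no check when the exact state is not needed.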
This program always prints the “Done.” statement last, and its output will look something like the
following (the order of the greetings themselves may still vary):
Hello Threaded World! I am thread number: 0
Hello Threaded World! I am thread number: 1
Hello Threaded World! I am thread number: 3
Hello Threaded World! I am thread number: 2
Done.

The reader is referred to the Internet for further information on Java Threads.

Programming Exercise
As a programming exercise, the reader is encouraged to develop the following serial program, and
then to add multi-threading support for comparison.
Serial Algorithm
1. Create a large array of random numbers within a broad range (e.g. -1000000 to +1000000).
(the larger, the better – try to occupy > 100 MB of memory if you can!)
2. After populating the array, replace one of the elements (randomly chosen index), with ONE
instance of a 'magic' number that is outside of the range. (e.g. +9999999)
3. Implement a simple function that traverses this array, one element at a time, with a for loop
in order to 'find' the index of the magic number (array and magic number as parameters).
4. When the number is found, print out the index of the number's location in the array.
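One possible starting point for the serial steps above is sketched below. It is only a sketch: the class name SerialSearch, the method names, and the demonstration array size (kept small here, so increase it for real timings) are assumptions made for illustration, not part of the exercise text.

```java
import java.util.Random;

// Hypothetical illustration: the names and sizes here are assumptions.
class SerialSearch {

    // Fills an array with values in [-1000000, +1000000], then overwrites one
    // randomly chosen element with the out-of-range magic number.
    static int[] buildArray(int size, int magic, Random rng) {
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = rng.nextInt(2_000_001) - 1_000_000;
        }
        data[rng.nextInt(size)] = magic;
        return data;
    }

    // Traverses the array one element at a time and returns the index of the
    // magic number, or -1 if it is not present.
    static int find(int[] data, int magic) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == magic) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int magic = 9_999_999;
        int[] data = buildArray(1_000_000, magic, new Random());
        long startNanos = System.nanoTime();
        int index = find(data, magic);
        long elapsedMicros = (System.nanoTime() - startNanos) / 1_000;
        System.out.println("Magic number found at index " + index
                + " in " + elapsedMicros + " microseconds");
    }
}
```

Timing the search with System.nanoTime() gives a baseline against which a multi-threaded version can later be compared.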

Multi-threaded Algorithm
1. Convert the searching function into a thread. Construct the thread with the array and the
magic number.
2. Instantiate a number of threads, each with an ID (e.g. 0, 1, 2, etc.), a reference to the array,
and the magic number.
3. Using the ID, modify the run method of the thread to search a section of the array. e.g. if you
instantiate 4 threads, then each run method should only traverse a quarter of the array –
thread 0 the first, thread 1 the second, and so on. (hint: use thread ID and thread count to
determine the for loop bounds for each thread)
4. Terminate the application with a printout of the global index of the magic number as soon
as one of the threads finds the magic number.
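The hint in step 3 can be sketched as a pair of helper functions that turn a thread ID and a thread count into for loop bounds. This is one possible scheme among several; the class name ChunkBounds and its method names are hypothetical, and the search thread itself is left to the reader.

```java
// Hypothetical illustration: the names here are not part of the exercise text.
class ChunkBounds {

    // First index (inclusive) searched by the thread with the given ID.
    static int start(int id, int threadCount, int length) {
        // The long cast avoids int overflow for very large arrays.
        return (int) ((long) id * length / threadCount);
    }

    // Last index (exclusive) searched by the thread with the given ID.
    static int end(int id, int threadCount, int length) {
        return (int) ((long) (id + 1) * length / threadCount);
    }

    public static void main(String[] args) {
        int length = 10;
        int threadCount = 4;
        // Prints: thread 0: [0, 2)  thread 1: [2, 5)  thread 2: [5, 7)  thread 3: [7, 10)
        for (int id = 0; id < threadCount; id++) {
            System.out.println("thread " + id + ": ["
                    + start(id, threadCount, length) + ", "
                    + end(id, threadCount, length) + ")");
        }
    }
}
```

Because each thread's end bound equals the next thread's start bound, every element is covered exactly once, even when the array length is not a multiple of the thread count.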
How many threads does it take until the program is saturated? i.e. If you keep increasing the
number of threads, does the program keep speeding up proportionately? (If two threads are slower
than one, then your array is too small...) Have fun!
FIN
