10th ed. chapter 04

(Latest Revision: Wed Jan 20 2021)

[2021/01/20: updated chapter title]
[2019/06/03: added captions to figures]
[2019/05/30: inserted more figures]
[2019/03/21: format changes]
[2019/02/21: current 2019 updates]

Chapter Four -- Threads & Concurrency -- Lecture Notes

• 4.0 Objectives
◦ Identify the basic components of a thread, and contrast threads and processes.
◦ Describe the major benefits and significant challenges of designing multithreaded processes.
◦ Illustrate different approaches to implicit threading, including thread pools, fork-join, and Grand
Central Dispatch.
◦ Describe how the Windows and Linux operating systems represent threads.
◦ Design multithreaded applications using the Pthreads, Java, and Windows threading APIs.

Figure 4.1: Single-threaded and multithreaded processes

• 4.1 Overview See Figure 4.1. We can think of a thread as a part of a process that actually
performs an execution of the program - a sequence of instructions read from the program. There is only
one thread in a "traditional process" but there can be multiple threads - multiple execution sequences -
within a single process. The threads can share many parts of the context of the process, like the
program code (aka the text), variables and other data, open files, signals and other messages.

On the other hand, each thread is a separate execution sequence, so there are parts of its context
that it must not share. Each thread needs its own separate program counter, CPU register values,
thread id number, and run-time stack for supporting its own function calls.

◦ 4.1.1 Motivation It's common now that we need computer applications to work for us on
multiple separate concurrent activities, such as a word processor that concurrently renders a
display, checks the spelling in a document, and reads characters the user types with a keyboard.

Figure 4.2: Multithreaded server architecture

When a multi-threaded process runs on a multiprocessor, it's possible for two or more of the
threads to execute simultaneously, which, of course, can be a great efficiency win. One example
would be a busy Internet server process that can serve multiple clients simultaneously.

Most operating system kernels are multithreaded.

◦ 4.1.2 Benefits These are the main categories of benefits of multithreaded programming.

1. Responsiveness: Work is divided among threads, and some threads can continue working
while others are blocked (for example, waiting for I/O to complete). [Note: this type of
concurrent processing applies even on a uniprocessor.]

2. Resource Sharing: Unlike separate processes, threads can share memory and other
resources by default, which makes it easier for them to communicate and cooperate. Also,
primary memory is used efficiently when threads share code and data.

3. Economy: Not much needs to be done to add a new thread to an existing process, or to
perform a context switch between two threads of the same process. Therefore these
operations usually require less time and new memory allocation than creation of a new
process or a context switch between different processes.

4. Scalability: On a multiprocessor, multiple threads can work on a problem in parallel -
truly simultaneously. A single-threaded process can only run on one CPU at a time, no
matter how many CPUs are available in the computer.

• 4.2 Multicore Programming Threads or processes are concurrent when they execute at approximately
the same time. This can happen on a computer with a single CPU (aka a uniprocessor), when, for
example, the operating system performs multitasking. However, parallel threads or processes truly
execute simultaneously. This can only happen on a multiprocessor.

A definition: A multiprocessor with multiple computing cores (CPUs) on a single chip is called a
multicore system.

Figure 4.3: Concurrent execution on a single-core system

Figure 4.4: Parallel execution on a multicore system

◦ 4.2.1 Programming Challenges These are five challenges in programming for multicore systems:

1. Identifying tasks: How do we divide up the work of a process among the threads so that
we get lots of work going on in parallel?

2. Balance: There's a cost of adding a new thread to a process. How do we make sure the
new thread contributes enough to justify that cost?

3. Data Splitting: In what ways do we need to divide up the data to allocate to separate threads?

4. Data Dependency: When thread X needs to get some data from thread Y before thread X
will be able to continue, how do we synchronize actions of X and Y so the data is passed
from one to the other correctly?

5. Testing and debugging: The actions of two threads executing in parallel can be
interleaved in a very large number of ways, so a multithreaded application has a very large
number of possible orders of instruction execution. How can we effectively test and debug
applications that have such unpredictable execution paths?

Figure 4.5: Data and task parallelism

◦ 4.2.2 Types of Parallelism There are two main kinds of parallelism: data parallelism and task parallelism.

With data parallelism, the data is divided up, and each thread is given one part of the data on
which to work. Each thread does the same operations, but on different data. If we have two lists
to sort and two threads, and if each thread sorts one of the lists, that's data parallelism.

With task parallelism, the threads get different kinds of jobs to do, and each thread uses whatever
part(s) of the data it needs. If one thread computes the average of an array while another thread
finds the maximum value in the array, that's task parallelism.

Programs often use some combination of the two strategies.

Figure 4.6: User and kernel threads

• 4.3 Multithreading Models

◦ So far the threads we have studied are kernel-level threads. There is also something called user-
level threads. Basically, user-level threads are simulations of threads. User threads have some
nice advantages, but also some disadvantages.

Think about a single (kernel-level) thread that executes a time slice in a CPU. Suppose we make
that single thread execute software that simulates multiple threads.

The resulting simulation can switch between multiple different activities many times during the
single time slice of the kernel-level thread in the CPU. In other words, it can very quickly
simulate multiple context switches of multiple (simulated) threads, all during one time slice of
one kernel-level thread. That is a very nice advantage - very low overhead for context switching.

Other advantages are that the simulator can quickly create, execute, and terminate simulated
threads at will without being slowed by requesting help from the kernel and waiting for system
calls to execute.

Such simulated threads, user-level threads, are quite popular for use in many applications.

◦ There has to be a correspondence between user- and kernel-level threads. The mapping can be
many-to-one, one-to-one, or many-to-many.

Figure 4.7: Many-to-one model

◦ 4.3.1 Many-to-One Model:

In this model, a single kernel-level thread supports many user-level threads. As explained above,
context switching is extremely fast among the user-level threads. This model supports
programmers that want to organize software as a group of concurrent threads. However the
kernel-level thread can only execute in one CPU at a time. Therefore the user-level threads can
only execute one at a time. True parallelism is not possible with this model. Also, if any of the
user-level threads makes a blocking system call, the kernel-level thread will have to block, and
so it will not be able to run code for any of the user-level threads until it gets out of its wait
queue. The net effect is that all user threads are blocked if any one of them makes a blocking
system call.

Figure 4.8: One-to-one model

◦ 4.3.2 One-to-One Model

In the one-to-one model, the number of user-level threads is equal to the number of kernel-level
threads. Each user-level thread has exactly one supporting kernel-level thread. Each of the
kernel-level threads supports exactly one user-level thread.

If we use the one-to-one model, we cure problems of the many-to-one model. Now, if one user-
level thread blocks, the others don't have to block.

Also user-level threads can operate in parallel on a multiprocessor.

However, the one-to-one model has some disadvantages:

▪ Because each kernel-level thread supports only one user-level thread, we lose the
advantage of quick context switches between user-level threads.
▪ Also each time we create a new user-level thread, we need to create a new kernel-level
thread to support it, so we tend to use up more time creating threads.
▪ There is less flexibility to schedule the user-level threads. The kernel decides when to run
the supporting kernel-threads.
▪ This model may lead to excessive numbers of kernel-level threads, which could adversely
affect performance.

Despite the disadvantages listed above, many operating systems use the one-to-one model. It's
relatively easy to implement, and the disadvantages are viewed as acceptable because many
systems have a large number of processing cores that will support large numbers of kernel-level threads.

Figure 4.9: Many-to-many model

◦ 4.3.3 Many-to-Many Model

In the many-to-many model, a set of user-level threads is supported by a smaller or equal
number of kernel-level threads. A user-level thread is not necessarily bound to any particular
kernel-level thread. The thread library can re-assign it. For example if a kernel-level thread X has
to block, some of the user-level threads that X was supporting can migrate for support to
different kernel-level threads.

The many-to-many model can be seen as a good compromise between the many-to-one and
one-to-one models. The many-to-many model allows the creation of a large number of
user-level threads that can switch context very quickly. The application can have greater
control over the scheduling of the user-level threads. Much of the advantage of the one-to-one
model remains: parallelism and independent blocking.

Figure 4.10: Two-level model

The two-level model is a variation on the many-to-many model, in which a user-level thread
may be bound to a kernel-level thread.

It is difficult to implement the many-to-many model, which is one of the reasons that more
operating systems do not support it.

• 4.4 Thread Libraries

◦ POSIX thread (Pthreads) implementations vary from system to system - could be user-level or
kernel-level.
◦ Windows threads are kernel-level.
◦ The Java thread API is typically implemented using a native thread package on the host system
(e.g. Pthreads or Windows).
◦ In asynchronous threading, the parent creates one or more child threads and then executes
concurrently with them.
◦ In synchronous threading, the parent creates one or more child threads and waits for all the child
threads to exit before resuming execution.
◦ Section 4.4 contains three examples in which a parent thread creates a child thread to execute a
function. The parent blocks until the child has exited, and then the parent resumes execution.

◦ 4.4.1 Pthreads
▪ If a variable is declared in a program, and it's outside of any function, then it is
automatically in memory shared by all threads of a process. So that makes it very easy to
set up an area of shared memory.
▪ sample thread-creation call:
pthread_create(&tid, &attr, fname, paramPtr)
▪ &tid is the address of a variable to store the id number of the thread.
▪ &attr is the address of a data structure containing attributes for the new thread to have.
▪ fname is the name of the function where the new thread will begin execution.
▪ paramPtr is a pointer to the parameter that will be passed to fname when the new thread
executes fname.

Figure 4.11: Multithreaded C program using the Pthreads API

◦ 4.4.2 Windows Threads

▪ As with POSIX, in the Windows API, if a variable is declared in a program, and it's
outside of any function, then it is automatically in memory shared by all threads of a
process. So that makes it very easy to set up an area of shared memory.
▪ sample thread-creation call:
CreateThread(NULL, 0, Summation, &Param, 0, &ThreadID)
▪ NULL: default security attributes
▪ 0: default stack size
▪ Summation: name of the function the thread will execute
▪ &Param: parameter for the function
▪ 0: default creation flags
▪ &ThreadID: address of a variable to store the thread identifier

◦ 4.4.3 Java Threads SKIP

▪ Java Executor Framework SKIP

• 4.5 Implicit Threading

The basic idea of implicit threading is automation. Developers and programmers identify tasks that can
run in parallel, and compilers and run-time libraries, or other software, create and manage threads to
perform those tasks.

◦ 4.5.1 Thread Pools

The main thread of a busy Internet server does not have time to 'personally' perform the service
for each client, because it needs to return immediately to the job of accepting the incoming
request from the next client.

One way to deal with that problem is for the main thread to spawn a child, a service thread, for
each client. The service thread then handles the client request and terminates.

However, creating and terminating the service threads can take excessive time, and if
too many client requests come in too fast, there may be too many service threads taxing system
resources.
The idea of a thread pool is for the server to create a fixed number of service threads at the time
of process start up, and to allow them to 'stay alive' as long as the server operates. The service
threads that don't have anything to do are kept suspended. When a client needs service, and if a
service thread is available, the main thread assigns one to the client. Otherwise the main thread
puts the client in a queue where it waits for a service thread to become available.

When a service thread finishes with a client, it does not terminate. It just 'goes back to the pool'
and waits to be assigned to another client.

Generally it takes less time for a server process to use an existing thread to service a request than
to create a brand new service thread. Also, no matter how much the server is flooded with client
requests, the number of service threads never exceeds the fixed size of the pool.

One opportunity to exploit the idea of implicit threading is to program the "thread pool
architecture" to monitor the frequency of client requests and dynamically adjust the size of the
thread pool to match demand.

▪ Java Thread Pools

◦ 4.5.2 Fork Join

If programmers leave notations in the programs designating work that can be performed in
parallel, then implicit threading library code can create (fork) child threads to do the work, and
arrange for the children to return results to the parent (join the parent) when they terminate.

Figure 4.16: Fork-join parallelism

▪ Fork Join in Java Since version 7, Java has included a fork-join library that spawns
threads to perform recursive calls in algorithms like Mergesort.

Figure 4.17: Fork-join in Java

Figure 4.19: UML Class diagram for Java's fork-join

◦ 4.5.3 OpenMP (compiler add-on and API for C, C++, and FORTRAN)

A programmer can insert labels in the code that identify certain sections that should be executed
by parallel threads. The compiler responds to the labels by generating code that creates threads
that execute those sections of code in parallel.

◦ 4.5.4 Grand Central Dispatch consists of extensions to C, an API, and a run-time library.
Like OpenMP, it provides parallel processing, although details of the implementation differ.

◦ 4.5.5 Intel Threading Building Blocks is another approach to implicit threading that relies on
templates for parallel structures and task scheduling, rather than special compilers or language
extensions.
• 4.6 Threading Issues This section summarizes "issues" that have to be resolved by people who
implement operating systems that support multi-threading.

◦ 4.6.1 The fork() and exec() System Calls

When an application is multi-threaded, should the fork() system call duplicate all threads, or just
the calling thread? Some APIs provide both options.

Implementations of exec() typically replace the entire process of the calling thread, including all
threads in that process. Therefore, if the child created by a fork() is going to call exec()
immediately, there's no point in having the fork() duplicate all the threads in the process.

◦ 4.6.2 Signal Handling

Signals are a simple form of interprocess communication in some operating systems, primarily
versions of UNIX. Signals were designed to behave something like interrupts, but they are not
interrupts.
The OS delivers and handles signals. Delivering signals and handling (responding to) signals are
routine tasks the OS performs as opportunities arise. Sometimes delivery of a signal to a process
(or thread) is required as part of the OS performance of interrupt service, or a system call. The
OS delivers signals to a process (thread) by setting a bit in a context variable of the process
(thread). Just before scheduling a process (thread) to execute, the OS checks to see if any signals
have been delivered to the process (thread) that have not been handled yet. If so, the OS will
cause the signal to be handled properly. Sometimes it does this by executing code in kernel
mode, and sometimes it handles a signal by jumping into the user process at the start address of a
special handler routine the process has for responding to the signal. The exact appropriate way of
handling a signal depends on the nature of the signal.

Multithreading complicates the problem of implementing signal delivery.

Should a signal be delivered to all the threads in a process or just some? There are many different
kinds of signals, and the answer to that question differs, depending on the nature of the signal.

Often the handler for a signal should run only once. A signal sent to a process may be delivered
only to the first thread that is not blocking it.

The OS may provide a function to send a signal to one particular thread.

◦ 4.6.3 Thread Cancellation

Sometimes a thread starts work but it should be cancelled before it finishes - for example if two
threads are searching a database for a record and one of them finds it, the other thread should be
cancelled.
Thread cancellation can be implemented in a manner similar to how signals work. In fact it may
be implemented using signals.

Since problems could be caused by instantly cancelling a thread in a task that is in the midst of
doing some work, the implementation of cancellation typically includes ways for threads to defer
their cancellation so that they have time to 'clean up' first - for example to deallocate resources
they are holding, or to finish updating shared data.

◦ 4.6.4 Thread Local Storage

Typically threads have some need for thread-specific data (thread local storage). This is data not
shared with other threads. In Pthreads processing, local variables can play this role, but local
variables exist only within one function, so provision for thread local storage that is 'more global'
may be needed. Most thread APIs provide support for such thread local storage.

◦ 4.6.5 Scheduler Activations

This section describes some rather arcane details of how the relationship between user-level
threads and kernel-level threads may be implemented.

The "Scheduler Activation" scheme for simulating threads at the user level relies to a great
extent on the kernel communicating about certain events to the application that is executing the
user-level threads. These communications come from the kernel and go "up" to the application,
and they are named upcalls. Read the information, but I won't ask you for any details.

Figure 4.20: Lightweight process (LWP)

• 4.7 Operating-System Examples

◦ 4.7.1 Windows Threads

Figure 4.21: Data structures of a Windows thread

In Windows, applications run as separate processes, which may have multiple threads. Per-thread
resources include an ID number, a register set, a user stack, a kernel stack (for the use of the OS
when executing on behalf of the process, for example when executing a system call for the
process), and private storage used by library code. There are three primary data structures for
holding the context of threads: ETHREAD (executive thread block), KTHREAD (kernel thread
block), and TEB (thread environment block). The first two reside in kernel memory, and the TEB
is in user space.

◦ 4.7.2 Linux Threads

Linux has a traditional fork() system call that creates an exact duplicate of the parent.

Linux also has a clone() system call with flag parameters that determine what resources will be
shared between parent and child (clone). If a large amount of context is shared between parent
and clone, then the clone is essentially what we have been calling a new thread inside the parent
process. On the other hand, if nothing is shared, then the clone is about the same as the child of
a traditional fork() operation.

