
Name: Mohamed Abdelrahman Anwar
ID: 20011634

Operating Systems
Sheet 4
Threads
4.1 Processes and Threads
1.
a. Creation & Deletion Overhead:
• A process is an instance of a program that is being
executed, with its own memory space, file descriptors, and
system resources. Creating a new process involves a
significant amount of overhead, such as allocating memory
and other resources, loading the program into memory,
and setting up communication channels with other
processes. Similarly, deleting a process involves freeing up
all the resources used by the process, including memory,
files, and other system resources. Thus, the creation and
deletion of a process are relatively expensive and time-
consuming.
• A thread, on the other hand, is a lightweight execution
unit that shares the same memory space as its parent
process. Creating a new thread involves much less
overhead than creating a new process, as most of the
resources are already allocated for the process. Similarly,
deleting a thread is less expensive than deleting a process,
as most of the resources are still being used by the parent
process. Thus, the creation and deletion of a thread are
relatively fast and inexpensive compared to a process.
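A rough timing sketch of this cost difference (illustrative, not from the sheet; assumes a POSIX system, compiled with cc demo.c -lpthread) compares repeatedly creating and reaping a child process against creating and joining a thread:

/* Rough timing sketch, not a rigorous benchmark: compares the cost of
 * creating and reaping processes (fork/waitpid) against creating and
 * joining threads (pthread_create/pthread_join). */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000

static void *noop(void *arg) { return NULL; }

static double elapsed_ms(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void) {
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++) {
        pid_t pid = fork();
        if (pid == 0) _exit(0);   /* child exits immediately */
        waitpid(pid, NULL, 0);    /* parent reaps it */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("processes: %.1f ms\n", elapsed_ms(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, noop, NULL);
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("threads:   %.1f ms\n", elapsed_ms(t0, t1));
    return 0;
}

On a typical system the thread loop finishes several times faster, matching the argument above.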
b. Way of Communication and its Speed:
• Processes are isolated from each other and cannot directly
access each other's memory space. To communicate
between processes, inter-process communication (IPC)
mechanisms such as pipes, sockets, or message queues
must be used. IPC has higher overhead than thread
communication, as it involves copying data between separate
address spaces and crossing into the kernel. Thus,
communication between processes is slower than
communication between threads.
• Threads, on the other hand, can communicate with each
other directly through shared memory. This is faster than
IPC because threads can read and write the same memory
locations without copying data across address spaces.
However, synchronization (e.g., a mutex) is still required to
avoid race conditions, where two or more threads access the
same memory location simultaneously and produce
unexpected results, as the sketch below illustrates.
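A minimal sketch of thread communication through shared memory (illustrative, not from the sheet): two threads update one shared counter, with a pthread mutex guarding it against the race condition just described.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* serialize access to shared data */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* with the mutex this always prints 200000; without it, the two
     * threads' read-modify-write cycles interleave and updates are lost */
    printf("counter = %ld\n", counter);
    return 0;
}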
4.2 TYPES OF THREADS
2.
a. Mapping:
• User-level threads are created and managed entirely by a
threads library in the application program, without any
support from the operating system. In a pure ULT scheme the
mapping is many-to-one: all user-level threads in a process
are multiplexed onto a single kernel-scheduled entity, and
the kernel is unaware of the individual threads. The mapping
is maintained by the threads library.
• Kernel-level threads, on the other hand, are created and
managed by the operating system, and the mapping is
one-to-one: each thread the application creates corresponds
to a kernel thread that the OS schedules directly. (Combined
approaches map many user-level threads onto a smaller or
equal number of kernel threads, i.e., many-to-many.)
Dealing with multi-processor systems:
• User-level threads cannot take advantage of multiple
processors or cores: the kernel schedules the process as a
single unit, so only one user-level thread can execute at a
time. As a result, user-level threads cannot fully utilize the
processing power of a multi-processor system.
• Kernel-level threads, on the other hand, are aware of the
underlying hardware and can be scheduled to run on
different processors or cores, allowing them to fully utilize
the processing power of a multi-processor system.
Overhead on the kernel:
• User-level threads impose very low overhead on the kernel,
as all thread management is done by the threads library in
user space. The kernel is involved only through ordinary
system calls, not in thread management itself.
• Kernel-level threads have higher overhead on the kernel,
as all thread management is done by the operating
system. The kernel is involved in every thread creation,
deletion, and context switch.
Portability:
• User-level threads are highly portable across different
operating systems, as they do not rely on any specific
features of the operating system. However, the
performance of user-level threads may be affected by
differences in hardware or system configuration.
• Kernel-level threads are less portable across different
operating systems, as they rely on specific features of the
operating system. However, the performance of kernel-
level threads is more consistent across different hardware
and system configurations.
Who does the dispatching and scheduling?
• User-level threads are dispatched and scheduled by the
application program. The program must implement its
own scheduling algorithm and determine which thread to
run next.
• Kernel-level threads are dispatched and scheduled by the
operating system. The operating system uses a kernel-
level scheduler to determine which thread to run next,
based on factors such as thread priority, time slice, and
processor affinity.
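As a concrete illustration of that last point (a hypothetical Linux-specific sketch using GNU extensions, not part of the sheet): because pthreads on Linux are kernel-level threads, the kernel can be asked to place a thread on a particular processor via an affinity mask.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void *work(void *arg) {
    /* sched_getcpu() reports the processor the kernel placed us on */
    printf("running on CPU %d\n", sched_getcpu());
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;
    cpu_set_t set;

    printf("%ld processors online\n", sysconf(_SC_NPROCESSORS_ONLN));

    CPU_ZERO(&set);
    CPU_SET(0, &set);                  /* allow the thread on CPU 0 only */
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof set, &set);

    pthread_create(&tid, &attr, work, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}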
3. When a user-level thread makes a blocking system call,
control enters the kernel, and the kernel blocks the entire
process, including all user-level threads within it. This is
because the kernel is unaware of the individual user-level
threads: it sees only the single process, which shares one
address space, one set of file descriptors, and one schedulable
context. When that context blocks in the kernel, every thread
inside the process stops.
This does not happen in kernel-level threads, as each kernel-
level thread has its own kernel-level context and can execute
system calls independently of other threads in the process.
When a kernel-level thread makes a system call, only that
thread is blocked, and other kernel-level threads in the same
process can continue to execute.
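A minimal sketch of the kernel-level behavior (assuming Linux, where pthreads are one-to-one kernel-level threads; not from the sheet): one thread blocks in read() on stdin while the other keeps making progress.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *blocker(void *arg) {
    char buf[64];
    /* this blocking system call stalls only the calling kernel thread */
    read(STDIN_FILENO, buf, sizeof buf);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, blocker, NULL);
    for (int i = 0; i < 5; i++) {
        printf("main thread still running (%d)\n", i);
        sleep(1);
    }
    pthread_join(t, NULL);   /* returns once a line is typed on stdin */
    return 0;
}

Under a pure many-to-one ULT library, the read() would instead suspend the whole process, and the main loop would stop printing.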
4.
ULTs:
Advantages:
• Thread switching does not require kernel-mode privileges
because all of the thread management data structures are
within the user address space of a single process.
Therefore, the process does not switch to the kernel mode
to do thread management. This saves the overhead of two
mode switches (user to kernel; kernel back to user).
• Scheduling can be application specific. One application
may benefit most from a simple round-robin scheduling
algorithm, while another might benefit from a priority-
based scheduling algorithm. The scheduling algorithm can
be tailored to the application without disturbing the
underlying OS scheduler.
• ULTs can run on any OS. No changes are required to the
underlying kernel to support ULTs. The threads library is a
set of application-level functions shared by all applications.

Disadvantages:
• In a typical OS, many system calls are blocking. As a result,
when a ULT executes a system call, not only is that thread
blocked, but all of the threads within the process are
blocked as well.
• In a pure ULT strategy, a multithreaded application cannot
take advantage of multiprocessing. A kernel assigns one
process to only one processor at a time. Therefore, only a
single thread within a process can be executed at a time.
In effect, we have application-level multiprogramming
within a single process. While this multiprogramming can
result in a significant speedup of the application, there are
applications that would benefit from the ability to execute
portions of code simultaneously.

KLTs:
Advantages:
This approach overcomes the two principal drawbacks of the
ULT approach:
• First, the kernel can simultaneously schedule multiple
threads from the same process on multiple processors.
• Second, if one thread in a process is blocked, the kernel
can schedule another thread of the same process.
• Another advantage of the KLT approach is that kernel
routines themselves can be multithreaded.
Disadvantages:
The principal disadvantage of the KLT approach compared to
the ULT approach is that the transfer of control from one
thread to another within the same process requires a mode
switch to the kernel.

4.3 MULTICORE AND MULTITHREADING


5. Thread switches are generally faster than process switches.
When a process switch occurs, the operating system needs to
save the state of the currently executing process, including the
values of the CPU registers, the program counter, and the stack
pointer. Then, it must switch to the new process's address
space (for example, reloading the page-table base register)
and restore that process's saved register state. Switching
address spaces also invalidates TLB entries and cools the
caches, so a process switch is relatively expensive.
In contrast, when a thread switch occurs, the operating system
only needs to save and restore the CPU registers, program
counter, and stack pointer, because threads within the same
process share the same address space and code. No
address-space switch is needed, so page tables, TLB entries,
and cached data remain valid.
Furthermore, thread switches can be implemented using user-
level thread libraries, which do not require system calls to
switch threads. User-level thread switches can be much faster
than kernel-level process switches, as they do not involve the
overhead of entering and exiting kernel mode.
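A sketch of such a user-space switch (illustrative, not from the sheet) uses the ucontext API, deprecated in POSIX but still provided by glibc: swapcontext() saves and restores register state without any kernel scheduling decision (glibc's implementation does issue one system call to preserve the signal mask, but no kernel-level context switch occurs).

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];       /* stack for the second context */

static void coroutine(void) {
    printf("in coroutine\n");
    swapcontext(&co_ctx, &main_ctx);   /* user-space switch back to main */
    printf("coroutine resumed\n");
}

int main(void) {
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;        /* where to go when coroutine ends */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);   /* switch into the coroutine */
    printf("back in main\n");
    swapcontext(&main_ctx, &co_ctx);   /* resume it once more */
    printf("done\n");
    return 0;
}

This is essentially what a user-level threads library does on every thread switch.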

6. An algorithm that performs several independent calculations
concurrently, such as matrix multiplication, can potentially be
more efficient if it uses threads, depending on the specific
hardware and software environment in which it is executed.
In a uniprocessor system, using threads may not provide
significant performance benefits, as only one thread can
execute at a time, and the overhead of thread creation and
management may outweigh any gains in concurrency.
However, if the algorithm involves I/O operations, such as
reading input data from a file, using threads can improve
performance by allowing other threads to execute while the I/O
operation is waiting to complete.
In a multiprocessor system, using threads can provide
significant performance benefits by utilizing the available
processing power. If the algorithm is implemented using user-
level threads, it may not be able to fully utilize the processing
power of the system, as user-level threads are not aware of the
underlying hardware and cannot be scheduled to run on
different processors or cores. On the other hand, if the
algorithm is implemented using kernel-level threads, it can fully
utilize the processing power of the system by allowing threads
to be scheduled on different processors or cores.
In terms of user/kernel threads, the performance of the
algorithm can also depend on the overhead of thread creation
and management. User-level threads have lower overhead than
kernel-level threads, as thread management is done by the
application program rather than the operating system.
However, user-level threads may not be able to fully utilize the
processing power of the system, as discussed above. Kernel-
level threads, on the other hand, can fully utilize the processing
power of the system but have higher overhead than user-level
threads.
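A minimal sketch of the kind of algorithm discussed (sizes and thread count are illustrative choices, not from the sheet): matrix multiplication split into row bands, one pthread per band. On a multiprocessor with kernel-level threads, the bands genuinely run in parallel.

#include <pthread.h>
#include <stdio.h>

#define N 256
#define NTHREADS 4

static double A[N][N], B[N][N], C[N][N];

typedef struct { int row_begin, row_end; } band_t;

static void *multiply_band(void *arg) {
    band_t *band = arg;
    for (int i = band->row_begin; i < band->row_end; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;   /* rows are disjoint, so no locking needed */
        }
    return NULL;
}

int main(void) {
    pthread_t tids[NTHREADS];
    band_t bands[NTHREADS];

    for (int i = 0; i < N; i++)          /* trivial test data */
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 1.0; }

    for (int t = 0; t < NTHREADS; t++) {
        bands[t].row_begin = t * N / NTHREADS;
        bands[t].row_end   = (t + 1) * N / NTHREADS;
        pthread_create(&tids[t], NULL, multiply_band, &bands[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);

    printf("C[0][0] = %f (expect %d)\n", C[0][0], N);
    return 0;
}

Because each thread writes a disjoint band of rows of C, the threads never contend for data, which is why this algorithm parallelizes so cleanly.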

4.4 WINDOWS PROCESS AND THREAD MANAGEMENT
7.
1. Ready: A ready thread may be scheduled for execution. The
Kernel dispatcher keeps track of all ready threads and
schedules them in priority order.
2. Standby: A standby thread has been selected to run next on a
particular processor. The thread waits in this state until that
processor is made available. If the standby thread’s priority is
high enough, the running thread on that processor may be
preempted in favor of the standby thread. Otherwise, the
standby thread waits until the running thread blocks or
exhausts its time slice.
3. Running: Once the Kernel dispatcher performs a thread
switch, the standby thread enters the Running state and
executes until it is preempted by a
higher-priority thread, exhausts its time slice, blocks, or
terminates. In the first two cases, it goes back to the Ready
state.
4. Waiting: A thread enters the Waiting state when (1) it is
blocked on an event (e.g., I/O), (2) it voluntarily waits for
synchronization purposes, or (3) an environment subsystem
directs the thread to suspend itself. When the waiting condition
is satisfied, the thread moves to the Ready state if all of its
resources are available.
5. Transition: A thread enters this state after waiting if it is
ready to run, but the resources are not available. For example,
the thread’s stack may be paged out of memory. When the
resources are available, the thread goes to the Ready state.
6. Terminated: A thread can be terminated by itself, by another
thread, or when its parent process terminates. Once
housekeeping chores are completed, the thread is removed
from the system, or it may be retained by the Executive for
future reinitialization.
General Questions
8.
a. In this scenario, each job holds four tape drives for its entire
execution, and the scheduler in the OS will not start a job
unless four tape drives are available. The maximum number of
jobs in progress at once is therefore limited by the 20 available
tape drives: 20/4 = 5 jobs.
When a job is started, four tape drives are assigned
immediately and are not released until the job finishes. The
minimum number of idle tape drives under this policy is 0,
which occurs whenever five jobs are running and all 20 drives
are allocated. However, because each job actually uses only
three of its drives for most of its run and needs the fourth only
briefly, up to 5 drives (one held-but-unused drive per active
job) may be idle at any given time.
b. An alternative policy that can improve tape drive utilization
and avoid system deadlock is to allow jobs to start with 3 tape
drives and allocate the fourth tape drive dynamically when it
becomes available. This way, more jobs can start and progress
with only 3 tape drives, and the fourth tape drive can be
assigned to them when it becomes available. Because a job
holding three drives needs at most one more before it can
finish and release all four, drives keep circulating: no job waits
forever, and deadlock cannot arise.
Under this policy the maximum number of jobs in progress at
once rises to 20/3 = 6 (rounding down), since each job needs
only three drives to start. With six jobs each holding three
drives, 18 drives are allocated and two remain free to satisfy
fourth-drive requests, so the minimum number of idle drives is
0 and the maximum is 2.

9.
a. The number of kernel threads allocated to the program is
less than the number of processors: Only kernel-level threads
can be scheduled onto processors, so some processors will
always sit idle as far as this program is concerned. The threads
library multiplexes the user-level threads over the available
kernel threads, but the degree of true parallelism is capped by
the number of kernel threads rather than by the number of
processors or user-level threads.
b. The number of kernel threads allocated to the program is
equal to the number of processors: In this scenario, each
processor will be assigned a kernel-level thread, which will
manage the execution of one or more user-level threads. This
can result in good performance, as each processor will be fully
utilized, and context switching between threads will be
minimized.
c. The number of kernel threads allocated to the program is
greater than the number of processors but less than the
number of user-level threads: All processors can be kept busy.
When a kernel thread blocks (for example, on I/O), the kernel
can schedule another kernel thread of the same program onto
the freed processor, so the surplus kernel threads improve
processor utilization, at the cost of some additional
kernel-level context-switch overhead.
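Where such a many-to-many mapping is configurable, POSIX exposes it through the thread "contention scope" attribute. A brief sketch (Linux accepts only system scope, so the process-scope request is shown as the hypothetical M:N case):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *work(void *arg) { return NULL; }

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    pthread_attr_init(&attr);
    /* PTHREAD_SCOPE_PROCESS: on M:N systems (e.g., older Solaris), the
     * thread is multiplexed over the process's pool of kernel threads.
     * PTHREAD_SCOPE_SYSTEM: the thread is backed by its own kernel-level
     * thread and competes against every thread in the system. */
    err = pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
    if (err != 0) {
        printf("process scope unsupported (%s); using system scope\n",
               strerror(err));
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    }
    pthread_create(&tid, &attr, work, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}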

10.
a. The program's main goal is to demonstrate how to use
pthreads to create a simple multithreaded program in C, and
how two threads can share a global variable.
b.

• The output is mostly as expected, apart from a single extra '.'
in the first line. This is most likely a scheduling artifact: the
interleaving of the two threads' '.' and 'o' output is
non-deterministic, so the child thread managed to print one
extra time before the alternation settled.
• The final output shows that the value of myglobal is 21,
which is not the expected value. Both the main thread and the
child thread increment myglobal 20 times each, for a total of
40 increments, so starting from an initial value of 0 the final
value should be 40. The discrepancy is caused by a race
condition: since no synchronization mechanism such as a
mutex or a semaphore is used in the program, both threads
read, modify, and write myglobal concurrently, and one
thread's increments overwrite the other's and are lost. Because
the order in which the threads access myglobal is
non-deterministic, different runs of the program could produce
different results.
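A hedged reconstruction of the racy pattern (the sheet's listing is not reproduced here; names such as myglobal and the loop shape follow the well-known pthreads demo this answer appears to describe): the child copies the global into a local variable, sleeps, then writes the stale value back, discarding every increment the main thread made in between.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int myglobal = 0;

static void *thread_function(void *arg) {
    for (int i = 0; i < 20; i++) {
        int j = myglobal;   /* read */
        j = j + 1;          /* modify */
        sleep(1);           /* window in which main's updates occur */
        myglobal = j;       /* write back a stale value, losing them */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, thread_function, NULL);
    for (int i = 0; i < 20; i++) {
        myglobal = myglobal + 1;
        sleep(1);
    }
    pthread_join(t, NULL);
    printf("myglobal = %d\n", myglobal);   /* typically 21, not 40 */
    return 0;
}

Guarding every access to myglobal with a pthread mutex (as in the counter sketch under question 1) restores the expected total of 40.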
