Multiprocessors and Threads: Fred Kuhns CS523S: Operating Systems

Multiprocessors and Threads
Lecture 3
Fred Kuhns
CS523S: Operating Systems
Motivation for Multiprocessors

Enhanced Performance
- concurrent execution of tasks for increased throughput (between processes)
- exploit concurrency within tasks (parallelism within a process)
Fault Tolerance - graceful degradation in the face of failures
Basic MP Architectures
Single Instruction Single Data (SISD) - conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) - vector and array processors.
Multiple Instruction Single Data (MISD) - not implemented.
Multiple Instruction Multiple Data (MIMD) - conventional MP designs.
MIMD Classifications
Tightly Coupled System - all processors share the same global memory and the
same address space (typical SMP system).
Main memory is used for IPC and synchronization.
Loosely Coupled System - memory is partitioned and attached to each processor.
Examples: hypercubes, clusters (multi-computers).
Message passing is used for IPC and synchronization.
MP Block Diagram
[Figure: four CPUs, each with its own cache and MMU, connected through an Interconnection Network to four main memory (MM) modules.]
Memory Access Schemes

Uniform Memory Access (UMA)
- memory is centrally located
- all processors are equidistant from memory (equal access times)
Non-Uniform Memory Access (NUMA)
- memory is physically partitioned but accessible by all processors
- processors have the same address space
No Remote Memory Access (NORMA)
- memory is physically partitioned and not accessible by all processors
- each processor has its own address space
Other Details of MP
Interconnection technology
- Bus
- Cross-bar switch
- Multistage interconnection network
Caching - the cache coherence problem!
- Write-update
- Write-invalidate
- Bus snooping
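The two coherence policies named above can be compared in a toy model. This is only a sketch under stated assumptions: the `Bus` and `Cache` classes and their methods are hypothetical names, caches are write-through (a simplification, not something the slide states), and "snooping" is modeled as a direct method call on every other cache.

```python
class Bus:
    """Shared bus plus main memory; every attached cache snoops its traffic."""

    def __init__(self):
        self.memory = {}   # address -> value
        self.caches = []


class Cache:
    """Per-CPU write-through cache (write-through is a simplifying assumption)."""

    def __init__(self, bus, policy="invalidate"):
        if policy not in ("invalidate", "update"):
            raise ValueError(policy)
        self.bus = bus
        self.policy = policy
        self.lines = {}    # address -> value, valid lines only
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                     # miss: load from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value                  # write through
        for other in self.bus.caches:                  # broadcast on the bus
            if other is not self:
                other.snoop(addr, value, self.policy)

    def snoop(self, addr, value, policy):
        if addr not in self.lines:
            return                                     # no copy, nothing to do
        if policy == "update":
            self.lines[addr] = value                   # write-update
        else:
            del self.lines[addr]                       # write-invalidate
```

With the "invalidate" policy a remote write removes the stale line, so the next read misses and reloads from memory. With "update" the remote copy is refreshed in place and the next read hits.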
MP OS Structure - 1
Separate Supervisor - all processors have their own copy of the kernel.
- some data is shared for interaction
- dedicated I/O devices and file systems
- good fault tolerance
- bad for concurrency
MP OS Structure - 2
Master/Slave Configuration
- master monitors the status of, and assigns work to, the other processors (slaves)
- slaves are a schedulable pool of resources for the master
- master can be a bottleneck
- poor fault tolerance
MP OS Structure - 3
Symmetric Configuration - most flexible.
- all processors are autonomous and treated as equals
- one copy of the kernel is executed concurrently across all processors
- access to shared data structures must be synchronized:
  - lock the entire OS (floating master); mitigated by dividing the OS into
    segments that normally have little interaction
  - multithread the kernel and control access to individual resources
  (the two approaches form a continuum)
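The two ends of that locking continuum can be sketched in user-space Python. This is an illustration only: `GiantLockKernel`, `FineGrainedKernel`, `syscall`, `run` and the "fs"/"net" subsystems are hypothetical names, and under CPython the example shows the locking structure, not a performance difference.

```python
import threading


class GiantLockKernel:
    """One lock serializes every kernel entry (the 'lock entire OS' end)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {"fs": 0, "net": 0}

    def syscall(self, subsystem):
        with self._lock:                       # all subsystems contend here
            self.counters[subsystem] += 1


class FineGrainedKernel:
    """One lock per subsystem: unrelated requests no longer contend."""

    def __init__(self):
        self._locks = {"fs": threading.Lock(), "net": threading.Lock()}
        self.counters = {"fs": 0, "net": 0}

    def syscall(self, subsystem):
        with self._locks[subsystem]:           # only same-subsystem callers wait
            self.counters[subsystem] += 1


def run(kernel, calls_per_thread=1000):
    """Drive the kernel from four threads, two per subsystem."""
    def worker(subsystem):
        for _ in range(calls_per_thread):
            kernel.syscall(subsystem)

    threads = [threading.Thread(target=worker, args=(s,))
               for s in ("fs", "net", "fs", "net")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kernel.counters
```

Both variants produce the same counts; the difference is which callers can be inside the "kernel" at the same time.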
MP Overview
[Figure: taxonomy tree. MultiProcessor divides into SIMD and MIMD. MIMD divides into Shared Memory (tightly coupled) and Distributed Memory (loosely coupled). Shared Memory divides into Master/Slave and Symmetric (SMP); Distributed Memory leads to Clusters.]
SMP OS Design Issues

Threads - effectiveness of parallelism depends on the performance of the
primitives used to express and control concurrency.
Process Synchronization - disabling interrupts is not sufficient.
Process Scheduling - efficient, policy-controlled task scheduling (processes/threads)
- global versus per-CPU scheduling
- task affinity for a particular CPU
- resource accounting and intra-task thread dependencies
SMP OS design issues - 2

Memory Management - complicated since main memory is shared by possibly many
processors. Each processor must maintain its own map tables for each process.
- cache coherence
- memory access synchronization
- balancing overhead with increased concurrency
Reliability and Fault Tolerance - degrade gracefully in the event of failures
Typical SMP System

[Figure: four 500MHz CPUs, each with its own cache and MMU, attached to a shared System/Memory Bus together with Main Memory (50ns), INT, System Functions (timer, BIOS, reset), and a Bridge to the I/O subsystem (ether, scsi, video).]
Issues:
- Memory contention
- Limited bus bandwidth
- I/O contention
- Cache coherence
Typical I/O Bus:
- 33MHz/32bit (132MB/s)
- 66MHz/64bit (528MB/s)
Some Definitions
Parallelism: degree to which a multiprocessor application achieves parallel
execution.
Concurrency: maximum parallelism an application can achieve with unlimited
processors.
System Concurrency: the kernel recognizes multiple threads of control in a
program.
User Concurrency: user-space threads (coroutines) provide a natural programming
model for concurrent applications. This concurrency is not supported by the
system.
Process and Threads

Process: encompasses
- a set of threads (computational entities)
- a collection of resources
Thread: dynamic object representing an execution path and computational state.
- threads have their own computational state: PC, stack, user registers and
  private data
- remaining resources are shared amongst the threads in a process
Threads
Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
Threads separate the notion of execution from
the Process abstraction
Useful for expressing the intrinsic concurrency
of a program regardless of resulting
performance
Three types: User threads, kernel threads and
Light Weight Processes (LWP)
User Level Threads

User level threads - supported by a user-level (thread) library
Benefits:
- no modifications required to the kernel
- flexible and low cost
Drawbacks:
- a thread cannot block without blocking the entire process
- no parallelism (threads are not recognized by the kernel)
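A user-level thread library can be approximated with Python generators, which behave like coroutines. The sketch below is illustrative only: `user_thread` and `run_user_threads` are hypothetical names, and the scheduler is a plain round-robin loop. The kernel sees a single thread of control throughout, which is why these threads cost little and also why they gain no parallelism.

```python
from collections import deque


def user_thread(name, steps, log):
    """A 'user thread': runs one step, then yields back to the library."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # voluntary context switch, no kernel call


def run_user_threads(threads):
    """Round-robin scheduler living entirely in user space."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the thread until its next yield
            ready.append(current)  # still runnable: back of the ready queue
        except StopIteration:
            pass                   # thread finished, drop it
```

For example, scheduling thread A with 2 steps and thread B with 3 steps interleaves them as A:0, B:0, A:1, B:1, B:2. If any step made a blocking call, the scheduler loop itself would stop, which is the "blocks the entire process" drawback listed above.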
Kernel Level Threads

Kernel level threads - the kernel directly supports multiple threads of control
in a process. The thread is the basic scheduling entity.
Benefits:
- coordination between scheduling and synchronization
- less overhead than a process
- suitable for parallel applications
Drawbacks:
- more expensive than user-level threads
- generality leads to greater overhead
Light Weight Processes (LWP)

Kernel-supported user thread
- each LWP is bound to one kernel thread
- a kernel thread need not be bound to an LWP
LWPs are scheduled by the kernel
User threads are scheduled by the library onto LWPs
Multiple LWPs per process
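The many-to-few mapping of user threads onto LWPs can be loosely imitated with a thread pool. This is an analogy, not an implementation: each pool worker stands in for an LWP (one kernel-scheduled thread), each callable stands in for a user thread, and, unlike real user threads, a callable here runs to completion once dispatched. `run_on_lwps` is a hypothetical helper name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_on_lwps(user_threads, n_lwps):
    """Run callables ('user threads') on a pool of n_lwps OS threads ('LWPs').

    Returns a list of (lwp_name, result) pairs in submission order.
    """
    def dispatch(fn):
        # Record which pool thread (our stand-in LWP) executed this callable.
        return threading.current_thread().name, fn()

    with ThreadPoolExecutor(max_workers=n_lwps,
                            thread_name_prefix="lwp") as pool:
        return list(pool.map(dispatch, user_threads))
```

Submitting six callables with `n_lwps=2` yields six results that were executed by at most two distinct pool threads, mirroring several user threads sharing a smaller set of LWPs.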
First Class threads (Psyche OS)

Thread operations in user space:
- create, destroy, synch, context switch
Kernel threads implement a virtual processor
- coarse grain in kernel - preemptive scheduling
Communication between the kernel and the threads library:
- shared data structures
- software interrupts (user upcalls or signals), for example for scheduling
  decisions and preemption warnings
- kernel scheduler interface - allows dissimilar thread packages to coordinate
Scheduler Activations
An activation:
- serves as the execution context for a running thread
- notifies the thread of kernel events (upcall)
- provides space for the kernel to save the processor context of the current
  user thread when it is stopped by the kernel
Kernel is responsible for processor allocation => preemption by kernel.
Thread package is responsible for scheduling threads on available processors
(activations)
Support for Threading

BSD: process model only. 4.4 BSD enhancements.
Solaris: provides user threads, kernel threads and LWPs.
Mach: supports kernel threads and tasks. Thread libraries provide the semantics
of user threads, LWPs and kernel threads.
Digital UNIX: extends Mach to provide the usual UNIX semantics. Pthreads library.
Solaris Threads
Supports:
- user threads (uthreads) via libthread and libpthread
- LWPs, which act as virtual CPUs for user threads
- kernel threads (kthreads); every LWP is associated with one kthread, but a
  kthread need not have an LWP
- interrupts as threads
Solaris kthreads
Fundamental scheduling/dispatching object
All kthreads share the same virtual address space (the kernel's) - cheap
context switch
System threads - for example STREAMS, callout
kthread_t (/usr/include/sys/thread.h):
- scheduling info, pointers for scheduler or sleep queues, pointers to klwp_t
  and proc_t
Solaris LWP
Bound to a kthread
LWP-specific fields from proc are kept in klwp_t (/usr/include/sys/klwp.h):
- user-level registers, system call params, resource usage, pointers to
  kthread_t and proc_t
klwp_t can be swapped with the LWP
Non-swappable LWP info is kept in kthread_t
Solaris LWP (cont)

All LWPs in a process share:
- signal handlers
Each may have its own:
- signal mask
- alternate stack for signal handling
No global name space for LWPs
Solaris User Threads

Implemented in user libraries
- library provides synchronization and scheduling facilities
- threads may be bound to LWPs
- unbound threads compete for available LWPs
Manage thread-specific info:
- thread id, saved register state, user stack, signal mask, priority*, thread
  local storage
Solaris provides two libraries: libthread and libpthread.
Try man thread or man pthreads
Solaris Thread Data Structures

[Figure: three linked structures and the pointer fields that connect them.]
proc_t: p_tlist
klwp_t: lwp_thread, lwp_procp
kthread_t: t_procp, t_lwp, t_forw
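The pointer fields named in this diagram can be modeled with a few Python classes to make the linkage explicit. This is a sketch, not the Solaris definitions: only the field names come from the slide, the real structures carry many more members, and treating `t_forw` as a circular list of a process's kthreads is an assumption made here. `add_lwp` and `threads_of` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Proc:                      # stands in for proc_t
    p_tlist: Optional["KThread"] = None       # head of this process's kthreads


@dataclass(eq=False, repr=False)
class KLwp:                      # stands in for klwp_t
    lwp_thread: Optional["KThread"] = None    # the kthread this LWP is bound to
    lwp_procp: Optional[Proc] = None          # owning process


@dataclass(eq=False, repr=False)
class KThread:                   # stands in for kthread_t
    t_procp: Optional[Proc] = None            # owning process
    t_lwp: Optional[KLwp] = None              # bound LWP, if any
    t_forw: Optional["KThread"] = None        # next kthread (assumed circular)


def add_lwp(proc):
    """Create an LWP plus its kthread and link both into proc."""
    kt = KThread(t_procp=proc)
    lwp = KLwp(lwp_thread=kt, lwp_procp=proc)
    kt.t_lwp = lwp
    if proc.p_tlist is None:
        kt.t_forw = kt                         # single-element circular list
        proc.p_tlist = kt
    else:
        head = proc.p_tlist
        kt.t_forw = head.t_forw                # splice in right after the head
        head.t_forw = kt
    return lwp


def threads_of(proc):
    """Walk p_tlist via t_forw and return every kthread exactly once."""
    out = []
    t = proc.p_tlist
    while t is not None:
        out.append(t)
        t = t.t_forw
        if t is proc.p_tlist:
            break
    return out
```

Starting from a process, `p_tlist` and `t_forw` reach every kthread; from any kthread, `t_lwp` reaches its LWP and `t_procp` leads back to the process.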
Solaris: Processes, Threads and LWPs

[Figure: Process 1 and Process 2 drawn across the user, kernel and hardware layers; the kernel layer includes an interrupt kthread (Int kthr).]
Solaris Interrupts
One system-wide clock kthread
Pool of 9 partially initialized kthreads per CPU for interrupts
Interrupt thread can block
Interrupted thread is pinned to the CPU
Solaris Signals and Fork

Signals are divided into traps (synchronous) and interrupts (asynchronous)
Each thread has its own signal mask; the set of signal handlers is global
Each LWP can specify an alternate stack
fork replicates all LWPs
fork1 replicates only the invoking LWP/thread
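The per-thread signal mask can be observed from Python on a POSIX system through `signal.pthread_sigmask`. This sketch uses the standard POSIX threads interface rather than anything Solaris-specific, and it assumes a Unix platform where `SIGUSR1` exists; `worker` and `masks_are_per_thread` are hypothetical names.

```python
import signal
import threading


def worker(result):
    # Block SIGUSR1 in this thread only.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    # Blocking an empty set changes nothing and returns the current mask.
    result["worker"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())


def masks_are_per_thread():
    """Return the signal masks seen by a worker thread and by the caller."""
    result = {}
    # Make sure the calling thread itself does not block SIGUSR1, because a
    # new thread inherits the mask of the thread that creates it.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
    t = threading.Thread(target=worker, args=(result,))
    t.start()
    t.join()
    result["caller"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())
    return result
```

After the call, `SIGUSR1` appears in the worker's mask but not in the caller's, while any handler installed with `signal.signal` would still be shared by the whole process.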
Mach
Two abstractions:
Task - static object: address space and system resources called port rights.
Thread - fundamental execution unit; runs in the context of a task.
- zero or more threads per task
- kernel schedulable
- kernel stack
- computational state
Processor sets - available processors are divided into non-intersecting sets.
- permits dedicating processor sets to one or more tasks
Mach c-thread Implementations

Coroutine-based - multiplexes user threads onto a single-threaded task.
Thread-based - one-to-one mapping from c-threads to Mach threads. Default.
Task-based - one Mach task per c-thread.
Digital UNIX
Based on Mach 2.5 kernel
Provides a complete UNIX programmer's interface
4.3BSD code and ULTRIX code ported to Mach
u-area replaced by utask and uthread
proc structure retained
Digital UNIX threads

Signals are divided into synchronous and asynchronous
Global signal mask
Each thread can define its own handlers for synchronous signals
Global handlers for asynchronous signals
Pthreads library
One Mach thread per pthread
Implements asynchronous I/O
- a separate thread is created for synchronous I/O, which in turn signals the
  original thread
Library includes signal handling, scheduling functions, and synchronization
primitives.
Mach Continuations
Address the problem of excessive kernel stack memory requirements
Process model versus interrupt model
- one per process kernel stack versus a per thread kernel stack
Before blocking, a thread is responsible for
- saving any required state (the thread structure allows up to 28 bytes)
- indicating a function to be invoked when unblocked (the continuation function)
Advantage: the stack can be transferred between threads, eliminating copy
overhead.
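The continuation idea can be illustrated in a few lines of Python. This is a toy model under stated assumptions: the `Thread` class and the `block_with_continuation` and `unblock` helpers are hypothetical names, state is restricted to 4-byte integers for simplicity, and only the 28-byte limit comes from the slide. The point is that a blocked thread keeps a small record and a function pointer instead of a live stack.

```python
import struct

MAX_SAVED_STATE = 28   # bytes of scratch state, the figure quoted on the slide


class Thread:
    def __init__(self, name):
        self.name = name
        self.saved_state = b""     # small fixed-size scratch area
        self.continuation = None   # function to call when unblocked


def block_with_continuation(thread, continuation, *ints):
    """Save a few integers and a continuation instead of keeping a stack."""
    state = struct.pack(f"<{len(ints)}i", *ints)   # 4 bytes per integer
    if len(state) > MAX_SAVED_STATE:
        raise ValueError("state does not fit in the thread structure")
    thread.saved_state = state
    thread.continuation = continuation


def unblock(thread):
    """Resume by calling the continuation with the saved state."""
    count = len(thread.saved_state) // 4
    args = struct.unpack(f"<{count}i", thread.saved_state)
    continuation, thread.continuation = thread.continuation, None
    return continuation(thread, *args)


def finish_read(thread, fd, nbytes):
    # Hypothetical continuation: the second half of a blocking read.
    return f"{thread.name} resumes read(fd={fd}, nbytes={nbytes})"
```

Because nothing on the blocked thread's stack is needed to resume it, the stack can be handed to whichever thread runs next, which is the advantage noted above.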
Threads in Windows NT
Design driven by need to support a variety of
OS environments
NT process implemented as an object
executable process contains >= 1 thread
process and thread objects have built-in
synchronization capabilities
NT Threads
Support for kernel (system) threads
Threads are scheduled by the kernel and thus are
similar to UNIX threads bound to an LWP
(kernel thread)
fibers are threads which are not scheduled by the
kernel and thus are similar to unbound user
threads.
4.4 BSD UNIX

Initial support for threads implemented but not
enabled in distribution
Proc structure and u-area reorganized
All threads have a unique ID
How are the proc and u areas reorganized to
support threads?
