Chapter 3 Processes

Distributed Systems

(COSC 621)

Tessfu G. (Ph.D.)
School of Computing
Department of Computer Science
Dire Dawa Institute of Technology
Chapter 3 - Processes

2
Introduction to Threads
• Processor – Provides a set of instructions along with the capability of
automatically executing a series of those instructions.

• Thread – A minimal software processor in whose context a series of
           instructions can be executed.
– Can be seen as execution of (a part of) a program on a virtual
processor.
– Analogous to the concept of a “procedure” that runs independently
from its main program.
– Useful to structure large applications into parts that could logically be
executed at the same time.
• Process – A software processor in whose context one or more threads may be
executed.
– A program that is currently being executed on one of the OS’s virtual
processors.
3
Introduction to Threads
• Management and scheduling of processes are the most important issues in
operating systems.

• In addition, multithreading and process (code) migration are also important
  issues in distributed systems.
• Multithreading – useful to efficiently organize client-server systems.
– useful to enhance performance of the distributed systems.

• Code migration – helps to achieve scalability.
                 – helps to dynamically configure clients and servers.

4
Introduction to Threads: Contexts
• Processor Context – The minimal collection of values stored in the registers of
a processor used for the execution of a series of
instructions (e.g., stack pointer, addressing registers,
program counter).
• Thread Context – The minimal collection of values stored in registers and
memory, used for the execution of a series of instructions
(processor context, and information for thread management).
– Thread context switching can be done entirely independently of
  the OS, as threads share the same address space.

• Process Context – The minimal collection of values stored in registers and
                    memory, used for the execution of a thread (thread context,
                    and register values of the memory management unit (MMU)).
                  – Process switching (e.g., as the result of IPC) is generally
                    more expensive, as the OS is involved.
5
Introduction to Threads: Contexts

Context switching as the result of IPC

6
Introduction to Threads: Concurrency Transparency
Processes
•Concurrency transparency of multiple processes comes at a high price:
• Each time a process is created, the OS must create an independent address space.
• Process context switching is expensive.
• When there are too many processes, processes are swapped between main memory
  and disk.
Threads
•No attempt is made to achieve concurrency transparency of multiple threads if
this would result in performance degradation.
• Only the minimum information to allow the processor to be shared by several
threads is maintained.
• Protecting data against inappropriate access by threads within a single process is
left entirely to application developers.
•Implications:
• Performance gain by multithreading.
• Unlike processes (which are automatically protected against each other),
  multithreaded applications require some additional intellectual effort.
7
Introduction to Threads: Multithreading
Advantages of multithreading:

•No need to block with every system call.


• In a single-threaded process, whenever a blocking system call is executed, the
process as a whole is blocked. E.g., a spreadsheet cell will be unable to process
cell-dependent computations while the program is waiting for input.
•Easy to exploit available parallelism in multiprocessors
• It becomes possible to exploit parallelism when executing a program on a
multiprocessor system.
- Each thread is assigned to a different CPU.
- Shared data are stored in shared main memory.
•Cheaper communication between components than with IPC
• Instead of using processes, large applications can be constructed such that
different parts are executed by separate threads.
- Communication between those parts is dealt with by using shared data.
- Thread switching can sometimes be done entirely in user space.
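The points above can be sketched in Python (illustrative, not from the slides): one thread blocks on a slow operation (a sleep stands in for a blocking system call) while a second thread keeps computing over shared data.

```python
import threading
import time

result = {}  # shared main memory between the threads

def blocking_input():
    time.sleep(0.2)                   # stands in for a blocking system call
    result["input"] = "user data"

def background_compute():
    result["sum"] = sum(range(1000))  # other work proceeds meanwhile

t1 = threading.Thread(target=blocking_input)
t2 = threading.Thread(target=background_compute)
t1.start(); t2.start()
t1.join(); t2.join()
```

A single-threaded program would have been stuck in the blocking call instead of finishing the computation.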

8
Introduction to Threads: Multithreading
Advantages of multithreading:

•Software engineering expressive power


• From software engineering perspective, many applications are simply easier to
structure as a collection of cooperating threads.
- For example, in the case of word processor, separate threads can be used for
handling user input, spelling and grammar checking, etc.

9
Thread Implementation
Thread Package
•Threads are generally provided in the form of a thread package.
•Thread package contains:
• Operations to create and destroy threads.
• Operations on synchronization variables (mutexes and conditional variables).
•Three approaches to thread package implementation:
• User-Level Solution
• Kernel-Level Solution
• Lightweight Process (LWP) Solution

User-Level Solution:
•All operations can be completely handled within a single process.
•All services provided by the kernel are done on behalf of the process in which a
thread resides.
•Pros:
• Cheap to create and destroy threads.
• Fast thread context switching.
•Con:
• If the kernel decides to block a thread, then the entire process will be
  blocked.
10
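A minimal sketch of the thread-package operations listed above, using Python's threading module: thread creation and destruction (join), a mutex, and a condition variable. The counter workload is purely illustrative.

```python
import threading

counter = 0
lock = threading.Lock()            # mutex protecting the shared counter
done = threading.Condition(lock)   # condition variable sharing that mutex
finished = 0
N = 4

def worker():
    global counter, finished
    with lock:                     # acquire the mutex
        counter += 1
        finished += 1
        done.notify_all()          # wake anyone waiting on the condition

# Create threads ...
threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()

# ... wait on the condition variable until all workers have run ...
with done:
    done.wait_for(lambda: finished == N)

# ... and destroy (join) them.
for t in threads:
    t.join()
```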
Thread Implementation
Kernel-Level Solution:
•The whole idea is to have the kernel contain the implementation of a thread
package.
•All thread operations are implemented as system calls.
•Pros:
• Operations that block a thread are no longer a problem: the kernel schedules
another available thread within the same process.
• External events are simple: the kernel (which catches all events) schedules the
thread associated with the event.
•Con:
• Expensive:
- Every thread operation (creation, deletion, synchronization, etc.) will
have to be carried out by the kernel.
- Thread context switching may become as expensive as process context
switching.
•Another Solution:
• Mixing user-level and kernel-level threads into a single concept.
11
Thread Implementation
Lightweight Process (LWP) Solution:
•A hybrid form of user-level threads and LWPs (kernel-level threads).
•LWPs run in the context of a single (heavy-weight) process, and there can be
several LWPs per process.
•Principal operation:
• When a user-level thread does a system call, the LWP executing that thread
blocks. Thread is still bound to LWP.
• The kernel can schedule another LWP having a runnable thread bound to it.
• A thread can switch to any other runnable thread currently in user space.
• When there are no threads to schedule, an LWP may remain idle, and may even
be removed (destroyed) by the kernel.

12
Thread Implementation
Lightweight Process (LWP) Solution:

Combining kernel-level lightweight processes and user-level threads

13
Thread Implementation
Pros and Cons of LWPs
•Pros:
• Thread operations require no kernel intervention.
• A blocking system call will not suspend the entire process (as long as the
  process has enough LWPs).
• There is no need for an application to know about LWPs (all it sees are user-level
threads).
• Different LWPs can be executed on different CPUs (multiprocessing
environment).
• Con:
• LWPs need to be created and destroyed (as with kernel-level threads).

14
Threads in Distributed Systems: Multithreaded Clients
• Main issue is hiding network latency.
• Examples:
• Multithreaded web clients
- Browser scans HTML, and finds more files that need to be fetched.
- Each file is fetched by a separate thread, each issuing an HTTP request.
- As files come in, the browser displays them.
• Multiple request-response calls to other machines (RPC)
- A client does several calls at the same time, each one by a different thread.
- It then waits until all results have been returned.
- If calls are to different servers, we may have a linear speed-up.
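The multiple request-response pattern can be sketched as follows; remote_call is a hypothetical stand-in for an RPC (the network delay is simulated with a sleep), and each call runs in its own thread.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_call(server, delay=0.1):
    # Stand-in for an RPC: real code would block on the network here.
    time.sleep(delay)
    return f"reply from {server}"

servers = ["alpha", "beta", "gamma"]

# One thread per call; the client waits until all results are back.
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    replies = list(pool.map(remote_call, servers))
```

With calls to three different servers, the three delays overlap instead of adding up, which is where the near-linear speed-up comes from.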

15
Threads in Distributed Systems: Multithreaded Servers
• An important use of multithreading in distributed systems is at the server side.
• Main issues are:
• Improving performance
- Starting a thread to handle an incoming request is much cheaper than
  starting a process.
- Exploits parallelism to attain high performance. A single-threaded server
  cannot take advantage of a multiprocessor.
- Hides network latency. Other work can be done while a request is coming in.
• Better structure
- Most servers have high I/O demands. Using simple, well-understood blocking
  calls simplifies the overall structure.
- Multithreaded programs tend to be smaller and easier to understand due to
simplified flow of control.

16
Threads in Distributed Systems: Multithreaded Servers
Multithreaded Server Organization
•A popular multithreaded server organization is dispatcher/worker model.
• Dispatcher thread - reads incoming requests.
• Worker thread - is selected by the server to process a request.

A multithreaded server organized in a dispatcher/worker model
17
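The dispatcher/worker model can be sketched as follows (illustrative, using an in-process queue to stand in for incoming network requests): the main thread dispatches requests, and a pool of worker threads processes them.

```python
import queue
import threading

requests = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        req = requests.get()
        if req is None:            # shutdown sentinel
            break
        with results_lock:
            results.append(f"handled {req}")
        requests.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# Dispatcher: reads "incoming requests" and hands each one to the pool.
for i in range(5):
    requests.put(f"request-{i}")
requests.join()                    # wait until every request is processed

for _ in workers:                  # stop the workers
    requests.put(None)
for w in workers:
    w.join()
```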
Threads in Distributed Systems: Finite State Machines
• A state machine consists of
• state variables, which encode its state
• commands, which transform its state
• Each command is implemented by a deterministic program
• execution of the command is atomic with respect to other commands
• modifies the state variables and/or produces some output
• Operation:
• Using a single process that awaits messages containing requests and performs the
actions they specify, as in a server.
- Read any available input.
- Process the input chunk.
- Save state of that request.
- Loop.
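A minimal sketch of the finite-state-machine loop: the single process keeps a per-request state table, processes each available input chunk, saves that request's state, and loops. The interleaved chunks simulate input from two concurrent requests; a real server would read them from non-blocking sockets (e.g. via the selectors module).

```python
states = {}   # request id -> input received so far (the saved state)

def on_input(req_id, chunk):
    # Process the input chunk, then save the state of that request.
    states[req_id] = states.get(req_id, "") + chunk

# Interleaved chunks arriving from two concurrent requests:
for req_id, chunk in [(1, "GET "), (2, "PUT "), (1, "/a"), (2, "/b")]:
    on_input(req_id, chunk)
```

The loop never blocks on any one request, which is how a single thread serves many clients in parallel.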

18
Threads in Distributed Systems: Finite State Machines
• Three ways to construct a server...

Server Model           Parallelism    Performance   System Calls   Processing       Programming
Multithreaded          Parallel       High          Blocking       Sequential       Easy
Single-threaded        Non-parallel   Low           Blocking       Sequential       Easy
Finite-state machine   Parallel       High          Non-blocking   Non-sequential   Hard

19
Anatomy of Clients
• The two issues in the anatomy of clients are:
• User interfaces
- Used to create a convenient environment for the interaction of a human user
  and a remote server.
Example: Mobile phones with simple displays and a set of keys
- GUIs are most commonly used user interfaces
- The X Window System (or simply X)

• Client-side software for distribution transparency

20
User interfaces: The X Window System
• The X Window System (commonly referred to as X) is a network-transparent
graphical windowing system based on a client/server model.
• Primarily used on UNIX and UNIX-like systems such as Linux
• Versions of X are now also available for many other operating systems.
• Although it was developed in 1984, X is not only still available but is in
  fact the standard environment for UNIX windowing systems.

• X can be viewed as the part of the OS that controls the terminal (keyboard,
  mouse, monitor, etc.).
• The heart of the system is formed by X kernel (X server).
• The interface is made available to applications as a library called Xlib.
• Programs that make use of the features of X kernel are X clients.
• The client and the server communicate via the X Protocol.
• Window Manager manipulates the entire screen.
• Other examples of similar systems: Remote desktop

21
User interfaces: The X Window System
Basic organization of the X Window System

22
User interfaces: Compound Documents
• Modern graphical user interfaces.
• The user interface is application-aware, enabling inter-application
  communication.
  • Drag-and-drop: move objects across the screen to invoke interaction with
    other applications.
• In-place editing: integrate several applications at user-interface level (word
processing + drawing facilities)

23
Client-Side Software for Distribution Transparency
• Often tailored for distribution transparency.
• Access transparency: client-side stubs for machine architectures and
communications.
• Location/migration transparency: let client-side software keep track of the
  actual location.
• Failure transparency: can often be placed only at the client, e.g., using
  caches in browsing.
• Replication transparency: Multiple invocations handled by client-side stub.

Transparent replication of a server using a client-side solution
24

Servers and design issues
• The general design issues that must be considered in distributed systems are:
• How to organize servers?
• Where do clients contact a server?
• Whether and how a server can be interrupted?
• Whether or not the server is stateless?

25
General Organization
• A server is a process that waits for incoming service requests at a specific
transport address.
• Clients send requests to an endpoint, also called a port.
• In practice, there is a one-to-one mapping between a port and a service.
• 20 ftp-data File Transfer [Default Data]
• 21 ftp File Transfer [Control]
• 23 telnet Telnet
• 25 smtp Simple Mail Transfer Protocol
• 53 dns Domain Name System
• 80 http Hypertext Transfer Protocol
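On most systems this name-to-port mapping is available from the OS services database, queryable from Python (assuming the standard /etc/services entries are present):

```python
import socket

# Look up well-known ports by service name in the OS services database.
port_http = socket.getservbyname("http")
port_smtp = socket.getservbyname("smtp")
```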

26
General Organization
Types of Servers
•Iterative vs. Concurrent Server: An iterative server handles only one request
at a time, whereas a concurrent server passes the request to a separate thread
or another process and immediately waits for the next request.
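The distinction can be sketched with Python's socketserver module: TCPServer handles requests iteratively (one at a time), while ThreadingTCPServer hands each request to a new thread. The echo handler below is illustrative.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Echo one line back to the client.
        self.wfile.write(self.rfile.readline())

# ThreadingTCPServer = concurrent; plain TCPServer would be iterative.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection((host, port)) as s:
    s.sendall(b"hello\n")
    reply = s.makefile().readline()

server.shutdown()
server.server_close()
```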

•Super servers: - Servers that listen to several ports, i.e., provide several
                  independent services.
                - When a service request comes in, they start a subprocess to
                  handle the request.
                - For services with more permanent traffic, a dedicated server
                  is used.

27
Out-of-Band Communication
• How can a server be interrupted once it has accepted (or is in the process of
accepting) a service request?

• Solution 1: Use a separate port for urgent data (possibly per service request):
• Server has a separate thread (or process) waiting for incoming urgent messages.
• When urgent message comes in, associated request is put on hold.
• Note: this requires that the OS supports high-priority scheduling of specific
  threads or processes.
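Solution 1 can be sketched with in-process queues standing in for the two ports: a separate thread watches the urgent channel and flags the current request to be put on hold. Channel and variable names are illustrative.

```python
import queue
import threading

normal, urgent = queue.Queue(), queue.Queue()
on_hold = threading.Event()

def urgent_listener():
    urgent.get()          # blocks until an urgent message arrives
    on_hold.set()         # put the associated request on hold

threading.Thread(target=urgent_listener, daemon=True).start()

normal.put("long-running request")   # regular traffic on the normal channel
urgent.put("interrupt!")             # urgent message on the separate channel
on_hold.wait(timeout=1)              # server notices the interrupt
```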

• Solution 2: Use out-of-band communication facilities of the transport layer:
  • For example, TCP allows urgent data to be sent over the same connection.
• Urgent messages can be caught using OS signaling techniques.

28
Servers and State
Stateless Servers
• Never keep accurate information about the status of a client after having
handled a request:
• Don’t record whether a file has been opened (simply close it again after access)
• Don’t promise to invalidate a client’s cache
• Don’t keep track of your clients

•Consequences:
• Clients and servers are completely independent
• State inconsistencies due to client or server crashes are reduced
• Possible loss of performance because, e.g., a server cannot anticipate client
behavior (think of prefetching file blocks).

•Example: a Web server
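A stateless file-read handler might look as follows (illustrative): each request is self-contained (path, offset, length), and the file is simply closed again after access, so no per-client state survives the call.

```python
import os
import tempfile

def handle_read(path, offset, length):
    # Open, read, close: the server records nothing between requests.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Set up an illustrative file to serve.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

chunk = handle_read(path, 6, 5)
os.unlink(path)
```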

29
Servers and State
Stateful Servers
•Keeps track of the status of its clients:
• Record that a file has been opened, so that prefetching can be done
• Knows which data a client has cached, and allows clients to keep local copies of
shared data
• Server may record client’s behavior using cookies

•Observation:
• The performance of stateful servers can be extremely high, provided clients are
allowed to keep local copies.
• Session state vs. permanent state
• As it turns out, reliability is not a major problem.

30
Server Clusters
• Many server clusters are organized along three different tiers, to improve
performance.
• The first tier is generally responsible for passing requests to an appropriate
server.
• The second and the third tiers can be merged.

31
Server Clusters
Request Handling
•Having the first tier handle all communication from/to the cluster may lead to a
bottleneck.
•Solution: Various, but one popular one is TCP-handoff:

32
Code Migration: Approaches to Code Migration
• A process consists of three segments
• Code segment: contains the actual code
• Resource segment: contains references to external resources needed by the
process. E.g., files, printers, devices, other processes
• Execution segment: stores the current execution state of a process, consisting
of private data, the stack, and the program counter
• What if the operation we want (e.g., moving data) isn't offered remotely?
• Solution: Code Migration / Agents
• Traditionally, code is moved along with the whole computational context
• Why code migration is useful?
• Load balancing
• Minimizing communication costs
• Optimizing perceived performance
• Improving scalability
• Flexibility through dynamic configurability

33
Code Migration: Approaches to Code Migration
Dynamic Client Configuration
•The client first fetches the necessary software, and then invokes the server.

The principle of dynamically configuring a client to communicate with a server
34
Code Migration: Approaches to Code Migration
Alternatives of Code Migrations
Mobility
• Weak Mobility: Moves only code, plus maybe some init data (and start execution
from the beginning after migration).
- The mobile code may be executed by either the target process or a separate
  process.
- E.g., Java applets.

• Strong Mobility: Moves code and execution (and resume execution where it
stopped).
- Migration: move the entire object from one machine to the other.
- Cloning: simply start a clone, and set it in the same execution state.

• Initiation
• Sender-initiated migration: migration is initiated by the machine where the
  code currently resides or is being executed.
• Receiver-initiated migration: Migration is initiated by the target machine.
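Weak mobility can be sketched in Python: only source code moves, and the receiver starts execution from the beginning by compiling and running it. The shipped function is hypothetical.

```python
# Code segment to be migrated: source text only, no execution state.
code = """
def task(x):
    return x * x
"""

# "Receiver" side: install the migrated code in a fresh namespace and
# start execution from the beginning.
namespace = {}
exec(code, namespace)
result = namespace["task"](7)
```

Strong mobility would additionally require shipping the execution segment (stack, program counter), which plain exec cannot express.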

35
Code Migration: Approaches to Code Migration
Alternatives of Code Migrations

36
Code Migration: Migration and Local Resources

• So far, we have only accounted for migration of the code and execution
segments.
• A process uses local resources that may or may not be available at the target
site.
• Either references need to be updated, or resources need to be moved.
• Resources might not be as easy to move around as code and variables.
Example: A huge database might in theory be moved across the network, but in
practice it will not.
• Two issues:
•How does the resource segment refer to resources?
- Process-to-resource binding
•How does the resource relate with the hosting machine?
- Resource-to-machine binding

37
Code Migration: Migration and Local Resources

Process-to-Resource Binding
•Binding by identifier: the process refers to a resource by its identifier (e.g., a
URL). => Bind to the same resource
•Binding by value: the object requires the value of a resource (e.g., standard
library). => Bind to equivalent resource
•Binding by type: the object requires a type of resource (e.g., local devices,
                  such as monitors, printers, and so on). => Bind to a resource
                  with the same function.

38
Code Migration: Migration and Local Resources

Resource-to-Machine Binding
•Unattached: Resources that can be easily moved between different machines.
E.g., data files and cache.
•Fastened: Resources that can be moved, but only at a high cost.
E.g., local databases and complete Web sites.
•Fixed: Resources bound to a specific machine.
        E.g., a local communication port and monitor.
Actions to be Taken...
• Actions to be taken with respect to the references to local
resources when migrating code to another machine
                                  Resource-to-machine binding
                                  Unattached        Fastened          Fixed
Process-to-    By identifier      MV (or GR)        GR (or MV)        GR
resource       By value           CP (or MV, GR)    GR (or CP)        GR
binding        By type            RB (or GR, CP)    RB (or GR, CP)    RB (or GR)
39
Code Migration: Migration and Local Resources

 GR: Establish a global, system-wide reference
 MV: Move the resource
 CP: Copy the value of the resource
 RB: Rebind process to a locally available resource

40
Code Migration: Migration in Heterogeneous Systems

• Main problem:
• The target machine may not be suitable to execute the migrated code.
• The definition of process/thread/processor context is highly dependent on local
hardware, operating system and runtime system.

• Solution:
• Make use of an abstract machine that is implemented on different platforms.
- Interpreted languages running on a virtual machine, e.g., Java/JVM and
  scripting languages.
- Virtual machine monitors allowing migration of complete OS + apps.

41
Next: Communication

42
