Distributed Systems: Silberschatz, Galvin and Gagne ©2013 Operating System Concepts - 9 Edition
Operating System Concepts – 9th Edition Silberschatz, Galvin and Gagne ©2013
Distributed Systems
Advantages of Distributed Systems
Types of Network-Based Operating Systems
Network Structure
Communication Structure
Communication Protocols
An Example: TCP/IP
Robustness
Design Issues
Overview of Distributed Systems
A distributed system is a collection of loosely coupled processors
interconnected by a communications network
Processors are variously called nodes, computers, machines, hosts
Site is the location of the processor
Generally a server has a resource that a client node at a different site
wants to use
Example
Reasons for Distributed Systems
Types of Network-based Operating Systems
Network-Operating Systems
Users are aware of multiplicity of machines.
Access to resources of various machines is done explicitly by:
Remote logging into the appropriate remote machine (telnet)
User wishes to compute on "cs.yale.edu," a computer that is located
at Yale University.
telnet cs.yale.edu
Remote File Transfer: (FTP)
Suppose that a user on "cs.uvm.edu" wants to copy a Java program
Server.java that resides on "cs.yale.edu." The user must first invoke
the FTP program by executing
ftp cs.yale.edu
Users must establish a session, give network-based commands
More difficult for users
Distributed-Operating Systems
Distributed-Operating Systems (Cont.)
Process Migration – execute an entire process, or parts of it, at
different sites
Load balancing – distribute processes across network to
even the workload
Computation speedup – subprocesses can run concurrently
on different sites
Hardware preference – process execution may require
specialized processor
Software preference – required software may be available at
only a particular site
Data access – run process remotely, rather than transfer all
data locally
Consider the World Wide Web
Network Structure
Local-Area Network (LAN) – designed to cover a small geographical
area (an office, a single building, or a few adjacent buildings)
All the sites in such systems are close to one another, so the
communication links tend to have a higher speed and lower error
rate than do their counterparts in wide-area networks.
Speeds range from about 1 Mbps (AppleTalk, Bluetooth) to 40 Gbps for the
fastest Ethernet, carried over twisted-pair copper or optical fibre
(10BaseT and 100BaseT Ethernet run at 10 Mbps and 100 Mbps respectively).
Local-area Network
Network Types (Cont.)
Wide-Area Network (WAN) –
Because the sites in a WAN are physically distributed over a large
geographical area, the communication links are, by default,
relatively slow and unreliable.
Typical links are telephone lines, leased (dedicated data) lines,
microwave links, and satellite channels.
WANs are implemented via communication processors known as
routers
The Internet WAN enables hosts worldwide to communicate
Hosts differ in all dimensions but WAN allows
communications amongst them
Connections between networks frequently use a telephone-
system service called T1
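A T1 link carries 1.544 megabits per second (Mbps)
A T3 link bundles 28 T1s, for about 45 Mbps (28 x 1.544 = 43.232 Mbps of T1 traffic; 44.736 Mbps line rate including framing overhead)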
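The T1/T3 figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name aggregate_mbps is an illustrative assumption, not something defined in the slides.

```python
T1_MBPS = 1.544      # T1 rate quoted on the slide
T1S_PER_T3 = 28      # number of T1 channels multiplexed into one T3

def aggregate_mbps(channels: int, rate_mbps: float) -> float:
    """Total rate of several identical channels (hypothetical helper)."""
    return channels * rate_mbps

t3_payload = aggregate_mbps(T1S_PER_T3, T1_MBPS)
print(f"28 x T1 = {t3_payload:.3f} Mbps")   # prints "28 x T1 = 43.232 Mbps"
```

The product is a little under the rounded 45 Mbps figure because the quoted T3 rate also includes multiplexing overhead.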
Communication Processors in a Wide-Area Network
Network Topology
Installation cost:
The cost of physically linking the sites in the system
Communication cost:
The cost in time and money to send a message from site A to
site B
Availability:
The extent to which data can be accessed despite the failure
of some links or sites
In a ring network, at least two links must fail for partition to occur. Thus,
the ring network has a higher degree of availability than does a tree-
structured network. However, the communication cost is high, since a
message may have to cross a large number of links.
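The availability claim for rings versus trees can be checked by brute force on small networks. The sketch below is illustrative only: the five-site ring and tree layouts and both function names are assumptions chosen for the example.

```python
from itertools import combinations

def is_connected(nodes, links):
    """Return True if every node is reachable from the first node."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = set(), [nodes[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return len(seen) == len(nodes)

def min_failures_to_partition(nodes, links):
    """Smallest number of failed links that splits the network (brute force)."""
    for k in range(1, len(links) + 1):
        for failed in combinations(links, k):
            remaining = [link for link in links if link not in failed]
            if not is_connected(nodes, remaining):
                return k
    return None

# Hypothetical five-site layouts.
nodes = ["A", "B", "C", "D", "E"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
tree = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")]

ring_k = min_failures_to_partition(nodes, ring)   # 2
tree_k = min_failures_to_partition(nodes, tree)   # 1
```

In the tree every link is the only path to some site, so one failure partitions it; the ring always has a second way around.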
Communication Structure
The design of a communication network must address four basic
issues:
Naming and name resolution - How do two processes
locate each other to communicate?
Routing strategies - How are messages sent through the
network?
Connection strategies - How do two processes send a
sequence of messages?
Contention - The network is a shared resource, so how do
we resolve conflicting demands for its use?
Naming and Name Resolution
For a process at site A to exchange information with a process at site
B, each must be able to specify the other.
There must be a mechanism to resolve the host name into a host-id
that describes the destination system to the networking hardware.
First, every host may have a data file containing the names and
addresses of all the other hosts reachable on the network (similar to
binding at compile time).
The problem with this model is that adding or removing a host from
the network requires updating the data files on all the hosts.
The system resolves addresses by examining the host-name
components in reverse order. Each component has a name server
(which accepts a name and returns an address). A request by a process on site A to
communicate with bob.cs.brown.edu is resolved as:
1. The kernel of system A issues a request to the name server for
the edu domain, asking for the address of the name server for
brown.edu. The name server for the edu domain must be at a
known address, so that it can be queried.
2. The edu name server returns the address of the host on which the
brown.edu name server resides.
3. The kernel on system A then queries the name server at this
address and asks about cs.brown.edu.
4. An address is returned, and a request to that address for
bob.cs.brown.edu finally returns a host-id for that host
(for example, 128.148.31.100).
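The four steps above can be sketched as a walk over the host-name components in reverse order. This is a toy model, not a real DNS client: the server labels ("edu-ns", "brown-ns", "cs-ns") and the in-memory tables are assumptions standing in for real name servers and their addresses.

```python
# Hypothetical tables: each name server maps a name to the next address to ask.
NAME_SERVERS = {
    "edu-ns": {"brown.edu": "brown-ns"},
    "brown-ns": {"cs.brown.edu": "cs-ns"},
    "cs-ns": {"bob.cs.brown.edu": "128.148.31.100"},
}

def resolve(host_name, known_server="edu-ns"):
    """Resolve host_name starting from the name server at a known address."""
    labels = host_name.split(".")
    server = known_server
    trace = []
    # Query for brown.edu, then cs.brown.edu, then bob.cs.brown.edu.
    for i in range(len(labels) - 2, -1, -1):
        query = ".".join(labels[i:])
        answer = NAME_SERVERS[server][query]   # one request/reply exchange
        trace.append((server, query, answer))
        server = answer                        # the next query goes here
    return server, trace

host_id, trace = resolve("bob.cs.brown.edu")
for asked, query, answer in trace:
    print(f"{asked}: {query} -> {answer}")
```

Each loop iteration corresponds to one query in the numbered steps; the last answer is the host-id.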
Routing Strategies
When a process at site A wants to communicate with a process at site
B, how is the message sent?
If there is only one physical path from A to B (a star or tree-
structured network), the message must be sent through that path.
However, if there are multiple physical paths from A to B, then
several routing options exist.
Routing Strategies (Cont.)
Virtual routing - A path from A to B is fixed for the duration of one
session. Different sessions involving messages from A to B may have
different paths
Partial remedy to adapting to load changes
Ensures that messages will be delivered in the order in which they
were sent
Dynamic routing - The path used to send a message from site A to site
B is chosen only when a message is sent
Because the decision is made dynamically, separate messages may
be assigned different paths.
Site A will make a decision to send the message to site C; C, in turn,
will decide to send it to site D, and so on. Eventually, a site will deliver
the message to B. Usually, a site sends a message to another site on
whatever link is the least used at that particular time.
Messages may arrive out of order
This problem can be remedied by appending a sequence number
to each message
Most complex to set up
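The sequence-number remedy for out-of-order arrival can be sketched as a small receive buffer. This is one possible arrangement, not a specific protocol: the Resequencer class and its method names are hypothetical.

```python
class Resequencer:
    """Hold back messages that arrive early; release them in sequence order."""

    def __init__(self):
        self.next_seq = 0      # sequence number expected next
        self.pending = {}      # messages that arrived ahead of their turn

    def accept(self, seq, payload):
        """Buffer one arrival and return the payloads now deliverable in order."""
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

# Messages 0..3 took different paths and arrived as 1, 0, 3, 2.
receiver = Resequencer()
delivered = []
for seq, payload in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
    delivered.extend(receiver.accept(seq, payload))
print(delivered)   # ['a', 'b', 'c', 'd']
```

The receiver never delivers a message until every earlier sequence number has been seen, so the application sees the original order.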
Tradeoffs mean all methods are used
UNIX provides ability to mix fixed and dynamic
Hosts may have fixed routes and gateways connecting
networks together may have dynamic routes
More recently, routing has come to be managed by software that can make
decisions more intelligently than traditional routing protocols
OpenFlow is device-independent, allowing developers to
introduce network efficiencies by decoupling data-routing
decisions from underlying network devices
Messages vary in length – simplified design breaks them into
packets (or frames, or datagrams)
Connectionless message is just one packet
Otherwise need a connection to get a multi-packet message
from source to destination
Connection Strategies
Circuit switching -
If two processes want to communicate, a permanent physical link is
established between them.
The link is allocated for the duration of the communication
session, and no other process can use that link during this period
This scheme is similar to that used in the telephone system.
Message switching –
A temporary link is established for the duration of one message
transfer.
Physical links are allocated dynamically among correspondents as
needed and are allocated for only short periods.
This scheme is similar to the post-office mailing system. Each letter
is a message that contains both the destination address and source
(return) address. Many messages can be shipped over the same
link.
Packet switching - Messages of variable length are divided into fixed-
length packets which are sent to the destination
Each packet may take a different path through the network
The packets must be reassembled into messages as they arrive
Circuit switching requires setup time, but incurs less overhead for
shipping each message, and may waste network bandwidth
Message and packet switching require less setup time, but incur more
overhead per message
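Packet switching as described above can be sketched in a few lines. Everything here is an assumption made for illustration: the 8-byte packet size, the zero-byte padding of the last packet, and the dictionary header fields (seq, msg_len) are not taken from any real protocol.

```python
PACKET_SIZE = 8   # assumed fixed payload length in bytes

def packetize(message: bytes, size: int = PACKET_SIZE):
    """Split a variable-length message into fixed-length packets."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        chunk = message[start:start + size].ljust(size, b"\0")  # pad last packet
        packets.append({"seq": seq, "msg_len": len(message), "payload": chunk})
    return packets

def reassemble(packets):
    """Rebuild the message from packets that may have arrived in any order."""
    if not packets:
        return b""
    ordered = sorted(packets, key=lambda p: p["seq"])
    data = b"".join(p["payload"] for p in ordered)
    return data[:ordered[0]["msg_len"]]   # strip the padding

message = b"distributed systems"
packets = packetize(message)
arrived = list(reversed(packets))   # simulate packets taking different paths
restored = reassemble(arrived)
```

The per-packet header is the extra overhead the slide mentions; in exchange, no link is reserved for the whole session.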
Contention
Depending on the network topology, a link may connect more than two
sites in the computer network, and several of these sites may want to
transmit information over a link simultaneously (Ring).
Token passing (Ring):
A site that wants to transmit information must wait until the token
arrives.
If the token gets lost, the system must detect the loss and generate a
new token. It usually does that by declaring an election to choose a unique
site where a new token will be generated.
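Token passing can be simulated with a loop that moves the token around the ring. This is a simplified sketch: it assumes a site sends at most one message each time it holds the token and does not model token loss or the election; the function name and site labels are hypothetical.

```python
def token_ring_round(sites, pending, start=0):
    """Pass the token once around the ring.

    A site transmits only while it holds the token, so two sites can never
    transmit on the shared link at the same time.
    """
    transmitted = []
    for step in range(len(sites)):
        holder = sites[(start + step) % len(sites)]
        if pending.get(holder):                      # holder has data queued
            transmitted.append((holder, pending[holder].pop(0)))
    return transmitted

sites = ["A", "B", "C", "D"]
pending = {"B": ["b1", "b2"], "D": ["d1"]}

first_round = token_ring_round(sites, pending)    # [('B', 'b1'), ('D', 'd1')]
second_round = token_ring_round(sites, pending)   # [('B', 'b2')]
```

Site B has to wait a full rotation before sending its second message, which is the cost of avoiding collisions.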
Communication Protocol
Layers 1-4 are considered the lower layers (concerned with moving data around)
Layer 1: Physical layer – handles the mechanical and electrical details of
the physical transmission of a bit stream. It provides the hardware means
of sending and receiving data on a carrier, including defining cables, cards
and physical aspects.
Ex: Fast Ethernet, RS232 are protocols with physical layer
components.
Communication Protocol (Cont.)
Layer 3: Network layer –
provides connections and routes packets in the communication network,
including handling the address of outgoing packets, decoding the address
of incoming packets, and maintaining routing information for proper
response to changing load levels.
Layer 6: Presentation layer –
The presentation layer works to transform data into the form that
the application layer can accept.
Resolves the differences in formats among the various sites in
the network, including character conversions, and half
duplex/full duplex (echoing).
Communication Via ISO Network Model
The ISO Protocol Layer
The ISO Network Message
The TCP/IP Protocol Layers
Example: TCP/IP
The transmission of a network packet between hosts on an
Ethernet network
Every host has a unique IP address and a corresponding
Ethernet Media Access Control (MAC) address
Communication requires both addresses
Domain Name Service (DNS) can be used to acquire IP addresses
If the hosts are on different networks, the sending host will send
the packet to a router which routes the packet to the destination
network
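The choice between sending directly and sending to a router can be sketched with Python's standard ipaddress module. The addresses below come from ranges reserved for documentation, and the function name next_hop and its parameters are assumptions made for this example.

```python
import ipaddress

def next_hop(dst_ip: str, local_network: str, router_ip: str) -> str:
    """Return the IP address the sending host should hand the packet to.

    If the destination is on the sender's own network, the packet goes
    straight to the destination; otherwise it goes to the router.
    """
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(dst_ip) in network:
        return dst_ip
    return router_ip

# Hypothetical setup: sender lives on 192.0.2.0/24, router is 192.0.2.1.
same_net = next_hop("192.0.2.20", "192.0.2.0/24", "192.0.2.1")
other_net = next_hop("198.51.100.7", "192.0.2.0/24", "192.0.2.1")
print(same_net)    # 192.0.2.20
print(other_net)   # 192.0.2.1
```

In either case the Ethernet frame carries the MAC address of the chosen next hop, which is why communication requires both kinds of address.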
Robustness
Failure detection
The failure of a link,
The failure of a site, and
The loss of a message
To ensure the robustness of the system, these failures must
be detected.
The system should be reconfigured so that the computation can
continue, and it should recover when a site or a link is repaired.
Reconfiguration
Failure Detection
Detecting these failures is difficult because the sites share no memory
If Site A does not receive a message within the predetermined time period,
then it can assume that
the site B has failed (site failure),
the link between A and B has failed (link failure),
the message from B has been lost (message loss).
If Site A does not receive a reply, it can repeat the message or try
an alternate route to Site B
Failure Detection (Cont.)
Site A can try to differentiate between link failure and site failure by sending
an Are-you-up? message to B by another route.
If Site A does not ultimately receive a reply from Site B, it concludes some
type of failure has occurred.
Site B is down
The direct link between A and B is down
The alternate link from A to B is down
The message has been lost
However, Site A cannot determine exactly why the failure has occurred
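The probing logic above can be sketched as follows. This is a sketch under stated assumptions: send_are_you_up is a hypothetical callable that stands in for "send an Are-you-up? message over this route and wait for the timeout", and the route labels are made up for the example.

```python
def diagnose(send_are_you_up, direct_route, alternate_route):
    """Classify what site A can conclude about site B after probing both routes."""
    if send_are_you_up(direct_route):
        return "B is up"
    if send_are_you_up(alternate_route):
        # B answered on another route, so B itself has not failed.
        return "B is up; direct link failed or a message was lost"
    # No reply on either route: A knows something failed, but not what.
    return "failure detected, cause unknown"

# Simulated network in which only the route through site C still works.
working_routes = {"A-C-B"}
probe = lambda route: route in working_routes

result = diagnose(probe, "A-B", "A-C-B")
print(result)   # B is up; direct link failed or a message was lost
```

When neither probe is answered, the function deliberately returns no specific cause, matching the point that site A cannot tell a site failure from link failures or message loss.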
Reconfiguration
When Site A determines a failure has occurred, it must reconfigure the
system:
2. If a site has failed, every other site must also be notified
that the services offered by the failed site are no longer available.
The failure of a site that serves as a central coordinator for some
activity (such as deadlock detection) requires the election of a new
coordinator.
When the link or the site becomes available again, this information
must again be broadcast to all other sites
Recovery from Failure
Suppose that site B has failed. When it recovers, it must notify all
other sites that it is up again. It may have to receive information from
the other sites to update its local tables; for example, it may need
routing-table information, a list of sites that are down, or undelivered
messages.
Fault Tolerance
Fault tolerance is the set of dynamic methods used to keep interconnected
systems working together and to sustain reliability and availability in distributed systems.
In general, the more copies are kept, the better the reliability,
but the greater the system overhead involved.
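The trade-off between copies and overhead can be illustrated with a toy model. The model is an assumption, not something stated in the slides: each copy is up independently with the same probability, the data is available if at least one copy is up, and every update must be written to every copy.

```python
def availability(copies: int, p_up: float) -> float:
    """Probability that at least one of `copies` independent replicas is up."""
    return 1 - (1 - p_up) ** copies

def writes_per_update(copies: int) -> int:
    """Overhead in this toy model: every copy must be written on each update."""
    return copies

# Assumed per-copy uptime of 0.9, chosen only for illustration.
for n in (1, 2, 3):
    print(n, round(availability(n, 0.9), 3), writes_per_update(n))
```

Under these assumptions each extra copy removes most of the remaining unavailability, while the write cost grows linearly with the number of copies.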
Veritas cluster
Design Issues
Scalability –
Systems have bounded resources and can become completely saturated
under increased load.
As demands increase, the system should easily accept the addition of
new resources to accommodate the increased demand
Adding machines to a distributed system can clog the network and
increase service loads, and can call for expensive design modifications.
In a distributed system, the ability to scale up is of special importance,
since expanding the network by adding new machines or interconnecting
two networks is commonplace.
Scalability is related to fault tolerance. A heavily loaded component can
become paralyzed and behave like a faulty component.
Generally, having spare resources is essential for ensuring reliability as
well as for handling peak loads gracefully.
An inherent advantage of a distributed system is a potential for fault
tolerance and scalability because of the multiplicity of resources.
End of Module 1