TT2 QB Solutions

Chapter 3

1. Election Algorithms
Election algorithms choose a process from a group of processes to act as a coordinator. If
the coordinator process crashes for some reason, a new coordinator is elected on another
processor. An election algorithm essentially determines where a new copy of the
coordinator should be restarted.
Election algorithms assume that every active process in the system has a unique priority
number. The process with the highest priority number is chosen as the new coordinator.
Hence, when a coordinator fails, the algorithm elects the active process with the highest
priority number, and this number is then sent to every active process in the distributed
system.
We have two election algorithms, for two different configurations of distributed systems.
1. The Bully Algorithm
This algorithm applies to systems where every process can send a message to every other
process in the system.
Algorithm: Suppose process P sends a message to the coordinator.
1. If the coordinator does not respond within a time interval T, it is assumed that the
coordinator has failed.
2. Process P then sends an ELECTION message to every process with a higher priority
number.
3. It waits for responses; if no one responds within time interval T, process P elects
itself as the coordinator.
4. It then sends a message to all processes with lower priority numbers announcing that
it has been elected as their new coordinator.
5. However, if an answer is received within time T from some other process Q:
(I) Process P waits for a further time interval T' to receive a message from Q
announcing that Q has been elected coordinator.
(II) If Q does not respond within time interval T', Q is assumed to have failed
and the algorithm is restarted.

Example: We start with 6 processes, all directly connected to each other. Process 6 is
the leader, as it has the highest number.

Process 6 fails.
Process 3 notices that Process 6 does not respond. So it starts an election, notifying
those processes with ids greater than 3.

Both Process 4 and Process 5 respond, telling Process 3 that they'll take over from
here.

Process 4 sends election messages to both Process 5 and Process 6.


Only Process 5 answers and takes over the election.

Process 5 sends out only one election message to Process 6.

Process 6 does not respond.


Process 5 declares itself the winner.
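The behaviour above can be summarized in a small sketch. This is only an illustration, not code from the source: reliable delivery is simulated by direct method calls, crashes are modelled with an alive flag, and the names (BullyNode, start_election, on_election) are made up for the example.

# A toy sketch of the Bully algorithm, assuming reliable delivery simulated
# by direct method calls; crash failures are modelled with an 'alive' flag.

class BullyNode:
    def __init__(self, pid, all_nodes):
        self.pid = pid                # unique priority number
        self.all_nodes = all_nodes    # shared directory: pid -> BullyNode
        self.alive = True
        self.coordinator = None

    def start_election(self):
        # Send ELECTION to every higher-priority process; a True return value
        # models an answer arriving within the timeout T.
        higher = [n for p, n in self.all_nodes.items() if p > self.pid and n.alive]
        answered = [n for n in higher if n.on_election(self.pid)]
        if not answered:              # nobody higher answered: become coordinator
            self.become_coordinator()

    def on_election(self, from_pid):
        if not self.alive:
            return False              # a crashed process never answers
        self.start_election()         # take over the election ourselves
        return True                   # counts as the answer within time T

    def become_coordinator(self):
        self.coordinator = self.pid
        for p, n in self.all_nodes.items():
            if n.alive and p < self.pid:
                n.coordinator = self.pid   # COORDINATOR message to lower-priority nodes

# Mirror of the six-process example: process 6 fails and process 3 notices.
nodes = {}
nodes.update({i: BullyNode(i, nodes) for i in range(1, 7)})
nodes[6].alive = False
nodes[3].start_election()
print(nodes[1].coordinator)           # -> 5, the highest-numbered live process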
2. The Ring Algorithm
This algorithm applies to systems organized as a ring (logically or physically). We assume
that the links between processes are unidirectional and that every process can send messages
only to the process on its right. The data structure used by the algorithm is the active list,
a list containing the priority numbers of all active processes in the system.
Algorithm:
1. If process P1 detects a coordinator failure, it creates a new active list, which is
initially empty. It sends an ELECTION message containing its own number 1 to its neighbour
on the right and adds the number 1 to its active list.
2. If process P2 receives an ELECTION message from the process on its left, it responds in
one of three ways:
(I) If the active list in the received message does not contain the number 2, P2 adds
2 to the active list and forwards the message.
(II) If this is the first ELECTION message it has received or sent, P2 creates a new
active list containing the numbers 1 and 2. It then sends its own ELECTION message
containing 2, followed by the message containing 1.
(III) If P2 receives its own ELECTION message back, its active list now contains the
numbers of all the active processes in the system. P2 then selects the highest
priority number from the list and elects that process as the new coordinator.
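A minimal sketch of the ring-based election follows. It is illustrative only: the ring list is assumed to contain just the currently active processes in ring order (the failed coordinator excluded), and the function name ring_election is not from the source.

# A toy sketch of the ring-based election; the ring list contains only the
# currently active processes, in ring order, and numbers double as priorities.

def ring_election(ring, initiator):
    """Simulates the ELECTION message travelling around the ring from 'initiator'."""
    active_list = []                  # carried inside the ELECTION message
    i = ring.index(initiator)
    while True:
        current = ring[i]
        if current == initiator and active_list:
            # The message has returned to the initiator: active_list now holds
            # every active process, so the highest number becomes coordinator.
            return max(active_list)
        active_list.append(current)   # each process adds its own number
        i = (i + 1) % len(ring)       # forward to the right-hand neighbour

# Processes 1..5 remain after coordinator 6 has failed; 3 starts the election.
print(ring_election([1, 2, 3, 4, 5], initiator=3))   # -> 5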
Lamport's Algorithm

Lamport was the first to give a distributed mutual exclusion algorithm, as an
illustration of his clock synchronization scheme. Let Ri be the request set of site
Si, i.e. the set of sites from which Si needs permission when it wants to enter the
CS. In Lamport's algorithm, ∀i : 1 ≤ i ≤ N :: Ri = {S1, S2, ..., SN}. Every site Si
keeps a queue, request_queuei, which contains mutual exclusion requests
ordered by their timestamps. This algorithm requires messages to be delivered in
FIFO order between every pair of sites.

The Algorithm

Requesting the critical section.


1. When a site Si wants to enter the CS, it sends REQUEST(tsi, i) message to all
the sites in its request set Ri and places the request on request_queuei (tsi is
the timestamp of the request).
2. When a site Sj receives the REQUEST(tsi, i) message from site Si, it returns
a timestamped REPLY message to Si and places site Si's request on
request_queuej.

Executing the critical section.


1. Site Si enters the CS when the following two conditions hold:
a) [L1:] Si has received a message with timestamp larger than (tsi, i)
from all other sites.
b) [L2:] Si's own request is at the top of request_queuei.

Releasing the critical section.


1. Site Si, upon exiting the CS, removes its request from the top of its request
queue and sends a timestamped RELEASE message to all the sites in its
request set.
2. When a site Sj receives a RELEASE message from site Si, it removes Si's
request from its request queue.

When a site removes a request from its request queue, its own request
may come to the top of the queue, enabling it to enter the CS. The algorithm
executes CS requests in increasing order of timestamps.
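The following compact sketch shows the state one site keeps in Lamport's algorithm. It is illustrative only: the transport object with broadcast()/send() is hypothetical, site ids are assumed to be 0..N-1, requests are (timestamp, site id) pairs compared lexicographically, and condition L1 is checked here using only the timestamps of REPLY messages, which suffices because every site replies with a timestamp larger than the request's.

# Sketch of one site in Lamport's algorithm; the transport layer is hypothetical.
import heapq

class LamportSite:
    def __init__(self, site_id, n_sites, transport):
        self.i = site_id                    # site ids assumed to be 0..n_sites-1
        self.n = n_sites
        self.clock = 0                      # Lamport logical clock
        self.queue = []                     # request_queue_i, kept as a min-heap
        self.last_reply_ts = {}             # latest REPLY timestamp seen per site
        self.my_request = None
        self.transport = transport

    def request_cs(self):
        self.clock += 1
        self.my_request = (self.clock, self.i)
        heapq.heappush(self.queue, self.my_request)
        self.transport.broadcast(('REQUEST', self.my_request))

    def on_request(self, ts, j):
        self.clock = max(self.clock, ts) + 1
        heapq.heappush(self.queue, (ts, j))
        self.transport.send(j, ('REPLY', self.clock, self.i))

    def on_reply(self, ts, j):
        self.last_reply_ts[j] = ts

    def can_enter_cs(self):
        if self.my_request is None:
            return False
        # L1: a later-timestamped message has arrived from every other site.
        l1 = all(self.last_reply_ts.get(j, 0) > self.my_request[0]
                 for j in range(self.n) if j != self.i)
        # L2: our own request sits at the top of request_queue_i.
        l2 = bool(self.queue) and self.queue[0] == self.my_request
        return l1 and l2

    def release_cs(self):
        heapq.heappop(self.queue)           # our request is at the top
        self.my_request = None
        self.clock += 1
        self.transport.broadcast(('RELEASE', self.clock, self.i))

    def on_release(self, ts, j):
        self.clock = max(self.clock, ts) + 1
        self.queue = [r for r in self.queue if r[1] != j]
        heapq.heapify(self.queue)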
The Ricart-Agrawala Algorithm

The Ricart-Agrawala algorithm is an optimization of Lamport's algorithm that
dispenses with RELEASE messages by cleverly merging them with REPLY
messages. In this algorithm also, ∀i : 1 ≤ i ≤ N :: Ri = {S1, S2, ..., SN}.

The Algorithm

Requesting the critical section.


1. When a site Si wants to enter the CS, it sends a timestamped REQUEST
message to all the sites in its request set.
2. When site Sj receives a REQUEST message from site Si, it sends a REPLY
message to site Si if site Sj is neither requesting nor executing the CS, or if
site Sj is requesting and Si's request timestamp is smaller than Sj's own
request timestamp. The request is deferred otherwise.

Executing the critical section


1. Site Si enters the CS after it has received REPLY messages from all the sites
in its request set.

Releasing the critical section

1. When site Si exits the CS, it sends REPLY messages to all the deferred
requests.

A site's REPLY messages are blocked only by sites that are requesting the CS
with higher priority (i.e., a smaller timestamp). Thus, when a site sends out
REPLY messages to all the deferred requests, the site with the next highest
priority request receives the last needed REPLY message and enters the CS. The
execution of CS requests in this algorithm is always in the order of their
timestamps.
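A sketch of one Ricart-Agrawala site is shown below; it is illustrative only, the transport object is hypothetical, and broadcast() is assumed to deliver to all other sites. Note how the deferred REPLYs sent on release play the role of Lamport's RELEASE messages.

# Sketch of one Ricart-Agrawala site; REPLY doubles as the permission.

class RASite:
    def __init__(self, site_id, n_sites, transport):
        self.i = site_id
        self.n = n_sites
        self.clock = 0
        self.requesting = False
        self.in_cs = False
        self.my_ts = None                   # (timestamp, site id) of our request
        self.replies = set()
        self.deferred = []                  # sites whose REQUESTs we deferred
        self.transport = transport

    def request_cs(self):
        self.clock += 1
        self.my_ts = (self.clock, self.i)
        self.requesting = True
        self.replies.clear()
        self.transport.broadcast(('REQUEST', self.my_ts))

    def on_request(self, ts, j):
        self.clock = max(self.clock, ts[0]) + 1
        # Defer if we are in the CS, or requesting with an older (smaller)
        # timestamp than the incoming request; otherwise reply immediately.
        if self.in_cs or (self.requesting and self.my_ts < ts):
            self.deferred.append(j)
        else:
            self.transport.send(j, ('REPLY', self.i))

    def on_reply(self, j):
        self.replies.add(j)
        if len(self.replies) == self.n - 1:
            self.in_cs = True               # all REPLYs received: enter the CS

    def release_cs(self):
        self.in_cs = False
        self.requesting = False
        for j in self.deferred:             # deferred REPLYs act as releases
            self.transport.send(j, ('REPLY', self.i))
        self.deferred.clear()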
Maekawa's Algorithm

The Construction of request sets. The request sets for sites in Maekawa's
algorithm are constructed to satisfy the following conditions:
M1: ∀i ∀j : i ≠ j, 1 ≤ i, j ≤ N :: Ri ∩ Rj ≠ ∅
M2: ∀i : 1 ≤ i ≤ N :: Si ∈ Ri
M3: ∀i : 1 ≤ i ≤ N :: |Ri| = K
M4: Any site Sj is contained in exactly K of the Ri's, 1 ≤ i, j ≤ N.

The Algorithm

Requesting the critical section.


1. A site Si requests access to the CS by sending REQUEST(i) messages to all the
sites in its request set Ri.
2. When a site Sj receives the REQUEST(i) message, it sends a REPLY(j) message
to Si provided it hasn't sent a REPLY message to a site from the time it
received the last RELEASE message. Otherwise, it queues up the REQUEST
for later consideration.

Executing the critical section.


1. Site Si accesses the CS only after receiving REPLY messages from all the
sites in Ri.

Releasing the critical section.


1. After the execution of the CS is over, site Si sends RELEASE(i) message to all
the sites in Ri.
2. When a site Sj receives a RELEASE(i) message from site Si, it sends a REPLY
message to the next site waiting in the queue and deletes that entry from
the queue. If the queue is empty, then the site updates its state to reflect that
it has not sent out any REPLY message.
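The essential arbitration performed by a quorum member Sj can be sketched as follows; this is illustrative only, the construction of the request sets Ri is not shown, and the transport object is hypothetical. The key point is that Sj lends out at most one REPLY between RELEASEs.

# Sketch of the arbitration a Maekawa site Sj performs for its quorum members.
from collections import deque

class MaekawaSite:
    def __init__(self, site_id, transport):
        self.j = site_id
        self.granted_to = None        # site currently holding our REPLY, if any
        self.waiting = deque()        # queued REQUESTs awaiting the next RELEASE
        self.transport = transport

    def on_request(self, i):
        # Grant only if no REPLY is outstanding since the last RELEASE.
        if self.granted_to is None:
            self.granted_to = i
            self.transport.send(i, ('REPLY', self.j))
        else:
            self.waiting.append(i)    # queue the REQUEST for later consideration

    def on_release(self, i):
        # Pass the permission to the next queued requester, if there is one.
        if self.waiting:
            nxt = self.waiting.popleft()
            self.granted_to = nxt
            self.transport.send(nxt, ('REPLY', self.j))
        else:
            self.granted_to = None    # no REPLY outstanding any more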
Suzuki-Kasami's broadcast algorithm
The Algorithm

Requesting the critical section


1. If the requesting site Si does not have the token, then it increments its
sequence number, RNi[i], and sends a REQUEST(i, sn) message to all other
sites (sn is the updated value of RNi[i]).
2. When a site Sj receives this message, it sets RNj[i] to max(RNj[i], sn). If Sj
has the idle token, then it sends the token to Si if RNj[i]=LN[i]+1.

Executing the critical section.


1. Site Si executes the CS when it has received the token.

Releasing the critical section. Having finished the execution of the CS, site Si
takes the following actions:
1. It sets LN[i] element of the token array equal to RNi[i].
2. For every site Sj whose ID is not in the token queue, it appends its ID to
the token queue if RNi[j]=LN[j]+1.
3. If token queue is nonempty after the above update, then it deletes the top
site ID from the queue and sends the token to the site indicated by the ID.
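A sketch of one Suzuki-Kasami site follows, assuming a hypothetical transport with broadcast()/send() and site ids 0..N-1; the token carries the LN array and a FIFO queue of waiting sites as described above.

# Sketch of one Suzuki-Kasami site.
from collections import deque

class SKToken:
    def __init__(self, n):
        self.LN = [0] * n             # LN[j]: sequence number of Sj's last served request
        self.queue = deque()          # ids of sites waiting for the token

class SKSite:
    def __init__(self, site_id, n_sites, transport, has_token=False):
        self.i = site_id
        self.n = n_sites
        self.RN = [0] * n_sites       # highest request number seen per site
        self.token = SKToken(n_sites) if has_token else None
        self.in_cs = False
        self.transport = transport

    def request_cs(self):
        if self.token is not None:    # already holding the (idle) token
            self.in_cs = True
            return
        self.RN[self.i] += 1
        self.transport.broadcast(('REQUEST', self.i, self.RN[self.i]))

    def on_request(self, j, sn):
        self.RN[j] = max(self.RN[j], sn)
        # Hand over the idle token if Sj's request is outstanding.
        if (self.token is not None and not self.in_cs
                and self.RN[j] == self.token.LN[j] + 1):
            tok, self.token = self.token, None
            self.transport.send(j, ('TOKEN', tok))

    def on_token(self, tok):
        self.token = tok
        self.in_cs = True

    def release_cs(self):
        self.in_cs = False
        self.token.LN[self.i] = self.RN[self.i]
        for j in range(self.n):       # append every newly outstanding requester
            if j not in self.token.queue and self.RN[j] == self.token.LN[j] + 1:
                self.token.queue.append(j)
        if self.token.queue:          # send the token to the head of the queue
            j = self.token.queue.popleft()
            tok, self.token = self.token, None
            self.transport.send(j, ('TOKEN', tok))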
Raymond's Tree-Based Algorithm

The Algorithm

Requesting the critical section.


1. When a site wants to enter the CS, it sends a REQUEST message to the
node along the directed path to the root, provided it does not hold the
token and its request_q is empty. It then adds its request to its request_q.
(Note that a nonempty request_q at a site indicates that the site has sent
a REQUEST message to the root node for the top entry in its request_q).
2. When a site receives a REQUEST message, it places the REQUEST in its
request_q and sends a REQUEST message along the directed path to the
root, provided it has not already sent out a REQUEST message on its outgoing
edge (for a previously received REQUEST in its request_q).
3. When the root site receives a REQUEST message, it sends the token to the
site from which it received the REQUEST message and sets its holder
variable to point at that site.
4. When a site receives the token, it deletes the top entry from its
request_q, sends the token to the site indicated in this entry, and sets its
holder variable to point at that site. If the request_q is nonempty at this
point, then the site sends a REQUEST message to the site which is pointed
at by holder variable.

Executing the critical section.


1. A site enters the CS when it receives the token and its own entry is at the
top of its request_q. In this case, the site deletes the top entry from its
request_q and enters the CS.

Releasing the critical section. After a site has finished execution of the CS,
it takes the following actions:
1. If its request_q is nonempty, then it deletes the top entry from its
request_q, sends the token to that site, and sets its holder variable to
point at that site.
2. If the request_q is nonempty at this point, then the site sends a
REQUEST message to the site which is pointed at by the holder variable.
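The per-node state of Raymond's algorithm can be sketched as below; this is illustrative only and the transport object is hypothetical. The holder variable points towards the token along a tree edge, and holder equal to the node's own id means the node currently holds the token.

# Sketch of one node in Raymond's tree-based algorithm.
from collections import deque

class RaymondNode:
    def __init__(self, node_id, holder, transport):
        self.i = node_id
        self.holder = holder              # neighbour on the path to the token
        self.request_q = deque()
        self.in_cs = False
        self.transport = transport

    def request_cs(self):
        if self.holder == self.i and not self.request_q:
            self.in_cs = True             # the token is here and idle
            return
        if not self.request_q:            # only ask upward for the first entry
            self.transport.send(self.holder, ('REQUEST', self.i))
        self.request_q.append(self.i)

    def on_request(self, j):
        if not self.request_q and self.holder != self.i:
            self.transport.send(self.holder, ('REQUEST', self.i))
        self.request_q.append(j)
        if self.holder == self.i and not self.in_cs:
            self._forward_token()         # the token holder serves the request

    def on_token(self):
        self._forward_token()

    def _forward_token(self):
        nxt = self.request_q.popleft()    # delete the top entry
        if nxt == self.i:
            self.in_cs = True             # our own entry was at the head: enter CS
        else:
            self.holder = nxt             # pass the token along the tree
            self.transport.send(nxt, ('TOKEN',))
            if self.request_q:            # still have pending requests behind it
                self.transport.send(self.holder, ('REQUEST', self.i))

    def release_cs(self):
        self.in_cs = False
        if self.request_q:                # hand the token to the next requester
            self._forward_token()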
Chapter 4
1. Desirable Features of a Global Scheduling Algorithm
The desirable features of a global scheduling algorithm are as follows:
 No a priori knowledge about the processes: Scheduling algorithms that operate on
information about the characteristics and resource requirements of the processes pose an
extra burden on the users, who must provide this information when submitting their
processes for execution. A good global scheduling algorithm should require no such
information.
 Dynamic in nature: The decision regarding the assignment of a process should be
dynamic, i.e., based on the current load of the system and not on some fixed static
policy. The algorithm should also have the flexibility to migrate a process more than
once, since the initial decision to place a process on a particular node may have to be
changed afterwards in order to adapt to changes in the system load.
 Decision-making capability: Heuristic methods require less computational effort, and
hence less time, to reach a decision, while still providing a near-optimal result; such
quick decision-making capability is desirable.
 Balancing system performance and scheduling overhead: We require algorithms that
provide near-optimal system performance while collecting a minimum of global state
information, such as CPU load. This matters because as the amount of global state
information collected increases, the overhead also increases, and the cost of gathering
and processing the extra information offsets any gain in system performance; hence
scheduling overhead should be kept to a minimum.
 Stability: Processor thrashing (due to the fruitless migration of processes) must be
prevented. For example, if nodes n1 and n2 both observe that node n3 is idle, each may
offload a portion of its work to n3 without being aware of the offloading decision made
by the other. If n3 becomes overloaded as a result, it may in turn start transferring its
processes to other nodes. The main reason for this is that scheduling decisions are made
at each node independently of the decisions made by other nodes.
 Scalability: The scheduling algorithm should be able to scale as the number of nodes
increases. An algorithm has poor scalability if it makes a scheduling decision by first
inquiring about the workload of all the nodes and then selecting the most lightly loaded
node; this works fine only when there are few nodes in the system, because the inquirer
receives a flood of replies almost simultaneously and the time required to process the
reply messages in order to select a node becomes too long. Moreover, as the number of
nodes (N) increases, the network traffic quickly consumes network bandwidth.
 Fault tolerance: A good scheduling algorithm should not be disabled when one or more
nodes of the system crash; a mechanism to handle such crashes should be available. If
the nodes are partitioned into two or more groups due to link failures, the algorithm
should be capable of functioning properly within each group. To achieve better fault
tolerance, algorithms should decentralize the decision-making capability and consider
only the available nodes in their decisions.
 Fairness of service: A global scheduling policy that blindly attempts to balance the load
on all the nodes of the system is not good when viewed in terms of fairness of service.
This is because, in any load-balancing scheme, heavily loaded nodes obtain all the
benefits while lightly loaded nodes suffer poorer response times than in a stand-alone
configuration. Thus, the idea of load balancing should be replaced by load sharing: a
node shares some of its resources as long as its own users are not significantly affected.

Task assignment approach


Assumption:
1. A process has already been split up into pieces called tasks. This split occurs along
natural boundaries (such as a method), so that each task will have integrity in itself
and data transfers among the tasks are minimized.
2. The amount of computation required by each task and the speed of each CPU are
known.
3. The cost of processing each task on every node is known. This is derived from
assumption 2.
4. The IPC costs between every pair of tasks are known. The IPC cost is 0 for tasks
assigned to the same node. This is usually estimated by an analysis of the static
program. If two tasks communicate n times and the average time for each inter-task
communication is t, then the IPC cost for the two tasks is n * t.
5. Precedence relationships among the tasks are known.
6. Reassignment of tasks is not possible.
The goal is to assign the tasks of a process to the nodes of a distributed system in such a manner
as to achieve goals such as the following:
◦ Minimization of IPC costs
◦ Quick turnaround time for the complete process
◦ A high degree of parallelism
◦ Efficient utilization of system resources in general
These goals often conflict. For example, while minimizing IPC costs tends to assign all tasks of a
process to a single node, efficient utilization of system resources tries to distribute the tasks
evenly among the nodes. Similarly, while quick turnaround time and a high degree of parallelism
encourage parallel execution of the tasks, the precedence relationships among the tasks limit
their parallel execution.
Also note that with m tasks and q nodes there are q^m possible assignments of tasks to
nodes. In practice, however, the actual number of possible assignments may be less than q^m
due to the restriction that certain tasks cannot be assigned to certain nodes because of their
specific requirements (e.g. they need a certain amount of memory or a certain data file).
Task assignment example
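A toy illustration of how the cost of one assignment can be evaluated under assumptions 3 and 4 is given below; the execution-cost table, the IPC costs, and the task/node names are made-up values, not taken from the source.

# Toy evaluation of a single task-to-node assignment.

exec_cost = {                      # exec_cost[task][node], from assumption 3
    't1': {'n1': 5, 'n2': 10},
    't2': {'n1': 2, 'n2': 3},
    't3': {'n1': 4, 'n2': 4},
}
ipc_cost = {                       # n * t per communicating pair, assumption 4
    ('t1', 't2'): 6,
    ('t2', 't3'): 1,
}

def total_cost(assignment):
    """assignment maps task -> node; IPC cost is 0 for co-located tasks."""
    cost = sum(exec_cost[t][n] for t, n in assignment.items())
    cost += sum(c for (a, b), c in ipc_cost.items()
                if assignment[a] != assignment[b])
    return cost

# t1 and t2 share node n1, so only the (t2, t3) IPC cost is paid.
print(total_cost({'t1': 'n1', 't2': 'n1', 't3': 'n2'}))   # 5 + 2 + 4 + 1 = 12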
Load-Balancing Approach
In this approach, processes are distributed among nodes so as to equalize the load on all nodes.
The scheduling algorithms that use this approach are known as load-balancing or load-leveling
algorithms. These algorithms are based on the intuition that, for better resource utilization, it
is desirable for the load in a distributed system to be balanced evenly. Thus, a load-balancing
algorithm tries to balance the total system load by transparently transferring workload
from heavily loaded nodes to lightly loaded nodes in an attempt to ensure good overall
performance relative to some specific metric of system performance.
We can have the following categories of load balancing algorithms:
 Static: Ignore the current state of the system. E.g. if a node is heavily loaded, it picks
up a task randomly and transfers it to a random node. These algorithms are simpler to
implement but performance may not be good.
 Dynamic: Use the current state information for load balancing. There is an overhead
involved in collecting state information periodically; they perform better than static
algorithms.
 Deterministic: Algorithms in this class use the processor and process characteristics
to allocate processes to nodes.
 Probabilistic: Algorithms in this class use information regarding static attributes of
the system such as number of nodes, processing capability, etc.
 Centralized: System state information is collected by a single node. This node makes
all scheduling decisions.
 Distributed: Most desired approach. Each node is equally responsible for making
scheduling decisions based on the local state and the state information received from
other sites.
 Cooperative: A distributed dynamic scheduling algorithm. In these algorithms, the
distributed entities cooperate with each other to make scheduling decisions.
Therefore, they are more complex and involve larger overhead than non-cooperative
ones. But the stability of a cooperative algorithm is better than that of a non-cooperative
one.
 Non-Cooperative: A distributed dynamic scheduling algorithm. In these algorithms,
individual entities act as autonomous entities and make scheduling decisions
independently of the action of other entities.

Virtualization Types
 Virtualization is a technique for separating a service from the underlying physical
delivery of that service.
 It is the process of creating a virtual version of something like computer hardware.
 With the help of virtualization, multiple operating systems and applications can run
on the same machine and the same hardware at the same time, increasing the utilization
and flexibility of the hardware.
 In other words, one of the main cost effective, hardware reducing, and energy saving
techniques used by cloud providers is virtualization.
 Virtualization allows a single physical instance of a resource or an application to be
shared among multiple customers and organizations at one time. It does this by
assigning a logical name to a physical resource and providing a pointer to that physical
resource on demand.
 The term virtualization is often synonymous with hardware virtualization, which
plays a fundamental role in efficiently delivering Infrastructure-as-a-Service (IaaS)
solutions for cloud computing.
 Moreover, virtualization technologies provide a virtual environment for not only
executing applications but also for storage, memory, and networking.
 The machine on which the virtual machine is built is known as the Host
Machine, and the virtual machine itself is referred to as the Guest Machine.
BENEFITS OF VIRTUALIZATION
1. More flexible and efficient allocation of resources.
2. Enhance development productivity.
3. It lowers the cost of IT infrastructure.
4. Remote access and rapid scalability.
5. High availability and disaster recovery.
6. Pay-per-use of the IT infrastructure on demand.
7. Enables running multiple operating systems.
Types of Virtualization
1. Application Virtualization: Application virtualization gives a user remote access to an
application from a server. The server stores all personal information and other
characteristics of the application, but the application can still run on a local workstation
through the internet. An example would be a user who needs to run two different versions
of the same software. Technologies that use application virtualization are hosted
applications and packaged applications.
2. Network Virtualization: The ability to run multiple virtual networks, each with a
separate control and data plane, co-existing on top of one physical network. The virtual
networks can be managed by individual parties that are kept isolated from each other.
Network virtualization provides a facility to create and provision virtual networks
(logical switches, routers, firewalls, load balancers, Virtual Private Networks (VPNs), and
workload security) within days or even weeks.
3. Desktop Virtualization: Desktop virtualization allows the users' OS to be stored
remotely on a server in the data centre. It allows users to access their desktops
virtually, from any location, on a different machine. Users who want specific operating
systems other than Windows Server will need to have a virtual desktop. The main benefits
of desktop virtualization are user mobility, portability, and easy management of software
installation, updates, and patches.
4. Storage Virtualization: Storage virtualization is an array of servers that are managed
by a virtual storage system. The servers are not aware of exactly where their data is
stored and instead function more like worker bees in a hive. It allows storage from
multiple sources to be managed and utilized as a single repository. Storage
virtualization software maintains smooth operations, consistent performance, and a
continuous suite of advanced functions despite changes, breakdowns, and differences in the
underlying equipment.
5. Server Virtualization: This is a kind of virtualization in which the masking of server
resources takes place. Here, the central server (physical server) is divided into
multiple different virtual servers by changing the identity numbers and processors, so
that each virtual server can run its own operating system in an isolated manner, while
each sub-server still knows the identity of the central server. It increases
performance and reduces operating cost by deploying the main server's resources
across the sub-server resources. It is beneficial for virtual migration, reduced energy
consumption, reduced infrastructural cost, etc.
6. Data virtualization: This is the kind of virtualization in which data is collected
from various sources and managed in a single place, without the user needing to know
technical details such as how the data is collected, stored and formatted. The data is
then arranged logically so that its virtual view can be accessed remotely by interested
people, stakeholders, and users through various cloud services. Many large companies
provide such services, for example Oracle, IBM, AtScale, and CData.
It can be used to perform various kinds of tasks, such as:
 Data integration
 Business integration
 Service-oriented architecture data services
 Searching organizational data
Chapter 5 (Part 2) Question Bank Answers

1. Five Classes of Failure

A system is said to “fail” when it cannot meet its promises. A failure is brought about by the
existence of “errors” in the system. The cause of an error is called a “fault”. The five classes
of failures are as follows:

i. A crash failure occurs when a server prematurely halts, but was working correctly
until it stopped. A typical example of a crash failure is an operating system that comes
to a grinding halt, and for which there is only one solution: reboot it.
ii. An omission failure occurs when a server fails to respond to incoming requests. A
receive omission failure will generally not affect the current state of the server, as the
server is unaware of any message sent to it. Likewise, a send omission failure
happens when the server has done its work, but somehow fails in sending a response.
Such a failure may happen, for example, when a send buffer overflows while the
server was not prepared for such a situation.
iii. Timing failures occur when the response lies outside a specified real-time interval.
More common, however, is that a server responds too late, in which case a
performance failure is said to occur.
iv. A serious type of failure is a response failure, by which the server's response is
simply incorrect. Two kinds of response failures may happen. In the case of a value
failure, a server simply provides the wrong reply to a request. For example, a search
engine that systematically returns Web pages not related to any of the search terms
used has failed. The other type of response failure is known as a state transition
failure. This kind of failure happens when the server reacts unexpectedly to an
incoming request. For example, if a server receives a message it cannot recognize, a
state transition failure happens if no measures have been taken to handle such
messages. In particular, a faulty server may incorrectly take default actions it should
never have initiated.
v. Arbitrary failures, also known as Byzantine failures, occur when a server may
produce arbitrary responses at arbitrary times.

2. Reliable group communication

 Reliable multicast services guarantee that all messages are delivered to all members of
a process group.
 Sounds simple, but is surprisingly tricky (as multicasting services tend to be
inherently unreliable).
 For a small group, multiple reliable point-to-point channels will do the job; however,
such a solution scales poorly as the group membership grows. Also, what happens if
a process joins the group during communication? Worse still, what happens if the
sender of the multiple reliable point-to-point channels crashes halfway through
sending the messages?

a. Basic Reliable-Multicasting Schemes


 This is a simple solution to reliable multicasting when all receivers are known and are
assumed not to fail.
 The sending process assigns a sequence number to each message it multicasts.
 We assume that messages are received in the order they are sent. In this way, it is easy
for a receiver to detect it is missing a message.
 Each multicast message is stored locally in a history buffer at the sender.
 Assuming the receivers are known to the sender, the sender simply keeps the message
in its history buffer until each receiver has returned an acknowledgment.
 If a receiver detects it is missing a message, it may return a negative acknowledgment,
requesting the sender for a retransmission.
 Alternatively, the sender may automatically retransmit the message when it has not
received all acknowledgments within a certain time.
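A sketch of the sender side of this basic scheme is given below; the transport object and message formats are illustrative assumptions. Messages stay in the history buffer until every known receiver has acknowledged them, and a NACK triggers a point-to-point retransmission.

# Sketch of the sender side of the basic reliable-multicasting scheme.

class ReliableMulticastSender:
    def __init__(self, receivers, transport):
        self.receivers = set(receivers)
        self.transport = transport
        self.seq = 0
        self.history = {}             # seq -> (message, receivers still to ACK)

    def multicast(self, message):
        self.seq += 1                 # sequence numbers let receivers spot gaps
        self.history[self.seq] = (message, set(self.receivers))
        self.transport.multicast(('DATA', self.seq, message))

    def on_ack(self, receiver, seq):
        if seq not in self.history:
            return
        msg, pending = self.history[seq]
        pending.discard(receiver)
        if not pending:               # every receiver has it: drop from the buffer
            del self.history[seq]

    def on_nack(self, receiver, seq):
        msg, _ = self.history[seq]    # retransmit only to the requester
        self.transport.send(receiver, ('DATA', seq, msg))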

b. Scalable Reliable Multicasting (SRM)

 Receivers never acknowledge successful delivery.


 Only missing messages are reported.
 2 examples of SRM protocol are as follows:

i. Nonhierarchical Feedback Control

 NACKs are multicast to all group members.


 This allows other members to suppress their feedback, if necessary.
 To avoid “retransmission clashes,” each member is required to wait a random
delay prior to NACKing.
 Feedback Suppression is reducing the number of feedback messages to the
sender (as implemented in the Scalable Reliable Multicasting Protocol).
 Successful delivery is never acknowledged, only missing messages are
reported (NACK), which are multicast to all group members. If another
process is about to NACK, this feedback is suppressed as a result of the first
multicast NACK. In this way, only a single NACK is delivered to the sender.
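The random back-off that makes this feedback suppression work can be sketched as follows; the delay bound, class name, and transport object are illustrative assumptions.

# Sketch of random back-off for NACK (feedback) suppression.
import random

class NackSuppressor:
    def __init__(self, transport, max_delay=0.5):
        self.transport = transport
        self.max_delay = max_delay
        self.pending = {}                 # seq -> time at which to send our NACK

    def missing(self, seq, now):
        # Schedule a NACK after a random delay instead of sending immediately.
        self.pending.setdefault(seq, now + random.uniform(0, self.max_delay))

    def on_nack_seen(self, seq):
        # Some other member multicast a NACK for the same message first,
        # so our own scheduled NACK is suppressed.
        self.pending.pop(seq, None)

    def tick(self, now):
        # Called periodically: multicast any NACK whose back-off has expired.
        for seq, deadline in list(self.pending.items()):
            if now >= deadline:
                self.transport.multicast(('NACK', seq))
                del self.pending[seq]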
ii. Hierarchical Feedback Control

 Hierarchical reliable multicasting is another solution, the main characteristic being
that it supports the creation of very large groups.
 Sub-groups within the entire group are created, with each local coordinator
forwarding messages to its children.
 A local coordinator handles retransmission requests locally, using any appropriate
multicasting method for small groups.

3. Check pointing for recovery

 In a fault-tolerant distributed system, backward error recovery requires that the system
regularly saves its state onto stable storage.
 To recover after a process or system failure requires that we construct a consistent
global state (also called distributed snapshot).
 If a process has recorded the receipt of a message, then there should be another
process that has recorded the sending of that message.
 It is best to recover from the most recent distributed snapshot, also referred to as a
recovery line; it corresponds to the most recent collection of checkpoints.
Independent Checkpointing

 Discovering a recovery line requires that each process should roll back to its most
recently saved state; if the local states jointly do not form a distributed snapshot,
further rolling back is necessary.
 This process of cascaded rollback may lead to what is called the domino effect, i.e., it
may not be possible to find a recovery line except the initial states of processes
 This happens if processes checkpoint independently.

 When process P2 crashes, we need to restore its state to the most recently saved
checkpoint. As a consequence, process P1 will also need to be rolled back.
 Unfortunately, the two most recently saved local states do not form a consistent global
state: the state saved by P2 indicates the receipt of a message m, but no other process
can be identified as its sender.
 Consequently, P2 needs to be rolled back to an earlier state. However, the next state to
which P2 is rolled back also cannot be used as part of a distributed snapshot. In this
case, P1 will have recorded the receipt of message m`, but there is no recorded event
of this message being sent.
 It is therefore necessary to also roll P1 back to a previous state. In this example, it
turns out that the recovery line is actually the initial state of the system

Coordinated Checkpointing
 All processes synchronize to jointly write their state to local stable storage.
 The saved state is automatically globally consistent, so that cascaded rollback leading
to the domino effect is avoided.
 There are two ways to coordinate checkpointing:
1. A two-phase blocking protocol can be used (centralized)
 A coordinator multicasts a CHECKPOINT_REQUEST message to all
processes.
 On receipt of such a message, each process takes a local snapshot and sends an
acknowledgement to the coordinator.
 After receiving acknowledgement from all processes, the coordinator then
multicasts a CHECKPOINT_DONE message to allow (blocked) processes to
continue.
2. Incremental Snapshot
 A coordinator multicasts a CHECKPOINT_REQUEST message to only those
processes to whom it had sent a message since its last checkpoint.
 If process P receives such a request, it forwards it to all other processes to
which P itself had sent a message since its last checkpoint, and so on.
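A sketch of the two-phase blocking variant is given below; the transport object, the 'coordinator' address, and the save_state callback are illustrative assumptions. Participants take a local snapshot on CHECKPOINT_REQUEST, hold further application messages, and resume on CHECKPOINT_DONE.

# Sketch of the two-phase blocking checkpoint protocol.

class CheckpointCoordinator:
    def __init__(self, processes, transport):
        self.processes = set(processes)
        self.transport = transport
        self.pending = set()

    def start_checkpoint(self):
        self.pending = set(self.processes)
        self.transport.multicast(('CHECKPOINT_REQUEST',))   # phase 1

    def on_ack(self, process_id):
        self.pending.discard(process_id)
        if not self.pending:                                # everyone has a snapshot
            self.transport.multicast(('CHECKPOINT_DONE',))  # phase 2: unblock all

class CheckpointParticipant:
    def __init__(self, pid, transport, save_state):
        self.pid = pid
        self.transport = transport
        self.save_state = save_state      # writes the local state to stable storage
        self.blocked = False

    def on_checkpoint_request(self):
        self.save_state()
        self.blocked = True               # hold outgoing application messages
        self.transport.send('coordinator', ('ACK', self.pid))   # placeholder address

    def on_checkpoint_done(self):
        self.blocked = False              # resume normal operation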
Chapter 6
File Accessing Models
The specific client's request for accessing a particular file is serviced on the basis of the file accessing
model used by the distributed file system.
The file accessing model basically depends on
1. the method used for accessing remote files and
2. the unit of data access

1. Accessing remote files: A distributed file system may use one of the following models to service a
client’s file access request when the accessed file is remote:
(a) Remote service model
Processing of a client’s request is performed at the server’s node. Thus, the client’s request for
file access is delivered across the network as a message to the server, the server machine
performs the access request, and the result is sent back to the client. The design needs to
minimize the number of messages sent and the overhead per message.
(b) Data-caching model
This model attempts to reduce the network traffic of the previous model by caching the data
obtained from the server node. This takes advantage of the locality found in file accesses. A
replacement policy such as LRU is used to keep the cache size bounded. While this model
reduces network traffic, it has to deal with the cache coherency problem during writes: the
local cached copy of the data needs to be updated, the original file at the server node needs
to be updated, and copies in any other caches need to be updated.
Advantage of Data-caching model over the Remote service model:
The data-caching model offers the possibility of increased performance and greater system scalability
because it reduces network traffic, contention for the network, and contention for the file servers.
Hence almost all distributed file systems implement some form of caching.
Example: NFS uses the remote service model but adds caching for better performance.
2. Unit of Data Transfer: In file systems that use the data-caching model, an important design issue
is to decide the unit of data transfer. This refers to the fraction of a file that is transferred to and from
clients as a result of single read or write operation.
(a) File-level transfer model: In the file-level transfer model, the complete file is moved
whenever an operation requires the file data to be transferred across the network between
client and server. This model has good scalability and is efficient, but it requires
sufficient storage space on the client machine. The approach fails for very large files,
especially when the client runs on a diskless workstation, and if only a small fraction of a
file is needed, moving the entire file is wasteful.
(b) Block-level transfer model: In the block-level transfer model, file data transfer across
the network between client and server takes place in units of file blocks; in short, the unit
of data transfer is the file block. The block-level transfer model may be used in a
distributed computing environment comprising several diskless workstations. When an
entire file is to be accessed, multiple server requests are needed, resulting in more
network traffic and more network protocol overhead. NFS uses the block-level transfer
model.
(c) Byte-level transfer model: In the byte-level transfer model, file data transfer across
the network between client and server takes place in units of bytes; in short, the unit of
data transfer is the byte. The byte-level transfer model offers more flexibility than the
other file transfer models since it allows retrieval and storage of an arbitrary sequential
subrange of a file. Its major disadvantage is the difficulty of cache management, caused by
the variable-length data of different access requests.
(d) Record-level transfer model: The record-level transfer model may be used with file
models in which the file contents are structured in the form of records. In this model, file
data transfer across the network between client and server takes place in units of records;
the unit of data transfer is the record.

File-Caching Schemes
Every distributed file system uses some form of caching. The reasons are:
1. Better performance, since repeated accesses to the same information can be handled locally,
avoiding additional network accesses and disk transfers. This is due to locality in file access
patterns.
2. It contributes to the scalability and reliability of the distributed file system, since data
can be cached remotely on the client node.
Key decisions to be made in file-caching scheme for distributed systems:
1. Cache location
2. Modification Propagation
3. Cache Validation

1. Cache Location: This refers to the place where the cached data is stored. Assuming that the
original location of a file is on its server’s disk, there are three possible cache locations in a distributed
file system:

(a) Server’s Main Memory: A cache located in the server’s main memory eliminates the disk
access cost on a cache hit which increases performance compared to no caching.
Advantages:
i. Easy to implement
ii. Totally transparent to clients
iii. Easy to keep the original file and the cached data consistent.

(b) Client’s Disk: In this case a cache hit costs one disk access. This is somewhat slower than
having the cache in server’s main memory. Having the cache in server’s main memory is also
simpler.
Advantages:
i. Provides reliability against crashes, since modifications to cached data would be
lost in a crash if the cache were kept in main memory.
ii. Large storage capacity.
iii. Contributes to scalability and reliability because on a cache hit the access request can
be serviced locally without the need to contact the server.

(c) Client’s Main Memory: Eliminates both network access cost and disk access cost. This
technique is not preferred to a client’s disk cache when large cache size and increased
reliability of cached data are desired.
Advantages:
i. Maximum performance gain.
ii. Permits workstations to be diskless.
iii. Contributes to reliability and scalability.

2. Modification Propagation: When the cache is located on client’s nodes, a file’s data may
simultaneously be cached on multiple nodes. It is possible for caches to become inconsistent when the
file data is changed by one of the clients and the corresponding data cached at other nodes are not
changed or discarded.
There are two design issues involved:
1. When to propagate modifications made to a cached data to the corresponding file server.
2. How to verify the validity of cached data.
The modification propagation scheme used has a critical effect on the system’s performance and
reliability. Techniques used include:

(a) Write-through scheme: When a cache entry is modified, the new value is immediately sent
to the server for updating the master copy of the file.
Advantage:
 High degree of reliability and suitability for UNIX-like semantics.
 The risk of updated data getting lost in the event of a client crash is low.
Disadvantage:
 This scheme is only suitable where the ratio of read-to-write accesses is fairly large. It
does not reduce network traffic for writes.
 This is due to the fact that every write access has to wait until the data is written to the
master copy of the server. Hence the advantages of data caching are only read accesses
because the server is involved for all write accesses.

(b) Delayed-write scheme: To reduce network traffic for writes the delayed-write scheme is
used. In this case, the new data value is only written to the cache and all updated cache entries
are sent to the server at a later time.
There are three commonly used delayed-write approaches:
i. Write on ejection from cache: Modified data in cache is sent to server only when
the cache-replacement policy has decided to eject it from client’s cache. This can
result in good performance but there can be a reliability problem since some server
data may be outdated for a long time.
ii. Periodic write: The cache is scanned periodically and any cached data that has been
modified since the last scan is sent to the server.
iii. Write on close: Modification to cached data is sent to the server when the client
closes the file. This does not help much in reducing network traffic for those files that
are open for very short periods or are rarely modified.

Advantages of delayed-write scheme:


 Write accesses complete more quickly because the new value is written only to the client
cache. This results in a performance gain.
 Modified data may be deleted before it is time to send it to the server (e.g.
temporary data). Since such modifications need not be propagated to the server, this
results in a major performance gain.
 Gathering of all file updates and sending them together to the server is more efficient than
sending each update separately.

Disadvantage of delayed-write scheme:


 Reliability can be a problem since modifications not yet sent to the server from a client’s
cache will be lost if the client crashes.
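The contrast between write-through and a delayed-write (periodic write / write-on-close) client cache can be sketched as follows; the server object, block granularity, and flush interval are illustrative assumptions.

# Sketch contrasting write-through with delayed-write at a client cache.
import time

class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, block, data):
        self.cache[block] = data
        self.server.write(block, data)    # master copy updated immediately

class DelayedWriteCache:
    def __init__(self, server, interval=30.0):
        self.server = server
        self.interval = interval          # period for the periodic-write scan
        self.cache = {}
        self.dirty = set()
        self.last_flush = time.monotonic()

    def write(self, block, data):
        self.cache[block] = data          # only the client cache is updated now
        self.dirty.add(block)

    def maybe_flush(self):
        # Periodic write: push everything modified since the last scan.
        if time.monotonic() - self.last_flush >= self.interval:
            self._flush()

    def close(self):
        # Write on close: push modifications when the client closes the file.
        self._flush()

    def _flush(self):
        for block in self.dirty:
            self.server.write(block, self.cache[block])
        self.dirty.clear()
        self.last_flush = time.monotonic()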

3. Cache Validation schemes: The modification propagation policy only specifies when the
master copy of a file on the server node is updated upon modification of a cache entry. It does
not tell anything about when the file data residing in the cache of other nodes is updated.
File data may simultaneously reside in the caches of multiple nodes. A client's cache entry
becomes stale as soon as some other client modifies the data corresponding to that cache
entry in the master copy of the file on the server.
It becomes necessary to verify if the data cached at a client node is consistent with the master
copy. If not, the cached data must be invalidated and the updated version of the data must be
fetched again from the server.
There are two approaches to verify the validity of cached data: the client-initiated approach
and the server-initiated approach.
1. Client-initiated approach: The client contacts the server and checks whether its
locally cached data is consistent with the master copy. Two approaches may be used:
i. Checking before every access: This defeats the purpose of caching because
the server needs to be contacted on every access.
ii. Periodic checking: A check is initiated every fixed interval of time.
Disadvantage of client-initiated approach: If frequency of the validity check is
high, the cache validation approach generates a large amount of network traffic and
consumes precious server CPU cycles.

2. Server-initiated approach: A client informs the file server when opening a file,
indicating whether a file is being opened for reading, writing, or both. The file server
keeps a record of which client has which file open and in what mode. So server
monitors file usage modes being used by different clients and reacts whenever it
detects a potential for inconsistency. E.g. if a file is open for reading, other clients
may be allowed to open it for reading, but opening it for writing cannot be allowed.
So also, a new client cannot open a file in any mode if the file is open for writing.
When a client closes a file, it sends a notification to the server along with any
modifications made to the file. The server then updates its record of which client has
which file open and in which mode.
When a new client makes a request to open an already open file and if the server finds
that the new open mode conflicts with the already open mode, the server can deny the
request, queue the request, or disable caching by asking all clients having the file
open to remove that file from their caches.
Note: On the web, the cache is used in read-only mode so cache validation is not an
issue.
Disadvantage of server-initiated approach: It requires that file servers be stateful.
Stateful file servers have a distinct disadvantage over stateless file servers in the event
of a failure.
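The bookkeeping such a stateful server needs can be sketched as follows; the class and method names are illustrative, and the conflict policy shown simply denies a conflicting open (the server could instead queue the request or disable caching, as described above).

# Sketch of the state kept by a stateful server for server-initiated validation.

class StatefulFileServer:
    def __init__(self):
        self.open_files = {}              # filename -> {client_id: mode}

    def open(self, client, filename, mode):
        """mode is 'read' or 'write'; returns True if the open is allowed."""
        holders = self.open_files.setdefault(filename, {})
        # A writer conflicts with everyone; a reader conflicts with writers.
        conflict = any(m == 'write' or mode == 'write'
                       for c, m in holders.items() if c != client)
        if conflict:
            return False
        holders[client] = mode
        return True

    def close(self, client, filename, modifications=None):
        # The client sends any modifications along with the close notification;
        # applying them to the master copy is omitted in this sketch.
        self.open_files.get(filename, {}).pop(client, None)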
