Professional Documents
Culture Documents
From, Jan. 18-20, 1995, Durham, NC
From, Jan. 18-20, 1995, Durham, NC
incurred in maintaining consistency. However, little of votes of r and w, respectively) are then chosen such
work has been done to evaluate the time it takes to that only one write operation can be performed at any
serve a request, accounting for server and network fail- time, and no read operations can be performed while a
ures, or to determine the eect of workload on these write operation is being performed. This behavior can
measures. The eect of workload can be signicant, be achieved by requiring that r + w be greater than the
since failures of system components are not impor- total number of votes assigned to the nodes and w be
tant unless they are needed to deliver a service, and greater than half of the total number of votes assigned
requests can force updates on data that would oth- to the nodes. Several variants of the basic voting al-
erwise be outdated. In this paper, with the help of gorithm have been introduced to improve availability
stochastic activity networks, we determine the avail- and performance. These variants are generally clas-
ability and mean time to respond to write requests sied as either static or dynamic. Algorithms which
as a function of the number of replicated copies and assign a xed number of votes to each node are called
workload oered to the system. The results illustrate \static" voting algorithms (for example, see [2, 3, 4]).
that it is indeed possible to determine such measures Dynamic algorithms assign a dynamic number of votes
analytically and that workload, as well as the number to each host, depending on the state of the system.
of copies, is an important determinant of availability They increase the availability of data by reassigning
and response time. votes, after a component failure among the remaining
functioning hosts (e.g., [5, 6, 7]).
Key words: Replicated Data Systems, Voting Algo-
rithms, Availability, Response Time, Stochastic Ac- Most of the studies related to the evaluation of vot-
tivity Networks, Stochastic Petri Nets. ing algorithms have focused on determining the avail-
ability or reliability of particular algorithms, or the
1 Introduction \cost" associated with implementing the voting al-
Dependable access to data is important for many gorithm. While these studies provided useful infor-
applications. This dependability is most often mation concerning the subject protocols, they were
achieved by replication of data on multiple nodes of limited in several respects. First, by basing the cri-
a system. In this way, the failure of some subset of teria for proper operation (i.e., an available system)
the nodes can be tolerated, maintaining the availabil- on the structural conguration of the system, they do
ity of data to a user of the system. However, increased not provide a user-oriented view of availability which
bases the proper/improper service criteria on the abil- haves as either a client or a server. Each server carries
ity of a system to deliver proper service when it is re- a replicated data object. Clients generate requests for
quested. The impact of workload on fault-tolerant sys- the replicated data, while the servers service the re-
tem dependability is evident from earlier work (e.g., quests.
see [8, 9]). Second, most studies have been limited Furthermore, we assume that the number of repli-
to studying the impact of a particular protocol and cated data objects is equal to the number of servers
network on the availability of the data, without si- and each replicated data object resides on a dierent
multaneously considering the performance delivered server. A data object becomes unavailable if either the
by the system. Failures of system components can sig- server it is on or the network fails. For simplicity, we
nicantly aect the performance of a replicated data further assume that each replicated data object con-
system, as perceived by a user. In many applications sists of a single le copy and all the data objects carry
performance measures such as the mean response time the copy of the same le. Since each server carries a
can be useful in determining the suitability of partic- replicated copy of the le, it is sucient to monitor a
ular voting algorithms. This fact has been recognized server's status to keep track of the status of the le
by several researchers (e.g., see [10, 11, 12]), which copy.
underscores the importance of determining the per- Servers and the network fail independently of each
formance, as well as availability of voting algorithms. other and can fail at any time other than when the
In this paper, we use stochastic activity networks replicated le system is in reconstruction. Since re-
(SANs), a variant of stochastic Petri nets, to eval- construction time is very short relative to the failure
uate the performance and availability of particular rates of the network and servers, it is reasonable to as-
static and dynamic voting algorithms, taking into ac- sume that failures do not occur during reconstruction.
count the eects of workload and component failures. We assume that failure and repair times are exponen-
Stochastic Petri nets were rst applied to the evalua- tially distributed with xed rates. Furthermore, we
tion of voting algorithms by Dugan and Ciardo [13], assume that a network failure prevents all communi-
who evaluated the eect of changing the number of cation on the LAN. In this case, the replicated le
copies on data availability. They did not, however, system becomes unavailable and cannot provide any
study the eect of workload on availability or deter- service (including completing service of a request in
mine the performance of the algorithm in the presence the system). As soon as the network is repaired, it
of faults. In particular we consider both server and becomes available and, if conditions are such that it
network failures. Our measure of performance is the can form a quorum, once again begins processing re-
mean response time of a write request to the system, quests.
keeping in mind that a request may have to wait until Repair starts as soon as a server or the network
the number of copies required to form a quorum be- fails. We take a conservative approach as suggested
come available. The results are signicant for two rea- in [13], and always consider a repaired server as out-
sons. First, they illustrate that one can indeed build dated. How outdated a repaired copy is depends upon
accurate and detailed models to analytically evaluate the number of write requests it has missed. However,
the performance and availability of replicated data in order to model the replicated le system with a rea-
systems. Second, they provide interesting informa- sonably small state space, we treat all outdated copies
tion regarding the performance and availability of the similarly and ignore the degree to which they are out-
specic voting algorithms and hardware conguration dated.
studied. Among other things, we show that the mean 2.2 Voting Algorithms
response time of the voting algorithms considered de- Many voting algorithms could have been used for
creases with an increase in the number of copies em- this study. We consider one static and one dynamic
ployed, and an increase in workload. This result is voting algorithm. Since they only dier in the require-
non-intuitive and linked to the particular voting algo- ment for quorum formation, we describe the static
rithms we considered. voting algorithm and highlight the dierences as nec-
2 Network, Voting, and Workload essary. Both algorithms are related to the work by
Jajodia and Mutchler [7].
Model The protocols make use of two types of version num-
2.1 Network Architecture Model bers associated with each copy of data, one \physical"
We consider a networked LAN environment where and the other \logical." The physical version num-
a single broadcast-capable link, i.e., \network", con- ber is an indication of the version of the data at the
nects the hosts together. A host on the network be- server. The logical version number is the version that
was written during the last quorum in which the copy f , the client enters the voting phase. In this phase,
participated. Note that the physical version number the client transmits messages to the servers carrying
may be less than the logical version number for a copy. copies of the le. Each server S locks the local copy
i
This indicates that the copy participated in a quorum f and sends the physical and logical version numbers,
i
formation, but (as is possible in the protocol) did not P N and LN , back to the client. (Note that in case
i i
have its data updated. Given these two version num- of dynamic voting each server also sends variable SC i
bers, there are three classes a copy can fall into. The back to the client). After receiving messages from the
rst class is where both the physical and logical ver- servers, the client rst determines the greatest logi-
sion numbers indicate that the copy participated in cal version number among the responses it received.
the most recent quorum formation. These copies are It then checks to see whether it received a number
called physically current copies. How this will be de- of responses with this logical version number greater
termined is discussed in the following. than half of the number of copies either in the sys-
The second class is where the logical version num- tem for static voting or that took part in the service
ber indicates the copy participated in the most recent of the last request for dynamic voting. It then checks
quorum. However, the physical version number is less to see if at least one of the copies with LN = V Ni
than the logical version number, indicating that the has P N = V N . If so, then the client has obtained
i
copy is out of date relative to the most recent update. a physically current copy, and can proceed to do the
We call these copies logically current copies. The third update and commit the write. If not, the client sends
and nal case is when the logical version number in- messages to the servers to release their locks and the
dicates that the copy did not participate in the last le system is forced into reconstruction.
quorum formation. In this case the data of the copy is
also outdated, since the physical version number must Commit If a quorum could be formed during the
be less than or equal to the logical version number. voting phase, the system enters the commit phase. In
We call these copies outdated copies. Both algorithms, this phase, the client increments the V N associated
static and dynamic, specify that a write request can with the le f and then it sends the updates to all
be executed if there exists at least one physically cur- servers that responded to the request for a quorum
rent copy and the number of physically or logically formation. Each receiving server S updates its local
i
current copies participating in quorum formation is copy f , makes the associated P N = LN = V N , and
i i i
greater than half of some number X . The selection of then releases the lock. Also, in case of dynamic vot-
this number X is what dierentiates the static and dy- ing, each server updates the variable SC equal to the
i
namic algorithms. The static algorithm species that number of copies that took part in the quorum forma-
X is equal to the total number of copies, while the tion before releasing the lock. Note that in the variant
dynamic algorithm requires that it must be equal to of the algorithm studied in this paper, we physically
the number of copies that took part in the service of update all servers that responded to the last quorum
the last request. formation request. We do this because we assume a
More precisely, each le f has an associated version broadcast-capable network where all servers can be
number, V N , which is initially set to 0 and then incre- updated in a cost equal to that of updating a single
mented with each write request on the le. Each copy server. When the cost to physically update multiple
f of the le f has a physical version number, P N , to
i i servers is greater than a single server (as it is on a
keep track of the physical status of the copy, and logi- non-broadcast-capable network), it may not be bene-
cal version number LN to keep track of the last quo-
i cial to update all servers. The algorithm itself only
rum that copy participated in. A copy f of the le f
i requires that we increment the logical version number
is said to be physically current if P N = V N . It is log-
i of more than half the servers, and physically update
ically current but not physically current, if LN = V N
i at least one server.
but P N < V N . Otherwise, it is outdated. Further-
i
more, each copy f has an associated variable SC to
i i Reconstruction is only invoked if the system is un-
keep track of the number of copies that took part in able to form a quorum in the voting phase. During
the service of the last request. (Note that variable SC i the reconstruction phase, all failed servers are rst re-
is only required in dynamic voting). Under normal paired and then all the servers, including the repaired
operation (when a quorum can be formed), the two ones, are physically updated. The write request which
phases voting and commit are executed in response to causes the invocation of reconstruction waits during
a write request: the reconstruction and after the reconstruction trig-
Voting After generating a write request for a le gers voting again.
Join
link_failure
lftime
initiation
link
link_repair
lrtime
network voting commit reconstruction transaction
reconst
for the data. The le system can only service one write f_servers
fstime
fsrepair