Faculty of Computer Science and Information Technology and Institute For Mathematical Research (INSPEM) Universiti Putra Malaysia, Malaysia

Journal of ICT, 16, No. 1 (June) 2017, pp: 43–62
A HIGH AVAILABILITY CLUSTER-BASED REPLICA CONTROL PROTOCOL IN DATA GRID
Zulaile Mabni, 3Rohaya Latip, 2Hamidah Ibrahim & 1Azizol Abdullah

1
1&2
Faculty of Computer Science and Information Technology and
3
Institute for Mathematical Research (INSPEM)
Universiti Putra Malaysia, Malaysia
zulaile@tmsk.uitm.edu.my; rohayalt@upm.edu.my;
hamidah.ibrahim@upm.edu.my; azizol@upm.edu.my
ABSTRACT
Data replication is widely used to provide high data availability and to
increase the performance of distributed systems. Many replica control
protocols have been proposed for distributed and grid environments that
achieve both high performance and availability. However, the previously
proposed protocols still require a large number of replicas for read and
write operations, which makes them unsuitable for a large scale system such
as a data grid. In this paper, a new replica control protocol called
Clustering-based Hybrid (CBH) is proposed for managing data in grid
environments. We analyze the communication cost and data availability of
the read and write operations and compare the CBH protocol with two recently
proposed replica control protocols, the Dynamic Hybrid (DH) protocol and the
Diagonal Replication in 2D Mesh (DR2M) protocol. To evaluate the CBH
protocol, a simulation model was implemented in Java. Our results show that,
for read operations, the CBH protocol improves communication cost and data
availability compared to the DH and DR2M protocols.
Key words: Data replication, grid computing, data availability, communication
cost.
Received: 19 March 2016 Accepted: 7 December 2016
INTRODUCTION
Grid computing is a distributed network computing system that enables large
scale resource sharing between machines distributed across many organizations
and over a wide area network (Foster et al., 2001; Krauter et al., 2002). In
grid computing, data grid provides a scalable infrastructure to manage
huge amounts of data and support data intensive applications (Chervenak
et al., 2000; Abdullah et al., 2004; Yusof et al., 2012). Thus, managing the
large network and widely distributed data in the data grid is a challenging
problem. To address the challenge, various methods have been proposed in the
literature. Data replication is one of the widely used methods to improve data
availability and enhance the performance of distributed database systems
(Lamehamedi et al., 2002; Mabni & Latip, 2011). However, some issues
arise in managing the replication of data. One of the issues is data availability
(Lamehamedi et al., 2003; Latip et al., 2014), because data is geographically
distributed over large networks. Another issue is communication cost, which
can become expensive if the number of read and write operations is high
(Choi & Youn, 2012; Latip et al., 2009). The communication cost is calculated
based on the number of replicas that need to be accessed. Thus, to obtain low
communication cost, the number of replicas needs to be as small as possible
(Koch, 1993).
In a replicated distributed system, multiple copies or replicas of an object
are produced and stored at many sites. The operations that are allowed on
the replicated data are read and write operations. A read quorum (RQ) or
write quorum (WQ) is defined as a set of copies that is sufficient to execute
the read or write operation. In order to maintain a consistent state among the
replicas, these “multiple copies or replicas of an object must appear as a single
logical object to the transaction which is known as one-copy equivalence”
(Bernstein & Goodman, 1984). Thus, to ensure one-copy equivalence, the
quorum selected must satisfy the quorum intersection property. The property
states that “for any two operations o[x] and o’[x] on an object x, where at
least one of them is a write, the quorums must have a non-empty intersection”
(Gifford, 1979). Therefore, the basic property for any replica control protocol
is to guarantee non-empty intersection between read and write quorums in
order to maintain the consistency of the replicated data.
In the literature, many replica control protocols have been proposed for
distributed and grid environments that achieve both high performance and
availability. However, the previously proposed protocols still require a large
number of replicas for read and write operations, which makes them unsuitable
for a large scale system. In this paper, we propose a new replica control
protocol called Clustering-based Hybrid (CBH) protocol for the grid environment.
The proposed protocol employs a hybrid replication strategy by combining
the advantages of two common replica control protocols to improve the
performance of earlier protocols. The proposed protocol groups nodes into
clusters and organizes these clusters into a tree structure which enables the
protocol to minimize the number of replicas for read or write operations.
Thus, CBH provides low communication cost as well as high availability.
RELATED WORKS
Data replication has been an active research area in distributed and grid
environments. This section describes the previously proposed replica control
protocols, which differ in the number of replicas involved in executing the
read and write operations.
Primary Copy Protocol
Primary Copy (PC) algorithm is a simple algorithm that selects one copy of
a data object as the primary copy (Stonebraker, 1979; Ahamad et al., 1992;
Zhou & Holmes, 1999). In this protocol, each node knows which other nodes
it can communicate with by the up-list of nodes. By definition, the node that
has the lowest order in the up-list is selected as the primary copy of the data
object. The primary copy will maintain the consistency of the object. Any
other node is called a non-primary copy. A read operation is executed only at
the primary copy while the write operation updates the primary copy and then
propagates out to all other nodes that maintain the non-primary copies.
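The behaviour described above can be illustrated with a minimal Python sketch. It assumes that "lowest order" means the smallest node identifier in the up-list; the names `select_primary`, `pc_read` and `pc_write` and the dictionary of copies are hypothetical and serve only as an illustration.

```python
def select_primary(up_list):
    """Return the node with the lowest order among the reachable nodes.

    Assumption: the order of a node is its integer identifier.
    """
    return min(up_list)


def pc_read(copies, up_list):
    """A read is executed only at the primary copy."""
    return copies[select_primary(up_list)]


def pc_write(copies, up_list, value):
    """A write updates the primary copy, then propagates to the other nodes."""
    primary = select_primary(up_list)
    copies[primary] = value
    for node in up_list:
        if node != primary:
            copies[node] = value  # propagation to a non-primary copy
    return primary


copies = {1: "v0", 2: "v0", 3: "v0"}
up_list = [3, 1, 2]
primary = pc_write(copies, up_list, "v1")
print(primary, pc_read(copies, up_list))
```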
Read-One Write-All
Read-One Write-All (ROWA) protocol is another simple protocol for
managing replicated data (Bernstein & Goodman, 1984). In this protocol,
a read operation needs to access only a single replica. On the other hand,
to perform a write operation, all n replicas need to be accessed. Since every
write updates all replicas, any single replica that is read is up to date, and
data consistency is guaranteed in the ROWA protocol.
Voting Protocol
Voting protocol (VT) was first proposed by Thomas (1979). This protocol
was later enhanced by Garcia-Molina & Barbara (1985), where each replica is
assigned a certain number of votes and v denotes the total number of votes.
Every transaction has to collect a read quorum of r votes to read a replica,
and a write quorum of w votes to write the replica. A quorum must satisfy the
following two constraints:
i) r + w > v
ii) w > v / 2
The first constraint ensures that there is a non-empty intersection between
every read quorum and every write quorum. The second constraint ensures
that there is a non-empty intersection between two write quorums. The
communication cost of this protocol depends on the quorum size. The bigger
the size of the read or write quorum, the higher the communication cost. In
this protocol, the read operation needs to access several replicas, which makes
the communication cost higher than in the ROWA protocol. Meanwhile, for the
write operation, this protocol does not need to access all replicas as in the
ROWA protocol, thus increasing its fault-tolerance.
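The two voting constraints can be checked mechanically. The following Python sketch assumes that v is the total number of votes in the system; the function names are hypothetical.

```python
def valid_voting_quorums(r, w, v):
    """Return True if the thresholds satisfy r + w > v and w > v / 2."""
    return (r + w > v) and (2 * w > v)


def feasible_pairs(v):
    """List every (r, w) pair of vote thresholds that is valid for v votes."""
    return [
        (r, w)
        for r in range(1, v + 1)
        for w in range(1, v + 1)
        if valid_voting_quorums(r, w, v)
    ]


# With v = 4 votes, r = 2 and w = 3 satisfy both constraints,
# while r = 2 and w = 2 violate both of them.
print(valid_voting_quorums(2, 3, 4))
print(valid_voting_quorums(2, 2, 4))
print(feasible_pairs(4))
```

Setting r = 1 and w = v satisfies both constraints and corresponds to the read-one, write-all behaviour of ROWA.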
Tree Quorum Protocol
In the Tree Quorum (TQ) protocol (Agrawal & El Abbadi, 1990), replicas are
organized in a logical tree structure. Figure 1 shows the diagram of thirteen
copies in a tree quorum structure of height = 2 and degree of node D = 3,
where every copy represents a replica. In this protocol, a read quorum needs
to access only the root replica. If the root is inaccessible, then a read quorum
needs to access a majority of the replicas of its children. Furthermore, for
every inaccessible replica, a majority of the replicas of its children are
accessed, and so forth. The examples of valid read quorums of Figure 1 are {1}
when the root replica is accessible, and {2,3} when the root replica is
inaccessible.
Meanwhile, a write quorum consists of the root, and any majority of the
replicas of the root’s children, and any majority of the replicas of their
children, and so forth until the leaves are reached. In Figure 1, the examples
of valid write quorums are {1,2,3,5,6,8,9} and {1,3,4,9,10,11,12}.
Figure 1. A tree organization of 13 copies of data objects (Agrawal & El Abbadi, 1990).
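The quorum construction of the Tree Quorum protocol can be sketched in Python as follows. The sketch assumes that the thirteen replicas of Figure 1 are numbered level by level (1 is the root, 2 to 4 are its children, 5 to 13 are the leaves), which is consistent with the example quorums in the text; the function names are hypothetical.

```python
def complete_tree(degree, height):
    """Map each replica (numbered 1, 2, ... level by level) to its children."""
    n = sum(degree ** i for i in range(height + 1))
    return {
        node: [c for c in range(degree * (node - 1) + 2,
                                degree * (node - 1) + degree + 2) if c <= n]
        for node in range(1, n + 1)
    }


def read_quorum(node, children, up):
    """Use the node if accessible, else a majority of its children, recursively."""
    if node in up:
        return {node}
    return _majority(node, children, up, read_quorum, set())


def write_quorum(node, children, up):
    """Use the node and a majority of its children, down to the leaves."""
    if node not in up:
        return None
    if not children[node]:
        return {node}
    return _majority(node, children, up, write_quorum, {node})


def _majority(node, children, up, build, quorum):
    kids = children[node]
    need = len(kids) // 2 + 1  # majority of the children
    found = 0
    for kid in kids:
        sub = build(kid, children, up)
        if sub is not None:
            quorum |= sub
            found += 1
            if found == need:
                return quorum
    return None  # no majority can be assembled (also covers a failed leaf)


tree = complete_tree(degree=3, height=2)
all_up = set(tree)
print(read_quorum(1, tree, all_up))
print(read_quorum(1, tree, all_up - {1}))
print(sorted(write_quorum(1, tree, all_up)))
```

The sketch returns the first quorum it finds when scanning children from left to right, so other valid quorums, such as {1,3,4,9,10,11,12}, are possible but not produced here.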
Diagonal Replication on 2D Mesh Protocol
In the Diagonal Replication on 2D Mesh (DR2M) protocol, nodes are organized
in a two-dimensional (2D) mesh structure (Latip et al., 2008; Latip et al.,
2009). Figure 2 illustrates the network of 81 nodes which are divided into
four quorums. The nodes are logically grouped by 5 x 5 in each quorum. The
nodes in a quorum intersect with the nodes in other quorums to ensure that
each quorum can communicate with another quorum. The data is replicated to
only the middle node of the diagonal site in each quorum. The replicated data
is assigned with vote one. This protocol employs the voting technique, where
the write quorum qw can be formed by any majority of the replicas and the read
quorum qr by a half of the replicas. To ensure that consistency is maintained,
qw + qr must be greater than the total number of votes assigned to all replicated
data (Latip et al., 2008). The read and write quorum sizes of Figure 2 are 2
and 3 respectively.
Figure 2. DR2M with 81 nodes; each of the nodes has a data file a, b,…, and
y respectively (Latip et al., 2009).
Dynamic Hybrid Protocol
Dynamic Hybrid (DH) protocol is a hybrid replica control protocol that has
been proposed recently (Choi & Youn, 2012). In this protocol, the overall
topology combines the grid and tree structure, where the tree height, number
of descendants and grid depth can be adjusted. Figure 3 illustrates the network
of DH protocol with 31 replicas in (3,3,2) topology, where the three arguments
represent the height h, number of descendants s and grid depth g respectively.
In the tree structure of height h, the read operation needs to access only the
root replica. However, if the root is inaccessible, then the s descendants of
the root replica have to be accessed. The descendants of the root serve as the
new root replica of the sub-tree. The process is repeated until level h − 1 is
reached. Furthermore, in the grid network of depth g, the read operation reads s
replicas or goes to the next level if one of the replicas is inaccessible. Thus,
if the root replica is accessible, the read cost is only 1. The examples of valid
read quorums of Figure 3 are {R0}, {R1, R2, R3} and {R2, R3, R4, R5, R6}.
Meanwhile, the write operation accesses the root replica, one replica of the
root’s descendants, one replica of these previously selected replicas’
descendants, and so forth until the leaves are reached. Furthermore, in the
grid network of depth g, the write operation accesses only one replica in each
level down to the last level. In Figure 3, the examples of valid write quorums
are {R0, R1, R4, R13, R22} and {R0, R2, R7, R16, R27}.
Figure 3. A network for Dynamic Hybrid protocol of 31 replicas in (3,3,2)
topology (Choi & Youn, 2012).
Review of Replica Control Protocols
In this section, we have described the previously proposed replica control
protocols in distributed and grid environments. The Primary Copy protocol is
easy to implement and has a very low read communication cost. However, if
the node that maintains a primary copy is not accessible, then a write operation
cannot be executed. ROWA produces a very low read communication cost
and very high read availability, like the Primary Copy protocol, since only
one replica is required to be accessed. However, ROWA has very high
communication cost for the write operation since all replicas must be updated
simultaneously. Furthermore, a write operation cannot be performed in case
of any node failure. The concept of quorum used in the Voting protocol
improves on ROWA because only a majority of the nodes in the network need
to be accessed in order to ensure consistency. However, the communication
cost for the write operation is still expensive since a write quorum of w votes
must hold more than half of the total votes (Mat Deris et al., 2004). The Tree
Quorum protocol allows very low read communication cost since the read
operation requires only one replica, as in the ROWA protocol. However, as the
level of the tree increases, the number of replicas increases rapidly, thus
increasing the communication cost. To address the limitation of the Tree
Quorum protocol, the DR2M protocol minimizes the number of replicas by
replicating the primary database only at the middle of the diagonal site. The
number of replicas that need to be accessed for the read and write operations
is small; thus it has low read and write communication cost. However, as the
network size grows larger, the replicas will be further apart, which decreases
the performance of the system. A recently proposed protocol named Dynamic
Hybrid combines the advantages of the Tree and the Grid protocols to allow
low operation cost and high availability. However, as the network size grows,
a large number of replicas still need to be accessed to maintain data
consistency, which degrades the performance of the system.
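The availability trade-off of ROWA noted in this review can be made concrete with a short Python sketch. It assumes that each of the n replicas is available independently with the same probability p; the function names are hypothetical and the expressions follow directly from that assumption rather than from the reviewed papers.

```python
def rowa_read_availability(n, p):
    """A read succeeds if at least one of the n replicas is available."""
    return 1 - (1 - p) ** n


def rowa_write_availability(n, p):
    """A write succeeds only if all n replicas are available."""
    return p ** n


# Example with n = 9 replicas, each available with probability p = 0.9.
print(rowa_read_availability(9, 0.9))
print(rowa_write_availability(9, 0.9))
```

Under this assumption, adding replicas raises the read availability but lowers the write availability, which matches the observation that a ROWA write cannot be performed in case of any node failure.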
Most of the mentioned replica control protocols perform well in small
systems where the number of replicas is small. However, in a larger system,
these replica control protocols require a larger number of replicas to be
accessed in order to maintain data consistency, which degrades the performance
of the system (Abawajy & Mat Deris, 2014). Thus, these protocols are not
suitable for a large scale system such as data grid. Therefore, we propose a
new quorum-based replica control protocol called Clustering-based Hybrid
(CBH) protocol for managing replicated data in a large scale system. CBH
protocol minimizes the number of replicas for read or write operations while
maintaining data consistency in a large scale system such as data grid.
CLUSTERING-BASED HYBRID PROTOCOL
In this section, we present the system model and the algorithm for the proposed
protocol called Clustering-Based Hybrid (CBH) protocol.
System Model
The system consists of N sites that communicate with each other by
exchanging messages through a communication link. We assumed that sites
fail independently and communication links do not fail to deliver messages.
It is assumed that access requests to the physical replicas are performed by
executing transactions which are partially ordered read and write operations.
In the CBH protocol, the N sites in the network are logically grouped into
several nonintersecting groups. The N sites are divided into √N disjoint groups
with each group having approximately √N sites (Madhuram & Kumar, 1994;
Mabni et al., 2014; Latip et al., 2014). Each group is called a cluster. These
clusters are logically organized as a tree of height, h and descendants, s. We
defined the nodes in the tree to be a sequence of clusters C0, C1,… Ci, Ci+1, …
Cn. We assumed that the nodes in each cluster are logically organized into two
dimensional grid structures. For example, if the CBH protocol consists of 81
nodes, it will be divided into 9 clusters with 9 nodes in each cluster. The nodes
in each cluster will be logically organized in the form of 3 x 3 grid. In Figure
4, an example of a ternary tree of height = 2 with 81 nodes is presented. Each
cluster designates the middle node of the cluster as the cluster head which is
colored in black in Figure 4 and has the replica or primary copy of the data
object. The center of the cluster is selected because it is the shortest path to
get a copy of the data from most of the directions in the cluster.
Figure 4. System model of CBH in a ternary tree of height = 2 with 81
nodes.
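The grouping of sites into clusters can be sketched in Python as follows. The sketch assumes that sites are identified by consecutive integers starting at 0, that each cluster takes a contiguous block of identifiers, and that the middle element of a block stands for the middle node of the grid (for a 3 x 3 grid stored row by row this is the centre cell). These conventions and the name `build_clusters` are hypothetical.

```python
import math


def build_clusters(n_sites):
    """Split n_sites into about sqrt(N) clusters of about sqrt(N) sites each."""
    k = math.isqrt(n_sites)            # number of clusters
    base, extra = divmod(n_sites, k)   # spread any remainder over the first clusters
    clusters, start = [], 0
    for c in range(k):
        size = base + (1 if c < extra else 0)
        members = list(range(start, start + size))
        clusters.append({
            "id": f"C{c}",
            "members": members,
            "head": members[size // 2],  # middle node holds the physical replica
        })
        start += size
    return clusters


clusters = build_clusters(81)
print(len(clusters), [c["head"] for c in clusters])
```

For N = 81 the sketch yields 9 clusters of 9 sites each, so the number of physical replicas is 9 rather than 81.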
Proposed Algorithm
Here, we describe the hybrid algorithm of the CBH protocol, which
combines the advantages of two common replica control protocols, namely
the Tree Quorum (TQ) protocol and the Primary Copy (PC) protocol. The
hybrid algorithm logically groups the nodes into a tree structure. Figure 4
shows a system with 81 nodes for which we use the TQ protocol on top of
the PC protocol as the replication strategy. The system consists of 81 nodes
where nine clusters called C0, C1, …, C8 are “logical replicas” as shown in
Figure 4. The logical replica C0 serves as the root cluster, whereas the logical
replicas C1,…, C8 are its descendants. Each logical replica contains a cluster
of physical nodes with one middle node called “physical replica” which has
the replica or the primary copy of the data object. The physical replica in the
root cluster C0 is called root replica. Thus, for a system with N nodes, there
will be √N clusters, and √N replicas. We assume that every physical replica is
assigned exactly one vote. The replication strategy has two parts: “Local
Replication”, where the PC protocol is used for managing the physical replica
within a cluster, and “Global Replication”, where the TQ protocol is used for
managing the logical replicas between clusters.
Read Operation
In the CBH protocol, for global replication, a read operation is based on the
TQ protocol, where the root replica C0 is accessed if it is accessible. However,
if the root replica C0 is inaccessible then a majority of physical replicas of
its children are added as members of this quorum. Furthermore, for every
inaccessible physical replica, a majority of physical replicas of its children
are added as members, and so forth. On the other hand, for local replication,
a read operation is based on the PC protocol, where a logical replica can be
read if the physical replica that it contains can be accessed. This means that
for reading a logical replica, the precondition is a read quorum of RQ = 1 if its
contained physical replica is accessible for read operation. Thus, in Figure 4,
by employing the TQ Protocol, the minimal read cost is 1 if the root replica is
accessible. However, if the root replica is inaccessible, then the read cost is 2
since the majority of the physical replicas of its children have to be accessed.
The examples of valid read quorums of Figure 4 are {C0} if the root replica is
available and {C1, C2} if the root replica is not available.
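The read operation can be sketched in Python as follows. The parent and child relation between the nine clusters is not spelled out in the text, so the sketch assumes a level-order layout (C0 is the root, C1 to C3 are its children, C4 to C6 are children of C1, and C7 and C8 are children of C2). This layout, the `CHILDREN` table and the function name are hypothetical.

```python
# Hypothetical level-order layout of the nine logical replicas (clusters).
CHILDREN = {
    "C0": ["C1", "C2", "C3"],
    "C1": ["C4", "C5", "C6"],
    "C2": ["C7", "C8"],
    "C3": [], "C4": [], "C5": [], "C6": [], "C7": [], "C8": [],
}


def cbh_read_quorum(cluster, head_up):
    """Build a read quorum for the sub-tree rooted at the given cluster."""
    # Local replication (PC): a logical replica is readable when its
    # physical replica (the cluster head) is accessible.
    if head_up[cluster]:
        return {cluster}
    # Global replication (TQ): otherwise use a majority of the children.
    kids = CHILDREN[cluster]
    if not kids:
        return None
    need = len(kids) // 2 + 1
    quorum, found = set(), 0
    for kid in kids:
        sub = cbh_read_quorum(kid, head_up)
        if sub is not None:
            quorum |= sub
            found += 1
            if found == need:
                return quorum
    return None


all_up = {c: True for c in CHILDREN}
root_down = dict(all_up, C0=False)
print(cbh_read_quorum("C0", all_up))
print(sorted(cbh_read_quorum("C0", root_down)))
```

With all cluster heads accessible the read cost is 1, and with only the root replica inaccessible it is 2, as stated in the text.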
Write Operation
In the CBH protocol, for global replication, a write operation is based on
the TQ protocol, where the root replica C0 and any majority of the physical
replicas of the root’s children, and any majority of the physical replicas of
their children, and so forth are accessed until the leaves are reached. As for
local replication, a write operation is based on the PC protocol, where a logical
replica is accessible if a write operation can be performed on its physical
replica. This means that for writing a logical replica, the precondition is a
write quorum of WQ = 1 if its contained physical replica is accessible for
write operation. Therefore, in Figure 4, by employing the TQ protocol, we
obtain a write cost of 7. An example of valid write quorum of Figure 4 is {C0,
C1, C2, C4, C5, C7, C8}.
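The write operation can be sketched in Python in the same way. As the text does not give the parent and child relation between the nine clusters, the sketch assumes a level-order layout (C0 is the root, C1 to C3 are its children, C4 to C6 are children of C1, and C7 and C8 are children of C2), which reproduces the example write quorum above. The layout, the `CHILDREN` table and the function name are hypothetical.

```python
# Hypothetical level-order layout of the nine logical replicas (clusters).
CHILDREN = {
    "C0": ["C1", "C2", "C3"],
    "C1": ["C4", "C5", "C6"],
    "C2": ["C7", "C8"],
    "C3": [], "C4": [], "C5": [], "C6": [], "C7": [], "C8": [],
}


def cbh_write_quorum(cluster, head_up):
    """Build a write quorum for the sub-tree rooted at the given cluster."""
    # Local replication (PC): a logical replica is writable only when its
    # physical replica (the cluster head) is accessible.
    if not head_up[cluster]:
        return None
    kids = CHILDREN[cluster]
    if not kids:
        return {cluster}
    # Global replication (TQ): the cluster itself plus a majority of its
    # children, each contributing its own write quorum.
    need = len(kids) // 2 + 1
    quorum, found = {cluster}, 0
    for kid in kids:
        sub = cbh_write_quorum(kid, head_up)
        if sub is not None:
            quorum |= sub
            found += 1
            if found == need:
                return quorum
    return None


all_up = {c: True for c in CHILDREN}
quorum = cbh_write_quorum("C0", all_up)
print(sorted(quorum), len(quorum))
```

In this sketch a write fails as soon as the root replica is inaccessible, because every write quorum must contain C0.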
Correctness of CBH Algorithm
A replica control protocol is said to achieve one-copy equivalence if any
read quorum has a non-empty intersection with any write quorum. Here, we
demonstrate that the CBH protocol guarantees a non-empty intersection.
Theorem: The CBH protocol guarantees the intersection of read and write
quorums.
Local replication
Proof: In any cluster, there is only one physical replica or primary copy
that maintains the consistency of the object. Thus, the quorum intersection
property within a cluster is guaranteed.
Global replication
In (Agrawal & El Abbadi, 1990), the Tree Quorum protocol was proven to
satisfy the intersection property. Since the Tree Quorum protocol was used in
the global replication, the proof is as follows:
Proof: The proof is by induction on the height of the trees.
Basis: The theorem holds for a tree of height zero, since there is only one
physical replica in the tree.
Induction Hypothesis: Assume that the theorem holds for a tree of height h.
Induction Step: Consider a tree of height h + 1. The read quorum (RQ) and
write quorum (WQ) for the CBH protocol are as follows:
RQ = {Root Replica} or {Majority of physical replicas of sub trees of height h}.
WQ = {Root Replica} and {Majority of physical replicas of sub trees of height h}.
If the read quorum contains the root replica, the intersection property is
guaranteed since every write quorum must access the root replica.
On the other hand, if the root replica is inaccessible, the read quorum must
access a majority of physical replicas for sub trees of height h. Since a write
quorum also accesses a majority of these sub trees, the read quorum is
guaranteed to have at least one sub tree in common with any write quorum.
Since the sub trees are of height h, the induction hypothesis guarantees that
read and write quorums will have a non-empty intersection.
Thus, by induction, the CBH protocol guarantees a non-empty intersection
between the read and write quorums.
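The intersection property of the global (TQ) level can also be checked exhaustively for a small tree. The Python sketch below enumerates every minimal read and write quorum of a complete ternary tree of height 2 (13 logical replicas, numbered from 0 level by level) and tests each pair. The tree size and the function names are assumptions made for illustration; the check complements the proof and does not replace it.

```python
from itertools import combinations, product


def complete_tree(degree, height):
    """Children of node i are degree*i + 1 ... degree*i + degree."""
    n = sum(degree ** i for i in range(height + 1))
    return {i: [c for c in range(degree * i + 1, degree * i + degree + 1) if c < n]
            for i in range(n)}


def all_read_quorums(node, children):
    """The node alone, or read quorums of a majority of its children."""
    quorums = [frozenset([node])]
    kids = children[node]
    if kids:
        need = len(kids) // 2 + 1
        for subset in combinations(kids, need):
            options = [all_read_quorums(k, children) for k in subset]
            for parts in product(*options):
                quorums.append(frozenset().union(*parts))
    return quorums


def all_write_quorums(node, children):
    """The node together with write quorums of a majority of its children."""
    kids = children[node]
    if not kids:
        return [frozenset([node])]
    need = len(kids) // 2 + 1
    quorums = []
    for subset in combinations(kids, need):
        options = [all_write_quorums(k, children) for k in subset]
        for parts in product(*options):
            quorums.append(frozenset([node]).union(*parts))
    return quorums


tree = complete_tree(degree=3, height=2)
reads = all_read_quorums(0, tree)
writes = all_write_quorums(0, tree)
ok = all(r & w for r in reads for w in writes)
print(len(reads), len(writes), ok)
```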
PERFORMANCE ANALYSIS AND COMPARISON
A simulation model was developed using Java to validate the CBH protocol.
In this section, we evaluate and compare the performance of the DR2M, DH
and CBH protocols in terms of communication cost and data availability.
Communication Cost Analysis
The communication cost of an operation is directly proportional to the size
of read and write quorum required to execute the operation. Thus, the bigger
the number of replicas involved in the read or write operation, the higher
the communication cost. Therefore, for the cost analysis, we represent the
communication cost in terms of the number of replicas involved in the read or
write operation.
In DR2M (Latip et al., 2008; Latip et al., 2009), voting approach is used to
assign a certain number of votes to every copy of the replicated data objects.
The selected node in the diagonal sites is assigned vote one or zero. The
communication cost for read and write operation is directly proportional to the
size of the quorum. The DR2M communication cost for the read operation
CDR2M,R is:
$C_{DR2M,R} = \lfloor r/2 \rfloor$  (1)
whereas, the communication cost for the write operation CDR2M,W is:
$C_{DR2M,W} = \lfloor (r+1)/2 \rfloor$  (2)
where r is the number of replicas in the whole network for executing read or
write operations.
Dynamic Hybrid Protocol
The read operation of the DH protocol (Choi & Youn, 2012) needs to access
only the root replica if the root replica is available. The minimum read cost
CDH,R is:
$C_{DH,R} = 1$  (3)
and the write cost CDH,W that depends on the value of h and g is:
$C_{DH,W} = h + 1 + g$  (4)
Clustering-Based Hybrid (CBH) Protocol
In the CBH protocol, the communication cost is estimated based on the TQ
protocol as given in Chung (1994), where h denotes the height of the tree, D is
the degree of the logical replicas in the tree, and M is the majority of D where:
$M = \lceil (D+1)/2 \rceil$  (5)
Therefore, for a tree of height h, the maximum quorum size is $M^h$ and the
communication cost for the read operation CCBH,R is in the range of
$1 \le C_{CBH,R} \le M^h$. Meanwhile, the communication cost for the write
operation CCBH,W is:
$C_{CBH,W} = \sum_{i=0}^{h} M^i$  (6)
where i = 0,…,h.
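The CBH cost expressions can be evaluated with a short Python sketch, assuming a complete tree in which every logical replica has degree D. For D = 3 and h = 2 it gives a majority of 2, a read cost between 1 and 4, and a write cost of 7, which agrees with the write cost stated for Figure 4. The function names are hypothetical.

```python
def majority(degree):
    """M = ceil((D + 1) / 2), written with integer arithmetic."""
    return degree // 2 + 1


def cbh_read_cost_range(degree, height):
    """Read cost lies between 1 (root accessible) and M^h."""
    return 1, majority(degree) ** height


def cbh_write_cost(degree, height):
    """Write cost is the sum of M^i for i = 0, ..., h."""
    m = majority(degree)
    return sum(m ** i for i in range(height + 1))


print(majority(3), cbh_read_cost_range(3, 2), cbh_write_cost(3, 2))
```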
Comparison of Communication Costs
For the read communication costs, both the CBH and the DH protocols have
the same minimum read cost of 1. This is due to the fact that CBH and DH
need to consult only the root replica if the root replica is accessible. On the
other hand, for an example of 81 nodes, the DR2M protocol requires a higher
read communication cost, which is 2.
Table 1 shows the write communication costs of the CBH, DH and DR2M
protocols for example systems with a total number of nodes n = 81, 121, 225,
and 289. For a fair comparison, we assume that the number of descendants is 3
for both the DH and the CBH protocols. As Table 1 shows, DR2M has the
lowest write communication cost for every system size, because its write
quorum is smaller than those of the other two protocols. Averaged over the
four system sizes, the write cost is 8.0 for CBH and 9.5 for DH. Thus, CBH
reduces the average write cost by about 15.8% compared to DH, although for
n = 81 its write cost (7) is slightly higher than that of DH (6).
Considering the read communication cost, in the best case, CBH and DH
are better than DR2M. As for the write communication cost, CBH provides
a lower average write cost than DH, since for the larger systems the number of
replicas required for the write operation is smaller.
Table 1
Comparison for the Write Communication Costs of the Protocols

             Number of nodes in the system
Protocols    N = 81    N = 121    N = 225    N = 289
DR2M         3         3          3          3
DH           6         7          11         14
CBH          7         7          9          9
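The averages and the relative reduction quoted for Table 1 can be checked with a few lines of arithmetic. The sketch below simply copies the table values; the helper names `average` and `relative_reduction` are illustrative and not part of the original simulation.

```python
# Write communication costs copied from Table 1 (N = 81, 121, 225, 289).
write_costs = {
    "DR2M": [3, 3, 3, 3],
    "DH": [6, 7, 11, 14],
    "CBH": [7, 7, 9, 9],
}


def average(values):
    """Arithmetic mean of a list of costs."""
    return sum(values) / len(values)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


averages = {name: average(costs) for name, costs in write_costs.items()}
reduction_vs_dh = relative_reduction(averages["DH"], averages["CBH"])

if __name__ == "__main__":
    for name, value in averages.items():
        print(f"{name}: average write cost = {value:.1f}")
    print(f"CBH vs DH: {reduction_vs_dh:.1f}% lower on average")
```

The reduction is a statement about the average over the four system sizes; it does not hold for each size individually.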
Data Availability Analysis
In this section, we analyze the read and write availability of the protocols. The
availability of the protocol is defined as the probability of successfully forming
a read and write quorum in that protocol. The read and write availability is
determined by the probabilistic failure model where every replica is available
independently with a probability p. Every replica is assumed to have the same
availability p in estimating the availability of an operation.
The DR2M (Latip et al., 2008; Latip et al., 2009) read availability ADR2M,R is
represented as:
$$A_{DR2M,R} = \sum_{i=q_r}^{n} \binom{n}{i} \left( p^{i} (1-p)^{n-i} \right) \tag{7}$$

and the write availability $A_{DR2M,W}$ is represented as:

$$A_{DR2M,W} = \sum_{i=q_w}^{n} \binom{n}{i} \left( p^{i} (1-p)^{n-i} \right) \tag{8}$$

In Equation 7 and Equation 8, n is the number of the column or row of the
grid. For example, in Figure 2, the value of n is 5. p is the probability that
a copy is available with a value between 0 and 0.9. The $q_r$ and $q_w$ are the
number of quorums for the read and write operations, respectively.
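Equations 7 and 8 are both binomial tail sums, so they can be evaluated with one small helper. The following is a minimal sketch rather than the authors' simulation code: the function name is hypothetical, n = 5 follows the example in the text, and the quorum values 2 and 3 are assumptions chosen only for illustration.

```python
from math import comb


def quorum_availability(n: int, q: int, p: float) -> float:
    """Probability that at least q of n independent replicas are available.

    This is the binomial tail sum used in Equations 7 and 8, with every
    replica available independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(q, n + 1))


if __name__ == "__main__":
    n = 5                  # grid dimension from the example in the text
    q_read, q_write = 2, 3  # assumed quorum values, for illustration only
    for p in (0.1, 0.5, 0.9):
        read = quorum_availability(n, q_read, p)
        write = quorum_availability(n, q_write, p)
        print(f"p={p:.1f}  read={read:.4f}  write={write:.4f}")
```

Because the write sum starts at a larger index than the read sum under these assumed values, the write availability can never exceed the read availability for the same p.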
Dynamic Hybrid (DH) protocol

Meanwhile, in the DH protocol (Choi & Youn, 2012), the overall availability
is obtained using the availability of the tree and grid structure. The availability
of the read operation of the grid structure is:

$$\wp_{read}^{G(g)} = p^{s} + (1 - p^{s}) \cdot \wp_{read}^{G(g-1)} \tag{9}$$

with

$$\wp_{read}^{G(0)} = p^{s}$$

Thus, the overall read availability of the DH protocol for a tree of height h is:

$$\wp_{read}^{(l)} = p + (1 - p) \cdot \left( \wp_{read}^{(l-1)} \right)^{s} \tag{10}$$

with

$$\wp_{read}^{(0)} = \wp_{read}^{G(g)} \quad \text{where } l = h - 1$$
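The two read recurrences of the DH protocol (Equations 9 and 10) can be evaluated by simple iteration. This is a sketch under stated assumptions, not the implementation of Choi and Youn: it assumes the grid recurrence starts from p^s and is applied g times, and that the tree recurrence is then applied a caller-supplied number of times (the text ties this count to l = h - 1). The function names are hypothetical.

```python
def dh_grid_read_availability(p: float, s: int, g: int) -> float:
    """Iterate Equation 9 g times, starting from the assumed base case p**s."""
    avail = p**s
    for _ in range(g):
        avail = p**s + (1 - p**s) * avail
    return avail


def dh_read_availability(p: float, s: int, g: int, tree_levels: int) -> float:
    """Iterate Equation 10 on top of the grid result.

    tree_levels is the number of times the tree recurrence is applied;
    mapping it to l = h - 1 is an assumption of this sketch.
    """
    avail = dh_grid_read_availability(p, s, g)
    for _ in range(tree_levels):
        avail = p + (1 - p) * avail**s
    return avail


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, round(dh_read_availability(p, s=3, g=2, tree_levels=2), 4))
```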
On the other hand, the availability of the write operation of the grid structure
is:

$$\wp_{write}^{G(l)} = p \left( \sum_{k=1}^{s} \binom{s}{k} p^{k} (1-p)^{s-k} \right) \cdot \wp_{write}^{G(l-1)} \tag{11}$$

where $l = g - 1$, with

$$\wp_{write}^{G(0)} = \sum_{k=1}^{s} \binom{s}{k} p^{k} (1-p)^{s-k}$$

Thus, the overall write availability of the DH protocol for a tree of height h is:

$$\wp_{write}^{(l)} = p \left( \sum_{k=1}^{s} \binom{s}{k} \left( \wp_{write}^{(l-1)} \right)^{k} \left( 1 - \wp_{write}^{(l-1)} \right)^{s-k} \right) \tag{12}$$

with

$$\wp_{write}^{(0)} = p \left( \sum_{k=1}^{s} \binom{s}{k} p^{k} (1-p)^{s-k} \right) \cdot \wp_{write}^{G(l)}$$

where $l = h - 2$.
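As a small worked step that is not part of the original derivation, the inner sums in Equations 11 and 12, read with lower limit k = 1, collapse by the binomial theorem, because the k = 0 term is the only one missing from the full expansion:

```latex
\sum_{k=1}^{s} \binom{s}{k} x^{k} (1-x)^{s-k}
  = \underbrace{\sum_{k=0}^{s} \binom{s}{k} x^{k} (1-x)^{s-k}}_{=\,(x + (1-x))^{s} \,=\, 1} \;-\; (1-x)^{s}
  = 1 - (1-x)^{s}
```

With x = p this is the probability that at least one of s replicas is available. Under this reading, Equation 12 can be written as $\wp_{write}^{(l)} = p \left( 1 - (1 - \wp_{write}^{(l-1)})^{s} \right)$.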
n 1
n
 (CBH)
APC, RHybrid
Clustering-Based =
i
 pi (1protocol
- p)n-i  (13)
i 1  
The overall availability of the CBH protocol is obtained using the combination
= 1 - (1 - p)n (12)
of the availability
s k
  𝑘𝑘 for the PC protocol and the TQ protocol. The availability
(𝟎𝟎)
℘for s

𝑝𝑝 (k 
the= read   𝒑𝒑 of
operation (1the− 𝒑𝒑) 𝑠𝑠−𝑘𝑘 𝑮𝑮(𝒍𝒍)
). ℘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘
PC protocol
(12)
is as given in Equation 13, where p is

(𝟎𝟎)
℘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘 = 𝑝𝑝 (  k1 𝒑𝒑𝑘𝑘s(1 − 𝒑𝒑)𝑠𝑠−𝑘𝑘 ).D ℘𝑮𝑮(𝒍𝒍) D 
the probability of data file accessing between 0.1 and 0.9 and i is the increment

h -2 k 1  s    (14)
of n. k i (12)
(12)
s
  s
k   𝑖𝑖
  s    s 
D−i
𝐴𝐴𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅ℎ+1 = 𝑨𝑨 ℘𝑷𝑷𝑷𝑷,𝑹𝑹 +
(𝟎𝟎)
= 𝑝𝑝(𝟏𝟏( −℘𝑨𝑨 (𝟎𝟎)
= 𝑝𝑝)
𝑷𝑷𝑷𝑷,𝑹𝑹
i  M
𝒑𝒑(𝑘𝑘 (1 − 𝒑𝒑) 𝑘𝑘𝐴𝐴
 𝒑𝒑𝑠𝑠−𝑘𝑘 ).− 𝒑𝒑)
𝑮𝑮(𝒍𝒍)(1 −
℘𝑠𝑠−𝑘𝑘
(1𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅ℎ ). ℘
𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅ℎ )
𝑮𝑮(𝒍𝒍)
𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘 𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘 𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘 𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘𝒘
k 1 n 
n 1 k 1
(13)
with
ere l = h -2 where l = hA-2 R1 
PC,n  
=n    pi (1 - p)n-i (13)
APC, R = 
 i 1 pi i𝐴𝐴(1 - p)n-i = 𝑨𝑨 (13)
i 1 i  𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅0 n𝑷𝑷𝑷𝑷,𝑹𝑹
1
n
 n  =i   i  n-ip (1 - p)
i n-i
n 1 APC, R (13)
n
n n1
= 1R -=
APC, (1 - p)   p i 1 - p)
(1  (13)
= 1 - (1 - p) i1ni i
APC, W =    p (1 - p)n-i n
Thus, the overall availability for read operation of CBH protocol for a tree of (15)
i 1 D i 
D  = 1 - (1 - p)
height h + 1 is:D D   n n (14)
= 1=-1(1- i-(1p) - p)
𝐴𝐴𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅ℎ+1 = 𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹 + (𝟏𝟏 − 𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹  ) 
 i 
i  M  i  𝐴𝐴
M   𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅
D
ℎ 
𝑖𝑖  D 
(1 − 𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅ℎ )
 i 
D−i (14)
(14)
= 𝑨𝑨 +
(𝟏𝟏 − 𝑨𝑨 )
𝑷𝑷𝑷𝑷,𝑹𝑹 ℎ+1 =D𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹 + (𝟏𝟏
𝑖𝑖
(1 − 𝐴𝐴
 )D−i𝑖𝑖 (1 − 𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅 )D−i (14)
 𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅
 D𝑖𝑖 
,𝑅𝑅ℎ+1 𝐴𝐴𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅 −D 𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹
𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅 ) i  M 𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅
𝑷𝑷𝑷𝑷,𝑹𝑹
 D ℎ ℎ
𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑊𝑊ℎ+1 = APC, w    𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑊𝑊 
ℎ ℎ
 ℎ (1 − 𝐴𝐴𝐶𝐶𝐶𝐶𝐶𝐶,𝑊𝑊ℎ )D−i. (14)
(16)
𝐴𝐴 withwith
= 𝑨𝑨 + (𝟏𝟏 i  M  i
𝐴𝐴 − 𝑨𝑨 = 𝑨𝑨)  i  M  i  𝐴𝐴 𝑖𝑖
(1 − 𝐴𝐴 ) D−i
𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅ℎ+1 𝑷𝑷𝑷𝑷,𝑹𝑹 𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅𝑷𝑷𝑷𝑷,𝑹𝑹
0 𝑷𝑷𝑷𝑷,𝑹𝑹 𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅ℎ 𝐶𝐶𝐶𝐶𝐶𝐶,𝑅𝑅ℎ
with 𝐴𝐴𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅0 = 𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹 𝐴𝐴𝐶𝐶𝐶𝐶𝐻𝐻,𝑅𝑅0 = 𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹
h n 1 𝐴𝐴
 n 𝐶𝐶𝐶𝐶𝐻𝐻,𝑊𝑊
 = 𝑨𝑨n𝑷𝑷𝑷𝑷,𝑾𝑾
n-i  n  i

0 1
APC,nW1 =n   𝐴𝐴pAi PC, (1W- =p)=   p (1 - p)n-i
𝑨𝑨𝑷𝑷𝑷𝑷,𝑹𝑹 (15) (15)
The availability for the write operation of PC protocol is:

$$A_{PC,W} = \sum_{i=1}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} = 1 - (1-p)^{n} \tag{15}$$

Therefore, the overall availability of the write operation for the CBH protocol
for a tree of height h + 1 is:

$$A_{CBH,W_{h+1}} = A_{PC,W} \sum_{i=M}^{D} \binom{D}{i} A_{CBH,W_h}^{\,i} \left( 1 - A_{CBH,W_h} \right)^{D-i} \tag{16}$$

with

$$A_{CBH,W_0} = A_{PC,W}$$
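The CBH recurrences in Equations 13 to 16 can be iterated level by level. The sketch below is an illustration under assumptions, not the Java simulation used in the paper: it takes M = ⌈(D + 1)/2⌉ as in Equation 5, treats D as the number of child subtrees of a node and n as the number of replicas covered by the PC term, and uses hypothetical function names.

```python
from math import ceil, comb


def pc_availability(p: float, n: int) -> float:
    """Closed form of Equations 13 and 15: at least one of n replicas is up."""
    return 1 - (1 - p) ** n


def majority_tail(a: float, d: int) -> float:
    """Probability that at least M = ceil((d + 1) / 2) of d subtrees succeed,
    each independently with probability a (the sum in Equations 14 and 16)."""
    m = ceil((d + 1) / 2)
    return sum(comb(d, i) * a**i * (1 - a) ** (d - i) for i in range(m, d + 1))


def cbh_read_availability(p: float, n: int, d: int, height: int) -> float:
    """Iterate Equation 14 `height` times from the base case A_PC,R."""
    a_pc = pc_availability(p, n)
    a = a_pc
    for _ in range(height):
        a = a_pc + (1 - a_pc) * majority_tail(a, d)
    return a


def cbh_write_availability(p: float, n: int, d: int, height: int) -> float:
    """Iterate Equation 16 `height` times from the base case A_PC,W."""
    a_pc = pc_availability(p, n)
    a = a_pc
    for _ in range(height):
        a = a_pc * majority_tail(a, d)
    return a


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        r = cbh_read_availability(p, n=1, d=3, height=2)
        w = cbh_write_availability(p, n=1, d=3, height=2)
        print(f"p={p:.1f}  read={r:.4f}  write={w:.4f}")
```

The read recurrence adds the majority term to the PC term, while the write recurrence multiplies by it, which is why the write availability falls below the read availability as the tree grows.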
Comparison of Data Availabilities

Figure 5 shows the comparison of the read availability of the CBH, DH and
DR2M protocols for 121 nodes. Here, we assume that the DH and the CBH
protocols have descendants s = 3. The result in Figure 5 indicates that CBH
has the highest read availability compared to the DR2M and the DH protocols.
This is due to the fact that CBH needs only to access a small number of replicas
for the read operations. The result shows that for read availability, the CBH
protocol has an average of 10.9% higher compared to the DR2M protocol
and 16.8% higher compared to the DH protocol for all probabilities of data
accessing.

Figure 5. Comparison of read availability for 121 nodes (read availability
plotted against probability).
The comparison of the write availability of the CBH, DH and DR2M
protocols for 121 nodes is depicted in Figure 6. It shows that DR2M has a
higher write availability than the DH and the CBH protocols. This is because
in DR2M the write quorum size is small and the read and write operations
execute on the same quorum. On the other hand, CBH has the lowest write
availability of the three protocols.

Figure 6. Comparison of write availability for 121 nodes (write availability
plotted against probability).

CONCLUSION

A new replica control protocol named Clustering-Based Hybrid (CBH)
protocol has been proposed in this paper for the management of replicated
data in a large scale distributed system such as data grid. In the proposed
protocol, the N sites in the network are logically grouped into several
nonintersecting groups called clusters and organized in a tree structure. The
CBH protocol employs a hybrid replication strategy by combining the
advantages of the Primary Copy (PC) protocol and the Tree Quorum (TQ)
protocol to improve the performance and availability of the protocol. In CBH,
grouping the nodes into clusters and having only one replica in each cluster
has resulted in a small number of replicas involved in performing the read and
write operations. In comparison with the DR2M protocol and the DH protocol,
the CBH protocol provides lower read communication cost while providing
higher read availability which is suitable for large scale systems in grid
environments.

ACKNOWLEDGMENT

We wish to acknowledge the support of the Malaysian Ministry of Higher
Education (MOHE) under the Fundamental Research Grant Scheme (FRGS)
No: 08-01-13-1220FR.

REFERENCES
Abawajy, J. H., & Mat Deris, M. (2014). Data replication approach with
consistency guarantee for data grid. IEEE Transaction on Computers,
63(12), 2975-2987.
Abdullah, A., Othman, M., Sulaiman, M. N., Ibrahim, H., & Othman, A. T.
(2004). A simulation study of data discovery mechanism for scientific
data grid environment. Journal of Information and Communication
Technology (JICT), 3 (1). 19-32.
Agrawal, D., & El Abbadi, A. (1990). The Tree Quorum protocol: An efficient
approach for managing replicated data. Proceedings of the 16th
International Conference on Very Large Databases, 243-254.
Ahamad, M., Ammar, M.H., & Cheung, S.Y. (1992). Replicated data
management in distributed systems. Readings in Distributed Computing
Systems, 572-591. doi:10.1.1.45.5283
Bernstein, P. A., & Goodman, N. (1984). An algorithm for concurrency control
and recovery in replicated distributed database. ACM Transaction
Database Systems, 9(4), 596-615. doi:10.1145/1994.2207
Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., & Tuecke, S. (2000).
The data grid: Towards an architecture for the distributed management
and analysis of large scientific datasets. Journal of Network and
Computer Applications, 23(3), 187-200. doi: 10.1006/jnca.2000.0110
Choi, S. C., & Youn, H. Y. (2012). Dynamic hybrid replication effectively
combining tree and grid topology. The Journal of Supercomputing,
59(3), 1289-1311. doi: 10.1007/s11227-010-0536-6
Chung, S. M. (1994). Enhanced tree quorum algorithm for replica control
in distributed database systems. Data and Knowledge Engineering,
Elsevier, 12(1), 63-81. doi:10.1016/0169-023X(94)90022-1
Foster, I., Kesselman, C., & Tuecke, S. (2001). The anatomy of the grid:
Enabling scalable virtual organizations. International Journal
of High Performance Computing Applications, 15(3), 200-222.
doi:10.1177/109434200101500302
Garcia-Molina, H., & Barbara, D. (1985). How to assign votes in a distributed
system. Journal of the ACM (JACM), 32(4), 841-860.
Gifford, D. K. (1979). Weighted voting for replicated data. Proceedings of the
7th Symposium on Operating System Principles, 150-162.
Koch, H. (1993). An efficient replication protocol exploiting logical tree
structures. The 23rd Annual International Symposium on Fault-Tolerant
Computing, 382-391.
Krauter, K., Buyya, R., & Maheswaran, M. (2002). A taxonomy and survey
of grid resource management systems for distributed computing.
International Journal of Software Practice and Experience, 32(2),
135-164. doi:10.1002/spe.432
Lamehamedi, H., Shentu, Z., & Syzmanski, B. (2003). Simulation of dynamic
data replication strategies in data grids. Proceedings of the 17th
International Symposium on Parallel and Distributed Processing, 1-10.
Lamehamedi, H., Syzmanski, B., Shentu, Z., & Deelman, E. (2002). Data
replication in grid environment. Proceedings of the Fifth International
Conference on Algorithms and Architectures for Parallel Processing
(ICA3PP’02), 378-383.
Latip, R., Ibrahim, H., Othman, M., Sulaiman, M. N., & Abdullah, A. (2008).
High availability with diagonal replication in 2D Mesh (DR2M)
protocol for grid environment. Journal of Computer and Information
Science, 1(2), 95-1005. doi: 10.5539/cis.v1n2p95
Latip, R., Ibrahim, H., Othman, M., Abdullah, A., & Sulaiman, M. N. (2009).
Quorum-based data replication in grid environment. International
Journal of Computational Intelligence Systems (IJCIS), 2(4), 386-397.
doi:10.2991/ijcis.2009.2.4.7
Latip, R., Mabni, Z., Ibrahim, H., Abdullah, A., & Hussin, M. (2014). A
clustering-based hybrid replica control protocol for high availability in
grid environment. Journal of Computer Science, 10(12), 2442-2449.
Mabni, Z., & Latip, R. (2011). A comparative study on quorum-based replica
control protocols for grid environment. In A. Abd Manaf et al. (Ed.),
Informatics Engineering and Information Science (Vol. 253, pp. 364-
377): Springer Berlin Heidelberg.
Mabni, Z., Latip, R., Ibrahim, H., & Abdullah, A. (2014). Cluster-based
replica control protocol for improving data availability in data grid.
Proceedings of the Malaysian National Conference on Databases 2014
(MANCoD’14), 75-80.
Madhuram, S., & Kumar, A. (1994). A hybrid approach for mutual exclusion
in distributed computing systems. Sixth IEEE Symposium on Parallel
and Distributed Processing, 18-25.
Mat Deris, M., Abawajy, J. H., & Suzuri H. M. (2004). An efficient replicated
data access approach for large-scale distributed systems. IEEE
International Symposium on Cluster Computing and the Grid, 588-594.
Stonebraker, M. (1979). Concurrency control and consistency of multiple
copies of data in distributed ingres. IEEE Transaction on Software
Engineering, 5(3), 188-194. doi:10.1109/TSE.1979.234180
Thomas, R. H. (1979). A majority consensus approach to concurrency control
for multiple copy databases. ACM Transaction Database Systems, 4(2),
180-209.
Yusof, Y., Madi, M., & Hassan, S. (2012). Dynamic replication strategy
based on exponential model and dependency relationships in data grid.
Journal of Information and Communication Technology (JICT), 11,
193-206.
Zhou, W., & Holmes, R. (1999). The design and simulation of a hybrid
replication control protocol. Fourth International Symposium on
Parallel Architectures, Algorithms, and Networks (I-SPAN ‘99), 210-
215. doi: 10.1109/ISPAN.1999.778941