
A formal specification of the Kademlia distributed hash table

Isabel Pita
Dept. Sistemas Informáticos y Computación

Universidad Complutense de Madrid

ipandreu@sip.ucm.es

Abstract

Kademlia is a peer-to-peer distributed hash table (DHT) currently used in the P2P eDonkey file sharing network. The most popular clients used to connect to the Kad network are eMule, aMule and BitTorrent. Like other DHTs, the Kademlia look-up algorithm takes log n steps, which can be reduced to log_{2^b} n by increasing the size of each node's routing table. It also offers a number of desirable features not offered by any previous DHT, which makes it the only DHT used in real networks. These features result from the notion of distance between objects introduced by Kademlia. Both nodes and shared files are represented by n-bit keys, and their relation depends on the distance between their keys. In this sense, nodes keep information about files that are close to them in the key space, and the search algorithm is based on looking for the closest node (or almost closest node, if the information is replicated) to the file key.

This paper explains the specification of the behaviour of a P2P network that uses the Kademlia DHT in the formal specification language Maude. We use the initial description of the Kademlia DHT and fill some open issues with the real eMule implementation. We allow peers to connect to the network and leave it by simulating time using the Real-Time Maude facilities.

1. Introduction

Peer-to-peer (P2P) systems have seen a great growth in the last few years, mainly due to file sharing applications. There are two basic approaches for searching contents in P2P networks. The unstructured approach is based on flooding the network and was used in the first implementations of P2P networks, like Gnutella. The structured approach uses a distributed hash table (DHT) and is the one currently in use in most P2P networks. A large number of DHTs have been studied through theoretical simulations and analysis over the last years, such as Chord [18], CAN [14] or Pastry [15]. But, despite the large effort devoted to the topic, only Kademlia [8] is being used in real P2P networks, through the eMule [6] and aMule [1] clients, which give access to millions of users. BitTorrent has also introduced a Kademlia DHT in its P2P network [5], although it is not compatible with the eMule or aMule one.

The large number of users involved in current P2P networks and the lack of a central authority that certifies the trust of the participating nodes imply that the system must be able to operate even though some participants are malicious. DHT security, in particular the problem of ensuring efficient and correct peer discovery despite adversarial interference, has been addressed in a number of works [17, 20, 11]. However, the majority of these studies examine the types of problems, drawing examples from existing systems, or experimentally evaluate the attacks over the networks. Despite the great success formal methods have had in the analysis of distributed networks and protocols, their contribution to P2P networks is scarce. In [9], Mühl gives formal semantics of publish/subscribe systems based
on sequential traces using the syntax of linear temporal logic. The work formalizes and studies the correctness of several routing configurations: flooding, simple routing, identity-based routing, and so on. However, it does not include DHT-based routing algorithms. Borgström et al. [3] prove the correctness of the lookup operation of the DHT-based DKS system, developed in the context of the EU project [7], for a static model of the network using value-passing CCS. Finally, Bakhshi and Gurov [2] give a formal verification of Chord's stabilization algorithm using the π-calculus. But, as stated in [11], the question is whether the P2P approach is mature enough to step outside of its comfort zone of file sharing and related applications. In particular, not much is known about the ability of DHTs to meet critical security requirements (such as those required nowadays, e.g., for domain name servers) and to withstand attacks.

Our goal is to study the possibilities offered by formal methods to prove the correctness of the dynamic aspects of P2P networks and to find possible attacks on them. We start with the Kademlia network, as it is the one already implemented and in use, and focus our work on the routing algorithms. We use the initial description of the Kademlia DHT [8] and fill some open issues with the real eMule implementation. See [10] for a thorough analysis of the source code of eMule version 0.47a and [6] for the source code (v0.50a). We use the Maude formal specification language, based on rewriting logic [4, 12], as it has been successfully applied to similar problems, like network communication protocol analysis [19], and it offers simple and elegant time simulation resources.

The paper is organized as follows: Section 2 gives a short overview of the Kademlia DHT, focused on the aspects we have considered so far. Next, we explain the formalization of the different parts of the network and the interaction among them. Then, we introduce the notion of time and show the formalization of the processes of looking for a file and publishing files. Finally, some open issues are outlined.

2. The Kademlia DHT

Nodes in a P2P network perform two basic tasks: they put their files at the disposal of other users and access the files shared by the others. The networks that use a DHT have similar approaches for solving these problems; they identify both nodes and files with n-bit quantities, and keep the information about shared files in the nodes with an ID close to the file ID. Then, the look-up algorithm is based on locating successively closer nodes to any desired key. The DHTs differ in the notion of closeness they apply. In particular, Kademlia defines the distance between two IDs as the bitwise exclusive or (XOR) of the n-bit quantities.

Each node stores contact information about others. In Kademlia, every node keeps a list of triples (IP address, UDP port, node ID) for nodes at a distance between 2^i and 2^(i+1) from itself, for 0 <= i < n, where n is the ID length. In the Kademlia paper [8] these lists, called k-buckets, have at most k elements, where k is chosen such that any given k nodes are very unlikely to fail within an hour of each other. k-buckets are kept sorted by time last seen. When a node receives any message (request or reply) from another node, it updates the appropriate k-bucket for the sender's node ID. If the sender is already in the k-bucket, it is moved to the tail of the list. If it is not there and there is free space in the appropriate k-bucket, it is inserted at the tail of the list. Otherwise, if the k-bucket has no free space, the node at the head of the list is contacted, and if it fails to respond it is removed from the list and the new contact is added at the tail. If the node at the head of the list responds, it is moved to the tail, and the new node is discarded. This policy gives preference to old contacts, and it is motivated by the analysis of Gnutella data collected by Saroiu et al. [16], which states that the longer a node has been up, the more likely it is to remain up another hour.

k-buckets are organized in a binary tree called the routing table. Each k-bucket is identified by the common prefix of the IDs it contains. Internal tree nodes are the common prefix of the k-buckets, while the leaves are the
k-buckets. Thus, each k-bucket covers some range of the ID space, and together the k-buckets cover the entire ID space with no overlap. Figure 1 shows a routing table for node 00000000 with k-buckets of length 5. IDs have 8 bits.

Figure 1: A routing table example for node 00000000

The Kademlia protocol consists of four Remote Procedure Calls (RPCs):

• PING probes a node to see if it is online.

• STORE instructs a node to store a file ID together with the contact of the node that shares the file.

• FIND-NODE takes an ID as argument and the recipient returns the contacts of the k nodes it knows about closest to the target ID.

• FIND-VALUE takes an ID as argument. If the recipient has information about the argument, it returns the contact of the node that shares the file; otherwise, it returns a list of the k contacts it knows about closest to the target.

In the following we summarize the processes of looking for a value and publishing a shared file from the Kademlia paper [8].

Looking for a value. To find a file ID, a node starts by performing a look-up to find the k nodes with the closest IDs to the file ID. First, the node sends a FIND-VALUE RPC to the α nodes it knows with an ID closest to the file ID, where α is a system concurrency parameter. As nodes reply, the initiator sends new FIND-VALUE RPCs to nodes it has learned about from previous RPCs, maintaining α active RPCs. Nodes that fail to respond quickly are removed from consideration. If a round of FIND-VALUE RPCs fails to return a node any closer than the closest already seen, the initiator resends the FIND-VALUE to all of the k closest nodes it has not already queried. The process terminates when any node returns the value or when the initiator has queried and gotten responses from the k closest nodes it has seen.

Publishing a shared file. Publishing is performed automatically whenever a file needs it. To maintain persistence of the data, files are published by the node that shares them every 24 hours. Nodes that know about a file publish it every hour.

To publish a file, a peer locates the k closest nodes to the key, as is done in the process of looking for a value, although it uses the FIND-NODE RPC. Once it has located the nodes, the initiator sends a STORE RPC to the first ten of them.

3. Network representation

The Kademlia network is modeled as a Maude configuration of objects and messages. The objects represent the peers.

  class Peer | RT : RoutingTable ,
               Files : TFileTable ,
               Publish : TPublishFile ,
               SearchFiles : TSearchFile ,
               SearchList : TemporaryList ,
               Life : TimeInf ,
               Reconnect : TimeInf .

where the object identifier consists of the peer's IP address, its UDP port and its node ID. It is defined to be of sort Triple.

The attributes related to the Kademlia network are:
• RT keeps the information of the routing table.

• Files keeps the information about the files the peer is responsible for. It includes the file IDs and the identification of the peer that shares each file.

• Publish keeps the information about the files shared by the peer. It includes the file IDs and each file's location in the peer.

• SearchFiles keeps the files a peer is looking for. A peer may look for many files.

• SearchList is a temporary list used in the search process.

The attributes used for the Maude simulation are:

• Life is the time the peer will remain connected. The value is updated as time passes. When it reaches zero, the peer has left the network. It is set to a random value when the peer connects.

• Reconnect is the time until the peer connects again. It is set to a random value when a node leaves the network.

The messages represent the RPCs. There is a message for each RPC defined in the Kademlia protocol. The first two parameters of the message are the peer that sends the message and the peer that receives it. The last two parameters are used to control the course of time. The last but one controls the messages that are not attended because the receiver has left the network: when a message is sent it is assigned a time, and when this time passes the message is removed from the configuration. The last parameter is the time the RPC takes in the Real-Time Maude system. For the time being, each RPC is assigned one time unit. The PING RPC syntax is:

  msg PING :
      Triple Triple TimeInf TimeInf -> Msg .
  msg PING-REPLY :
      Triple Triple TimeInf TimeInf -> Msg .

The STORE message has an additional parameter that represents the file ID to be stored by the node and the identification of the node that shares the file.

  msg STORE :
      Triple Triple TFileTable TimeInf TimeInf
      -> Msg .
  msg STORE-REPLY :
      Triple Triple TimeInf TimeInf -> Msg .

The FIND-NODE message has an additional parameter that represents the key the sender is looking for. The reply has an additional parameter that keeps a list of the k nodes the peer knows about closest to the target, where k is the bucket dimension. The information is obtained from the routing table of the node that receives the RPC.

  msg FIND-NODE :
      Triple Triple BitString TimeInf TimeInf
      -> Msg .
  msg FIND-NODE-REPLY :
      Triple Triple List{Triple} TimeInf
      TimeInf -> Msg .

The FIND-VALUE message has an additional parameter that represents the file ID the sender is looking for. The message has two possible replies. If the receiver has information about the file in its Files table, it returns the contact of the node that shares the file. If the receiver has no information about the file, it returns the closest nodes to the file ID, like the FIND-NODE message.

  msg FIND-VALUE :
      Triple Triple BitString TimeInf TimeInf
      -> Msg .
  msg FIND-VALUE-REPLY1 :
      Triple Triple List{Triple} TimeInf
      TimeInf -> Msg .
  msg FIND-VALUE-REPLY2 :
      Triple Triple BitString TimeInf TimeInf
      -> Msg .

3.1. The routing table

Although the routing table is depicted in [8] as a binary tree, it can be represented as a list of k-buckets since for each internal tree node the subtree whose prefix does not match with
the peer ID is a leaf. For the same reason it is not worth representing it as a trie ADT. The k-bucket's position in the list is given by its prefix, so looking for a k-bucket is done sequentially following the prefix. The steps are the same as if we were looking for it in the tree. Although [8] proposes a routing table optimization that allows more contacts for IDs close to the peer ID, we have not considered it yet in the specification. Nevertheless, we expect it will not be necessary to build a complete binary tree. The eMule routing table [10] also has more k-buckets in each node than the routing table considered in [8], since the subtree whose prefix does not match the peer ID may be a semi-complete tree of height four. Again, the modification is local and bounded, so we expect to find a more efficient representation than a binary tree.

The empty routing table is represented by an empty bucket that covers all the ID space.

  subsort Bucket < RoutingTable .
  op _||_ :
      Bucket RoutingTable -> RoutingTable [ctor] .

The information about nodes stored in the routing table includes the IP address of the node, the UDP port and the node ID. In the following we call them contacts.

k-buckets are lists of contacts, which can be empty. The order in which identifiers are placed in the list is important, since the least recently seen identifiers are removed first; in this sense its behaviour is similar to a queue. k-buckets are defined as follows:

  subsort Triple < Bucket .
  op empty-bucket : -> Bucket [ctor] .
  op _|_ : Bucket Bucket -> Bucket
      [assoc id: empty-bucket ctor] .

The number of elements of a k-bucket is set by the constant:

  op bucketDim : -> Nat .

The routing table offers the following operations:

• op move-to-tail :
      Triple Triple RoutingTable
      -> RoutingTable .
  Moves a contact to the tail of its k-bucket.

• op add-entry :
      Triple Triple RoutingTable
      -> RoutingTable .
  Adds an entry to a routing table.

• op free-bucket :
      Triple BitString RoutingTable -> Bool .
  Checks if a k-bucket is full.

• op closest-nodes :
      BitString RoutingTable Nat
      -> List{Triple} .
  Returns the list of the n closest contacts to a given ID in the routing table.

3.2. Shared files

There are two different concepts concerning shared files. On the one hand, a node shares some files. Each node has a table with information about these files: the key is the file ID, while the value includes the file's name and the time to republish it. The Maude specification is:

  sort KeyPublishFile .
  sort InfoPublishFile .
  subsort BitString < KeyPublishFile .
  op __ : String TimeInf -> InfoPublishFile
      [ctor] .

Operations on this abstract data type (ADT) include the typical operations for tables plus an operation to handle the time. The generic definition of the table is:

  sort InfoTable{X,Y} .
  sort Table{X,Y} .
  subsort InfoTable{X,Y} < Table{X,Y} .
  op <__> : X$Elt Y$Elt -> InfoTable{X,Y}
      [ctor] .
  op empty-table : -> Table{X,Y} [ctor] .
  op __ : Table{X,Y} Table{X,Y}
      -> Table{X,Y}
      [assoc comm id: empty-table ctor] .
  op store : X$Elt Y$Elt Table{X,Y}
      -> Table{X,Y} .
  op _in_ : X$Elt Table{X,Y} -> Bool .
  op remove : X$Elt Table{X,Y}
      -> Table{X,Y} .
  op find : X$Elt Table{X,Y} -> Y$Elt .
  op _monus_ : Table{X,Y} Time
      -> Table{X,Y} .
  op key? : InfoTable{X,Y} -> X$Elt .
  op value? : InfoTable{X,Y} -> Y$Elt .

The concrete table used to represent the files a peer publishes is:

  TABLE{KeyPublishFile,InfoPublishFile} *
      (sort
       Table{KeyPublishFile,InfoPublishFile}
       to TPublishFile) .

On the other hand, each node keeps information about the files that have a key close to its own key. This information includes the file ID, the ID of the node that stores the file and a time value. The information about the file ID and the node ID is used in the search process. The number of nodes that keep information about a file is not specified in [8]; we use the value defined in the eMule analysis [10], which sets it to ten. The time information is used to republish the files to ensure data persistence. The table specification is:

  sort KeyFileTable .
  sort InfoFileTable .
  subsort BitString < KeyFileTable .
  op __ : Triple TimeInf -> InfoFileTable
      [ctor] .

The table is

  TABLE{KeyFileTable,InfoFileTable} *
      (sort
       Table{KeyFileTable,InfoFileTable}
       to TFileTable) .

To publish a file, a node has to find the k nodes with the closest key to the file ID. As the information in the node's routing table may not include the closest nodes, it should search for them. Here we follow the eMule implementation of the process. The node looks in the routing table for contacts that are as near as possible to the file key and keeps them, ordered by distance to the file key, in a temporary list. The information for each contact in the temporary list is its key and the time passed since the RPC was sent. In this version of the specification we admit only one search-publish process at a time. To admit more searches we need to define a map of temporary lists to keep the information about each search.

  sort Node-Time .
  sort TemporaryList .
  subsort Node-Time < TemporaryList .
  --- 1 param : Node ID (key, IP address and UDP port)
  --- 2 param : Distance to the search key
  --- 3 param : Time since the RPC was sent
  --- 4 param : Flag
  ---   0 indicates the RPC was not sent,
  ---   1 the RPC was sent,
  ---   2 the RPC has responded,
  ---   3 the store message is sent
  op <____> : Triple Nat TimeInf Nat
      -> Node-Time [ctor] .
  op empty-list : -> TemporaryList [ctor] .
  op insert : Node-Time TemporaryList
      -> TemporaryList [ctor] .

3.3. Searched files

One of the tasks a node performs in a P2P network is searching for information. Each node keeps a table of the files it is looking for. The key is the file ID and the value includes the file name and the time for expiration.

  sort KeySearchFile .
  sort InfoSearchFile .
  subsort BitString < KeySearchFile .
  --- Time for expiration:
  ---   0 has already been searched and found.
  ---   > 0 < 50 is ready to be searched.
  ---   > 50 the file is waiting.
  op _;_ : String TimeInf -> InfoSearchFile
      [ctor] .

The table is

  TABLE{KeySearchFile,InfoSearchFile} *
      (sort
       Table{KeySearchFile,InfoSearchFile}
       to TSearchFile) .

4. Modeling time

Simulating the behaviour of a P2P network requires a notion of time. In the current specification, time passes when some action occurs; in particular, since the only actions are the RPCs, we assume that each of them takes one time unit.

We use Maude's REAL-TIME-MAUDE module with discrete time units to model time. Rules
are divided into tick rules, which model the elapse of time in the system, and instantaneous rules, which model changes in (part of) the system and are assumed to take zero time. The tick rule has the form:

  crl [tick] : { C } => { delta(C,mte(C)) }
      in time mte(C)
      if mte(C) =/= INF and mte(C) =/= 0 .

where

• op mte : Configuration
      -> TimeInf [frozen (1)] .
  calculates the number of time units by which time advances, as the minimum of the time units of the messages and objects in the configuration, and

• op delta : Configuration TimeInf
      -> Configuration [frozen (1)] .
  defines the effect of time elapse on a configuration. For connected peers (Life > 0), it changes the time to republish a file (attributes Files and Publish), the time left to obtain a response in the temporary search list (attribute SearchList) and the time left to disconnect the peer (attribute Life). For disconnected peers, only the time to reconnect is changed.

  eq delta
      (< P1 : Peer | RT : R1 ,
           Files : FT1 , Publish : PF,
           SearchFiles : SF , SearchList : SL,
           Life : K1 , Reconnect : INF >,TM)
     =
      < P1 : Peer | RT : R1 ,
           Files : FT1 monus TM ,
           Publish : PF monus TM ,
           SearchFiles : SF monus TM,
           SearchList : SL monus TM,
           Life : K1 monus TM ,
           Reconnect : INF > .

With respect to messages, only the time to attend the message is changed. If this time reaches zero, the message is removed from the system.

  eq delta(
      PING(SENDER,RECEIVER,TM1,1),TM) =
      PING(SENDER,RECEIVER,TM1 monus TM, 0) .

5. Network processes

We present two processes: looking for a file and publishing a file.

5.1. Looking for a file

The searching process starts automatically when there are IDs in the SearchFiles attribute of some peer, which we will call the initiator. In this version we permit only one search per node at a time. The life time of the initiator, K1, should be greater than zero; otherwise, the node is supposed to be disconnected. The expiration time, TM1, should be greater than zero, since the zero value indicates that the search has finished or no peer has found the file. It should also be less than a bound n, set to 50 at this time, since a greater value indicates that the file has already been searched for but was not found and is now waiting to repeat the search.

  crl [lookfor-file1] :
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; TM1) > SF ,
          SearchList : empty-list , Life : K1 >
      =>
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : create-search-list(
              closest-nodes(I1,R1,10), I1) ,
          Life : K1 >
      if K1 > 0 /\ K1 =/= INF /\ TM1 > 0 /\
         TM1 < 50 .

The expiration time of the searched file is set to INF to indicate that the process has been initiated. The search list is filled with the closest nodes the initiator has in its routing table. The closest-nodes operation returns the n closest nodes to the key I1 in the routing table R1. We set the number of initial nodes to ten due to the size of our testing network. The list is created with the operation create-search-list, which inserts the nodes ordered by their distance to the key.

The process continues by sending FIND-VALUE RPCs to the first nodes of the list to find closer nodes to the file ID. We may have up to three active RPCs at the same time.
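The first steps of the search can also be illustrated outside Maude. The following Python sketch is only an illustration under stated assumptions: contacts are reduced to integer node IDs, the routing table is flattened to a plain list, and the default reply time of 100 is taken from the Node-Time value used in rule lookfor-file41. The function names mirror closest-nodes and create-search-list, but their bodies are hypothetical, since the paper does not show the corresponding equations; next_to_query is an invented helper that combines the roles of first-not-sent and messages-in-process.

```python
from dataclasses import dataclass
from typing import List, Optional

# Flag values copied from the comments on the Node-Time sort.
NOT_SENT, SENT, RESPONDED, STORED = 0, 1, 2, 3
MAX_ACTIVE_RPCS = 3  # "up to three active RPCs at the same time"


def distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


@dataclass
class Entry:
    node_id: int    # stands in for the full contact triple (assumption)
    dist: int       # distance to the search key
    time_left: int  # time allowed for a reply
    flag: int = NOT_SENT


def closest_nodes(key: int, contacts: List[int], n: int) -> List[int]:
    """Return the n known contacts closest to key (flat list, no buckets)."""
    return sorted(contacts, key=lambda c: distance(c, key))[:n]


def create_search_list(nodes: List[int], key: int,
                       time_left: int = 100) -> List[Entry]:
    """Build the temporary list, ordered by distance to key."""
    entries = [Entry(c, distance(c, key), time_left) for c in nodes]
    return sorted(entries, key=lambda e: e.dist)


def next_to_query(search_list: List[Entry]) -> Optional[Entry]:
    """Mark and return the first unsent entry, unless three RPCs are active."""
    in_process = sum(1 for e in search_list if e.flag == SENT)
    if in_process >= MAX_ACTIVE_RPCS:
        return None
    for e in search_list:
        if e.flag == NOT_SENT:
            e.flag = SENT
            return e
    return None


# Example with 8-bit IDs: four known contacts and one file key.
contacts = [0b00000001, 0b10000000, 0b00001111, 0b01010101]
key = 0b00001000
search = create_search_list(closest_nodes(key, contacts, 10), key)
```

In this example the search list is ordered 15, 1, 85, 128, because their XOR distances to the key are 7, 9, 93 and 136. Calling next_to_query three times marks the first three entries as sent, and a fourth call returns None until one of them replies or expires.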
  crl [lookfor-file21] :
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K1 >
      =>
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : set-flag-process(Tr,SL) ,
          Life : K1 >
      FIND-VALUE(SENDER,Tr, I1, K1, 1)
      if K1 > 0 /\ K1 =/= INF /\
         not all-done(SL) /\
         Tr := first-not-sent(SL) /\
         messages-in-process(SL) < 3 .

Once the RPC is sent, a flag is activated in the search list that marks this node as in process. The RPC is only sent if the initiator is active and if there are still nodes in the search list to which no RPC has been sent. Notice that we have to ask as many nodes as possible, because there can be nodes that are not as close to the target as others but that have information about the closest ones in their routing tables.

The receiver may find the value, or it may return the closest nodes it knows about. If the message is not attended, it is removed from the system.

  crl [find-value1] :
      < RECEIVER : Peer | RT : R2 ,
          Files : FT2 , Life : K2 >
      FIND-VALUE(SENDER, RECEIVER, P3, TM,0)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          Files : FT2 , Life : K2 >
      FIND-VALUE-REPLY1(RECEIVER,SENDER,
          closest-nodes(P3,R2,bucketDim), K2,1)
      if K2 > 0 /\ K2 =/= INF /\
         not (P3 in FT2) .

  crl [find-value2] :
      < RECEIVER : Peer | RT : R2 ,
          Files : FT2 , Life : K2 >
      FIND-VALUE(
          SENDER, RECEIVER, P3, TM,0)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          Files : FT2 , Life : K2 >
      FIND-VALUE-REPLY2(RECEIVER,
          SENDER,ID?(first?(find(P3,FT2))), K2,0)
      if K2 > 0 /\ K2 =/= INF /\ (P3 in FT2) .

  rl [find-value3] :
      FIND-VALUE(
          SENDER, RECEIVER, P3, 0, 0)
      =>
      none .

If the initiator receives the node that shares the file, the process ends.

  crl [lookfor-file3] :
      < RECEIVER : Peer | RT : R2 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K2 >
      FIND-VALUE-REPLY2(
          SENDER,RECEIVER,P3,TM1,TM2)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          SearchFiles : < I1 (S1 ; 0) > SF ,
          SearchList : empty-list , Life : K2 >
      FIND-VALUE-REPLY2(
          SENDER,RECEIVER,P3,TM1,TM2)
      if K2 > 0 /\ K2 =/= INF .

If it receives the list of the closest nodes, it changes its search list, adding the nodes ordered by the distance to the target. Only nodes closer than the one that proposes them are added. The initiator also updates its routing table, as is always done when an RPC is received. When the full list has been processed, a flag is activated to mark this node as done in the search list.

  crl [lookfor-file41] :
      < RECEIVER : Peer | RT : R2 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K2 >
      FIND-VALUE-REPLY1(
          SENDER,RECEIVER,Tr L,TM1,TM2)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : insertOrd(
              < Tr distance(ID?(Tr),I1) 100 0 >,SL) ,
          Life : K2 >
      FIND-VALUE-REPLY1(
          SENDER,RECEIVER,L,TM1,TM2)
      if K2 > 0 /\ K2 =/= INF /\
         distance(ID?(Tr),I1) <
         distance(ID?(SENDER),I1) /\
         SL =/= empty-list .

  crl [lookfor-file42] :
      < RECEIVER : Peer | RT : R2 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K2 >
      FIND-VALUE-REPLY1(
          SENDER,RECEIVER,Tr L,TM1,TM2)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K2 >
      FIND-VALUE-REPLY1(
          SENDER,RECEIVER,L,TM1,TM2)
      if K2 > 0 /\ K2 =/= INF /\
         distance(ID?(Tr),I1) >=
         distance(ID?(SENDER),I1) /\
         SL =/= empty-list .

  crl [lookfor-file43] :
      < RECEIVER : Peer | RT : R2 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K2 >
      FIND-VALUE-REPLY1(
          SENDER,RECEIVER,nil,TM1,TM2)
      =>
      < RECEIVER : Peer |
          RT : move-to-tail(SENDER,RECEIVER,R2) ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : set-flag-done(SENDER,SL) ,
          Life : K2 >
      if K2 > 0 /\ K2 =/= INF /\
         SL =/= empty-list .

If the FIND-VALUE RPC is not attended because the receiver has left the network, the node remains in the search list, blocking other searches. When this happens the node should be removed from the search list. To detect these cases, each node in the search list has a time to reply. When this time reaches 0 the node is removed from the list.

  crl [lookfor-file5] :
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : SL , Life : K1 >
      =>
      < SENDER : Peer | RT : R1 ,
          SearchFiles : < I1 (S1 ; INF) > SF ,
          SearchList : remove-time0(SL) ,
          Life : K1 >
      if K1 > 0 /\ K1 =/= INF /\
         messages-time0(SL) > 0 .

5.2. Publishing a file

Publishing is performed automatically. Moreover, to ensure the persistence of the information, nodes periodically republish files. In [8], not only the node that shares the file republishes it, but also all the nodes that store the file ID. The process is done every hour but, to avoid redundant replication, when a node receives a STORE RPC it will not republish the file in the next hour. As said in [8], since replication intervals are not exactly synchronized, only one node will republish the file every hour, making the process more efficient.

A file is published on the k nodes that have the closest ID to the file ID, since the other nodes will look for the file there. The publish process starts automatically when the time to republish a file reaches zero. It can be a node's shared file kept in the publish files table or a known file shared by another node.

  crl [publish11] :
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 0) > PF ,
          SearchList : empty-list ,
          Life : K1 >
      =>
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 INF) > PF ,
          SearchList : create-search-list(
              closest-nodes(I1,R1,10), I1) ,
          Life : K1 >
      if K1 > 0 /\ K1 =/= INF .

In the following we only explain the process that handles shared files; the one for known files is similar. First the initiator should find the k closest nodes to the file ID. The initiator creates the temporary list and sends FIND-NODE RPCs to the closest nodes. Since the process is similar to the one explained for looking for a file, we only present the rules that apply once all nodes have replied or have been removed from the list because their response time has expired. Then, a STORE message is sent to the first three nodes of the list, which are supposed to be the closest to the file's ID. When the three STORE messages have been sent, the time to republish the file is set to k.
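The STORE step of the publish process can be pictured with a small Python sketch. It is a hedged illustration only: the search list is reduced to (node ID, flag) pairs already ordered by distance, the flag values come from the Node-Time comments, and the helper names follow first-not-stored and number-messages-store from the specification while their bodies are assumptions. The function send_stores is hypothetical and collapses into one loop what the specification does with one rule application per STORE message.

```python
from typing import List, Optional, Tuple

RESPONDED, STORED = 2, 3  # flag values from the Node-Time comments
MAX_STORES = 3            # "the first three nodes of the list"

# (node ID, flag) pairs, assumed to be ordered by distance to the file ID.
SearchList = List[Tuple[int, int]]


def number_messages_store(sl: SearchList) -> int:
    """Count the entries to which a STORE message has already been sent."""
    return sum(1 for _, flag in sl if flag == STORED)


def first_not_stored(sl: SearchList) -> Optional[int]:
    """Index of the closest entry that has not received a STORE yet."""
    for i, (_, flag) in enumerate(sl):
        if flag != STORED:
            return i
    return None


def send_stores(sl: SearchList) -> List[int]:
    """Flag up to three entries as stored and return their node IDs.

    The loop stops when three STOREs are out or when every entry in the
    list has received one, which is when the republish time would be reset.
    """
    sent = []
    while number_messages_store(sl) < MAX_STORES:
        i = first_not_stored(sl)
        if i is None:
            break
        node, _ = sl[i]
        sl[i] = (node, STORED)
        sent.append(node)
    return sent


# Example: four nodes have replied, so only the three closest get a STORE.
replied = [(15, RESPONDED), (1, RESPONDED), (85, RESPONDED), (128, RESPONDED)]
targets = send_stores(replied)
```

With four replied nodes, targets holds the three closest IDs and the fourth entry keeps its previous flag. With fewer than three entries, every node in the list receives a STORE.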
  crl [publish51] :
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 INF) > PF ,
          SearchList : SL , Life : K1 >
      =>
      STORE(SENDER,first-not-stored(SL),
          < I1 (SENDER 100) >,K1,1)
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 INF) > PF ,
          SearchList : set-flag-store(
              first-not-stored(SL),SL) ,
          Life : K1 >
      if K1 > 0 /\ K1 =/= INF /\
         all-done(SL) /\
         number-messages-store(SL) < 3 .

  crl [publish52] :
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 INF) > PF ,
          SearchList : SL , Life : K1 >
      =>
      < SENDER : Peer | RT : R1 ,
          Publish : < I1 (S1 100) > PF ,
          SearchList : empty-list ,
          Life : K1 >
      if K1 > 0 /\ K1 =/= INF /\
         (number-messages-store(SL) == 3 or
          length(SL) == number-messages-store(SL)) .

6. Open issues

We have shown a model of a P2P network that uses a Kademlia DHT for searching files in the formal language Maude. The model will permit us to execute the network specification, analyze its behaviour and prove properties about it.

But there are still some open issues in the model. There are more network processes, like the one that automatically connects a node to the network, that need to be refined. There are also some eMule facilities that we have not studied yet, like the modification of the routing table to keep more contacts in it or the type and expire time attributes used to keep the routing table up-to-date. eMule also allows publishing keywords and notes related to files. There are some protections eMule implements against possible attacks, like the protection of hot nodes, that need a deep study. It will also be useful to compare the eMule implementation with the aMule and BitTorrent ones.

We should refine the notion of time, adjusting the time each action takes and the intervals at which the automatic actions are taken, in order to make the system as realistic as possible.

The simulation will require: a process to create random peers that can be connected to and disconnected from the network; stochastic processes to simulate the behaviour of the peers; and a system that automatically searches for files.

Finally, we have to define the properties we want to prove in the system and use the appropriate tools to prove them. The basic property a P2P file sharing network must meet is that, under all circumstances, the data stored in a hash table must be properly returned when asked for. Different circumstances may affect the searching process: peers joining and leaving the network, publishing new files, searching for other files, and so on. Real-Time Maude provides some techniques for proving this type of dynamic properties [12]. It supports reachability analysis from an initial state for states matching a pattern, up to a certain time bound. It also provides temporal logic model checking, which may be very useful if we can find an appropriate abstraction of the model that limits the number of states [13].

References

[1] aMule homepage. http://www.amule.org

[2] Bakhshi, R., and Gurov, D. Verification of Peer-to-peer Algorithms: A Case Study. ENTCS 181, pages 35-47. Elsevier, 2007.

[3] Borgström, J., Nestmann, U., Onana, L., and Gurov, D. Verifying a Structured Peer-to-peer Overlay Network: The Static Case. In Proceedings of Global Computing 2004, LNCS 3267, pages 251-266. Springer, 2004.

[4] Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., and Talcott, C. All About Maude - A High-Performance Logical Framework. LNCS 4350. Springer, 2007.

[5] Crosby, S., and Wallach, D. An Analysis of BitTorrent's Two Kademlia-Based DHTs. Technical Report TR-07-04, Department of Computer Science, Rice University, Houston, TX, USA, 2007.

[6] eMule. http://www.emule-project.net

[7] EU-project PEPITO: IST-2001-33234. Homepage: http://www.sics.se/pepito/

[8] Maymounkov, P., and Mazières, D. Kademlia: A Peer-to-peer Information System Based on the XOR Metric. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), 2002.

[9] Mühl, G. Large-Scale Content-Based Publish/Subscribe Systems. Master Thesis. Darmstädter Dissertationen D17. Technische Universität Darmstadt, 2002.

[10] Mysicka, D. Reverse Engineering of eMule. An analysis of the implementation of Kademlia in eMule. Semester thesis, Dept. of Computer Science, Distributed Computing group, ETH Zurich, 2006.

[11] Mysicka, D. eMule Attacks and Measurements. Master Thesis. Dept. of Computer Science, Distributed Computing group, ETH Zurich, 2007.

[12] Ölveczky, P., and Meseguer, J. Semantics and pragmatics of Real-Time Maude. Higher Order Symbol. Comput., volume 20, number 1-2, pages 161-196. Kluwer Academic Publishers, 2007.

[13] Palomino Tarjuelo, M. Reflexión, abstracción y simulación en la lógica de reescritura. PhD thesis, Dept. Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Spain, Mar. 2005.

[14] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. A Scalable Content-Addressable Network. In Proceedings of SIGCOMM, 2001.

[15] Rowstron, A., and Druschel, P. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware '01: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms, Heidelberg, Germany, November 12-16, 2001, pages 329-350.

[16] Saroiu, S., Gummadi, P., and Gribble, S. A Measurement Study of Peer-to-Peer File Sharing Systems. Technical Report UW-CSE-01-06-02, Department of Computer Science and Engineering, University of Washington, July 2001.

[17] Sit, E., and Morris, R. Security Considerations for Peer-to-Peer Distributed Hash Tables. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, Massachusetts, March 2002, LNCS 2429, pages 261-269. Springer, 2002.

[18] Stoica, I., Morris, R., Karger, D., Kaashoek, M., and Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for Internet applications. IEEE/ACM Trans. Netw., volume 11, number 1, pages 17-32, 2003.

[19] Verdejo, A., Pita, I., and Martí-Oliet, N. Specification and Verification of the Tree Identify Protocol of IEEE 1394 in Rewriting Logic. Formal Aspects of Computing, volume 14, number 3, pages 228-246. Springer, 2003.

[20] Wang, P., Tyra, J., Chan-Tin, E., Malchow, T., Foo Kune, D., Hopper, N., and Kim, Y. Attacking the Kad Network. In Proceedings of the 4th International Conference on Security and Privacy in Communication Networks (SecureComm '08), 2008.
