Distributed Systems

Peer-To-Peer

Prof. Dr.-Ing. Torben Weis


University Duisburg-Essen
The History of P2P
[Figure: timeline of P2P and related systems from 1969 to 2006, from "Past" to "Today": ARPANET, USENET, DNS, ICQ, Napster, Gnutella, Freenet, eDonkey, FastTrack, Jabber, Seti@home, JXTA, Groove, WASTE, BitTorrent, Kademlia, CAN, Chord, Pastry, Tapestry, Avalanche, PNRP]

Distributed Systems Torben Weis 2


Domains of P2P

Why Peer-to-Peer?

Earlier days:
Client-server architecture prevailed in distributed systems
Problems:
High load on servers
C/S systems are not indefinitely scalable
Server acts as a Single Point of Failure (SPOF)
Heavy use of the Internet called for alternatives
Major goal:
Distribute load fairly across all nodes participating in the network

Paradigm Shift

End of the 90s: Peer-to-Peer (P2P) systems begin to displace client-server systems

Benefits
Scalable, dynamic
Autonomous
Decentralized
Fair (costs, bandwidth, ...)
Self-organizing, anonymous

Today, P2P traffic accounts for up to 60% of all Internet traffic


Where does a peer live?

Peers live in an overlay network on top of the regular IP network
Connections between peers form virtual links
A virtual link may correspond to a path of several physical links
The overlay allows routing messages to destinations not specified by an IP address

What is a peer?

Peer
Autonomous
Equity (all peers have equal rights)
Part of a decentralized system
Communication: initialize, manage
Resources: offer, manage

P2P Attributes

Generations

1. Gen: Centralized
2. Gen: Pure, Hybrid
3. Gen: DHT-based
4. Gen: Security? Anonymity?

1. Generation: Centralized P2P

e.g. Napster
Server stores index
File transfer using P2P
Scalability problems
Single point of failure

[Figure: central server and peers; arrows labeled Connect, Query, Reply, Transfer]

2. Generation: Pure P2P
e.g. Gnutella

All peers are the same


Queries/Pings are forwarded
No global knowledge
Very robust
Performance problems
Message flooding
Collisions
No hit guarantee
[Figure: peers only; arrows labeled Connect, Query, Reply, Transfer]

2. Generation: Hybrid P2P

e.g. FastTrack
Different roles
Regular peers
Super-peers
Hierarchical structure
Super-peers answer queries using local knowledge
On a miss, the query is forwarded

[Figure: super-peers and regular peers; arrows labeled Connect, Query, Reply, Transfer]
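The super-peer rule above (answer from local knowledge, forward on a miss) can be sketched in Python as follows. This is a minimal in-memory sketch, not FastTrack's actual protocol, which was proprietary; the class `SuperPeer`, its methods, the hop limit `ttl`, and the peer addresses are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class SuperPeer:
    """Hypothetical super-peer that indexes the files of its attached regular peers."""

    name: str
    # filename -> addresses of regular peers sharing it (the "local knowledge")
    index: dict[str, set[str]] = field(default_factory=dict)
    neighbors: list[SuperPeer] = field(default_factory=list, repr=False)

    def register(self, peer_addr: str, files: list[str]) -> None:
        """A regular peer uploads its list of shared files to its super-peer."""
        for fname in files:
            self.index.setdefault(fname, set()).add(peer_addr)

    def query(self, filename: str, ttl: int = 2, seen: set[str] | None = None) -> set[str]:
        """Answer from the local index; on a miss, forward to neighboring super-peers."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        hits = set(self.index.get(filename, set()))
        if hits or ttl == 0:
            return hits
        for sp in self.neighbors:
            if sp.name not in seen:  # do not ask the same super-peer twice
                hits |= sp.query(filename, ttl - 1, seen)
        return hits


if __name__ == "__main__":
    a, b = SuperPeer("A"), SuperPeer("B")
    a.neighbors.append(b)
    b.neighbors.append(a)
    a.register("peer-1", ["song.mp3"])
    print(b.query("song.mp3"))  # miss at B, forwarded to A: {'peer-1'}
```

Only super-peers take part in forwarding, so regular peers never see query traffic; the price is that super-peers carry more load and become more important for the network.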

3. Generation: Distributed Hash Tables

e.g. Chord, Pastry

All peers are the same


No hierarchy
Fair load balancing
Nodes and objects are
mapped in the same key space

[Figure: nodes and data mapped into one key space; arrows labeled Connect, Query, Reply, Transfer]

Napster - Development

Developed by Shawn Fanning


Born 1980 in Brockton, Massachusetts
Started studying computer science in Boston in 1999
Napster Inc. founded 05/1999
Roxio bought the remains of Napster in 2002 after bankruptcy
Friends came up with the idea for Napster
No comparable software for direct transfers was available
Goal: exchange music within a circle of friends

Napster Network Structure

Star-like Structure
Central server
Server farm of ~200 servers
Servers store indices
Users connect to a server:
Server-client communication while searching
Client-client communication while transferring

Napster - Tasks

Portal for exchanging MP3 files


Distinct roles for clients and server
Server
Indexes all .mp3 files in the network
Relays communication between peers
Clients
Goal: Download music
Upload list of shared files

Napster File Transfer

MP3 search
Client sends query to server
Server searches its database
Server sends result set to client

MP3 download
1. Peer A sends query for song XY
2. Server sends the address of peer B to A
3. Peer A sends a request to peer B
4. Download commences
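The division of labour in the steps above (search through the central server, transfer directly between peers) can be sketched as follows. This is a minimal in-memory model assuming a single index server; `IndexServer`, `upload_share_list`, `search`, `download` and the peer names are hypothetical, and the actual file transfer is only simulated by a string.

```python
from __future__ import annotations

from collections import defaultdict


class IndexServer:
    """Hypothetical Napster-style central index: song name -> peers sharing it."""

    def __init__(self) -> None:
        self.index: defaultdict[str, set[str]] = defaultdict(set)

    def upload_share_list(self, peer: str, songs: list[str]) -> None:
        """Clients upload the list of files they share when they connect."""
        for song in songs:
            self.index[song].add(peer)

    def search(self, song: str) -> list[str]:
        """Server searches its database and returns the result set."""
        return sorted(self.index.get(song, set()))


def download(server: IndexServer, song: str) -> str | None:
    """Steps 1-2 go through the server; steps 3-4 are peer-to-peer (simulated)."""
    peers = server.search(song)  # 1. peer A queries, 2. server returns peer B's address
    if not peers:
        return None
    peer_b = peers[0]
    return f"GET {song} from {peer_b}"  # 3. request to peer B, 4. download commences


if __name__ == "__main__":
    server = IndexServer()
    server.upload_share_list("peer-B", ["XY.mp3"])
    print(download(server, "XY.mp3"))  # GET XY.mp3 from peer-B
```

Because every search passes through `server`, it is also the bottleneck and single point of failure named in the conclusion.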

Napster File Transfer

MP3 download behind a firewall
1. Peer A sends query for song
2. Server passes the connection data of peer A to peer B
3. Peer B opens a connection to peer A
4. Peer A starts the download from peer B

Napster - Conclusion

Pros:
Up-to-date view of the network thanks to the central database
Only MP3s supported, hence low risk of downloading viruses
Cons:
Scalability: server farm is the bottleneck
Server is a single point of failure
No security, file transfers are not encrypted
Censorship of the database possible (using filters)
Free-rider problem
No chunking: each file is downloaded from a single peer only,
hence high dependency on a single peer

Napster - Summary

Napster pioneered peer-to-peer systems

Only limited use of P2P technology
File download is P2P-based
File search is client-server-based
Protocol is closed-source
Reverse engineering enabled the development of OpenNap
Napster was sued (4/2000) and finally shut down (7/2001)
Napster now acts as a legal, commercial music provider

Gnutella - Development

History
Emerged during the legal battle over Napster (1999-2001)
Justin Frankel (Nullsoft) publishes V0.4 in March 2000
Parent company AOL stops distribution
By then already downloaded thousands of times
Reverse engineering revealed the protocol

Goal:
Simple exchange of music in a company network
No use of central components
No usage of central components

Gnutella Properties

Properties:
Fully decentralized P2P network
Allows download of all file types
No role allocation: pure P2P
Each node is both server and client: "servent"
Members are autonomous
Robust network, typically 3-4 open connections per node

Problem:
Finding an entry point (bootstrapping)
Host-cache servers
Lists of known hosts from former sessions

Gnutella - Messages

Message ID
Unique identifier of the message in the network
Payload Descriptor
Ping, Pong, Query, QueryHit, Push
Time to Live (TTL)
Hops to go until the packet is dropped; common value: TTL = 7
Hops
Hops the packet has already taken
Payload Length
Length of the payload that follows the header
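The header fields above can be packed into bytes with Python's `struct` module. The sketch assumes the layout of the Gnutella v0.4 protocol specification (16-byte message ID, 1-byte payload descriptor, 1-byte TTL, 1-byte hops, 4-byte little-endian payload length, 23 bytes in total) and its descriptor codes; generating the ID with `uuid.uuid4()` and the helper names are illustrative choices, not part of the specification.

```python
from __future__ import annotations

import struct
import uuid

# Payload descriptor codes (Gnutella v0.4)
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# Message ID (16 bytes), descriptor, TTL, hops (1 byte each), payload length (4 bytes, little-endian)
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor: int, payload_len: int, ttl: int = 7, hops: int = 0,
                msg_id: bytes | None = None) -> bytes:
    """Build a 23-byte header; a fresh random 16-byte message ID is used by default."""
    return HEADER.pack(msg_id or uuid.uuid4().bytes, descriptor, ttl, hops, payload_len)


def unpack_header(data: bytes) -> dict:
    """Split the first 23 bytes of a message into its header fields."""
    msg_id, descriptor, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {"id": msg_id, "descriptor": descriptor, "ttl": ttl, "hops": hops, "length": length}


def forward(header: bytes) -> bytes | None:
    """Relay rule: decrement TTL and increment hops; drop once TTL would reach 0."""
    h = unpack_header(header)
    if h["ttl"] <= 1:
        return None
    return HEADER.pack(h["id"], h["descriptor"], h["ttl"] - 1, h["hops"] + 1, h["length"])
```

Along the way TTL + hops stays equal to the initial TTL, so a receiver can tell how far a message has travelled.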
Gnutella Network Structure

No central index: peers have to probe their neighbors
Regularly broadcasting ping messages
A peer receives pongs on the same path its ping was sent; a pong contains
Address information: IP, port
Number and total size of shared files
Loss of a node may lead to network partitioning
Trade-off: ping frequency vs. up-to-dateness
Ping size = 23 bytes (header only, a ping has no payload)
Example: 1000 peers with 3 connections each: ~64 MB/sec

Gnutella - Search

Broadcast query to neighbors
QueryHit contains servent ID, address, speed
Search runtime equals that of breadth-first search: O(|V| + |E|)
Search is limited only by the TTL
Client sends request:

GET /get/4356/foo.mp3 HTTP/1.0
User-Agent: Gnutella
Connection: Keep-Alive
Range: bytes=0-
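The flooding search above can be simulated as a TTL-bounded breadth-first search. The sketch assumes an in-memory adjacency list instead of real TCP connections and a single query, so duplicate detection by message ID reduces to the `came_from` map: each peer forwards the query only the first time it sees it and remembers who sent it, so a QueryHit can travel back along the reverse path, just as pongs do. `flood_query` and the example topology are hypothetical.

```python
from __future__ import annotations

from collections import deque


def flood_query(graph: dict[str, list[str]], origin: str, has_file: set[str],
                ttl: int = 7) -> dict[str, list[str]]:
    """Flood a query from `origin`; return {hit_peer: reverse path back to origin}."""
    came_from: dict[str, str | None] = {origin: None}  # peer -> neighbor it got the query from
    frontier = deque([(origin, ttl)])
    hits: dict[str, list[str]] = {}
    while frontier:
        peer, ttl_left = frontier.popleft()
        if peer in has_file and peer != origin:
            # QueryHit: follow the stored back-pointers to the origin
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = came_from[node]
            hits[peer] = path
        if ttl_left == 0:
            continue  # TTL exhausted: do not forward
        for nb in graph.get(peer, []):
            if nb not in came_from:  # already seen this message ID -> drop
                came_from[nb] = peer
                frontier.append((nb, ttl_left - 1))
    return hits


if __name__ == "__main__":
    net = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C", "E"], "E": ["D"]}
    print(flood_query(net, "A", {"E"}))         # {'E': ['E', 'D', 'B', 'A']}
    print(flood_query(net, "A", {"E"}, ttl=2))  # {}: E is 3 hops away
```

With TTL = t, a query reaches peers at most t hops away; anything beyond stays invisible, which is why Gnutella gives no hit guarantee.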

Gnutella File Transfer
Download using HTTP; the peer holding the file answers:

HTTP 200 OK
Server: Gnutella
Content-type: application/binary
Content-length: 3457827

Use of push messages to bypass firewalls
A regular request ends with a timeout
Client sends push message (ID + address) to the peer holding the file
That peer opens a connection to the client

Download is complicated if
both peers are behind firewalls
IP masquerading is used

Gnutella Problems

Scalability
Massive traffic for keeping the network up to date
Reliability
In dense networks, packets are dropped after 3 hops
Long paths reduce the success rate
Security
No use of hash values
Query flooding resembles DDoS attacks
Privacy
Packets are not encrypted

Gnutella - Conclusion

Pros:
Very robust, connections set up via TCP
Autonomous peers
Communication using UDP

Cons:
No guaranteed hits
Massive traffic and high latency
Massive scalability problems

Chord - DHT-based P2P

First 3rd-generation P2P system
Developed by Ion Stoica et al. at MIT in 2001
Scientific approach addressing the drawbacks of the 2nd generation
Complete decentralization while
offering efficient and correct searches
providing good scalability
relying on a flat network structure (no hierarchy)
balancing load fairly across all nodes
First use of distributed hash tables in P2P

Chord - Use of DHTs

Cryptographic hash function SHA-1
160 bits allow addressing $2^{160}$ peers and objects
Collisions are highly unlikely
SHA-1 output varies strongly even for minor input changes
SHA-1(Franz)=b259d15d278969d8c6cc682bc5fb8c032a5a43de
SHA-1(Frank)=0df02da8548eeef2174c97c2ade67b4c5adc3160
Keys in the key space are
equally distributed
practically collision-free
distinct
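Deriving identifiers with SHA-1 can be sketched with Python's standard `hashlib` module. The sketch assumes names are hashed as UTF-8 bytes without a trailing newline, and that a smaller key space of $2^m$ IDs (as in classroom examples) is obtained by keeping the digest modulo $2^m$; `chord_id` is a hypothetical helper.

```python
import hashlib


def chord_id(name: str, m: int = 160) -> int:
    """Map a node address or object name to an m-bit identifier via SHA-1.

    The full SHA-1 digest has 160 bits; for a smaller key space (e.g. m = 6)
    only the value modulo 2**m is kept.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


if __name__ == "__main__":
    for word in ("Franz", "Frank"):
        # One-letter difference in the input, unrelated-looking digests
        print(word, hashlib.sha1(word.encode("utf-8")).hexdigest())
    print(chord_id("Franz", m=6), chord_id("Frank", m=6))
```

Nodes (e.g. via their address) and objects (e.g. via their name) go through the same function, which places both in one key space.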

Chord - Data Mapping

[Figure: data items and nodes mapped into the same key space]

Example for a 4-bit key space
f(x) = 3 * x mod 16
f(47) = 3 * 47 mod 16 = 141 mod 16 = 13
Chord - Search

Example: 4 nodes, 5 objects
Each node is responsible for all keys between its predecessor (exclusive) and itself (inclusive)

Search:
Nodes know both neighbors
Query is passed to the direct neighbor
Runtime: O(n)
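Responsibility and the neighbor-by-neighbor search can be sketched as follows, reusing the toy hash f(x) = 3 * x mod 16 from the data-mapping example. The four node IDs 1, 5, 9 and 13 and the helper names are hypothetical; a real ring would use SHA-1 identifiers.

```python
from __future__ import annotations

M = 4
SPACE = 2 ** M  # 4-bit key space: identifiers 0..15


def toy_hash(x: int) -> int:
    """Toy hash from the data-mapping example: f(x) = 3 * x mod 16."""
    return 3 * x % SPACE


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]; handles wrap-around past 0."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wraps (a == b covers the whole ring)


def linear_lookup(ring: list[int], start: int, key: int) -> tuple[int, int]:
    """Pass the query from neighbor to neighbor until reaching the node n with
    key in (predecessor(n), n]. Returns (responsible node, hops): O(n) hops."""
    ring = sorted(ring)
    idx, hops = ring.index(start), 0
    while True:
        node, pred = ring[idx], ring[idx - 1]  # ring[-1] is the predecessor of ring[0]
        if in_half_open(key, pred, node):
            return node, hops
        idx = (idx + 1) % len(ring)
        hops += 1


if __name__ == "__main__":
    nodes = [1, 5, 9, 13]                # hypothetical node IDs
    key = toy_hash(47)                   # 13, as in the data-mapping example
    print(linear_lookup(nodes, 1, key))  # (13, 3)
```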

Chord - Finger Tables

Finger tables provide shortcuts
Identifier space of $N = 2^m$ IDs; each finger table has $m = \log_2 N$ entries
$\mathit{finger}_n[k]$ = first node on the circle that succeeds $(n + 2^{k-1}) \bmod 2^m$, $1 \le k \le m$
Successor = finger[1]
Predecessor = previous node on the circle

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
    for i = m downto 1
        if (finger[i] ∈ (n, id))
            return finger[i];
    return n;
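A runnable Python version of the pseudocode above, assuming the finger table is passed as a plain list whose index 0 holds finger[1]. The ring-interval test must handle wrap-around past 0, which the pseudocode's (n, id) notation leaves implicit; `in_open` is a hypothetical helper name, and the example values are the finger table of node 22 from the improved-search example.

```python
from __future__ import annotations


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0 (a == b: everything except a)


def closest_preceding_node(n: int, finger: list[int], key: int) -> int:
    """Search the local table for the highest predecessor of key.

    finger[0] corresponds to finger[1] in the pseudocode (the successor),
    so iterating in reverse scans i = m downto 1.
    """
    for f in reversed(finger):
        if in_open(f, n, key):
            return f
    return n


if __name__ == "__main__":
    fingers_22 = [26, 26, 26, 30, 39, 55]  # finger table of node 22 (m = 6)
    print(closest_preceding_node(22, fingers_22, 38))  # 30
```

Scanning from the largest finger downwards makes each hop cover as much of the remaining distance as possible, which is what yields O(log n) hops.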

Chord - Improved Search

Example: finger table of node 22 (m = 6)

i   start            node
1   22 + 2^0 = 23    26
2   22 + 2^1 = 24    26
3   22 + 2^2 = 26    26
4   22 + 2^3 = 30    30
5   22 + 2^4 = 38    39
6   22 + 2^5 = 54    55

Node 22 searches for key 38
The query is sent to the finger known to be the closest predecessor of 38: node 30
Node 30 asks its successor whether it is responsible: yes, data found
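The table and the lookup of key 38 can be reproduced with a small simulation. Only nodes 22, 26, 30, 39 and 55 follow from the table; the other IDs in `NODES` and all helper names are hypothetical, and the sketch computes finger tables from a global node list instead of exchanging messages between nodes.

```python
from __future__ import annotations

M = 6  # six finger entries -> identifiers 0..63
SPACE = 2 ** M
# Hypothetical ring; 22, 26, 30, 39 and 55 are implied by the finger table
NODES = sorted([5, 12, 22, 26, 30, 39, 46, 55])


def successor(key: int) -> int:
    """First node whose ID is >= key, wrapping around the ring."""
    key %= SPACE
    return next((n for n in NODES if n >= key), NODES[0])


def finger_table(n: int) -> list[int]:
    """finger[k] = successor((n + 2**(k-1)) mod 2**m) for k = 1..m."""
    return [successor(n + 2 ** (k - 1)) for k in range(1, M + 1)]


def between(x: int, a: int, b: int, right_closed: bool = False) -> bool:
    """Ring interval test for (a, b), or (a, b] if right_closed."""
    if right_closed and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start: int, key: int) -> list[int]:
    """Iterative lookup; returns the visited nodes, ending with the responsible one."""
    path, n = [start], start
    while True:
        fingers = finger_table(n)
        succ = fingers[0]
        if between(key, n, succ, right_closed=True):
            return path + [succ]  # the successor is responsible for key
        # Forward to the closest preceding finger of key
        n = next((f for f in reversed(fingers) if between(f, n, key)), succ)
        path.append(n)


if __name__ == "__main__":
    print(finger_table(22))  # [26, 26, 26, 30, 39, 55]
    print(lookup(22, 38))    # [22, 30, 39]
```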

Chord - Adding new nodes

New node q uses the hash function to generate its ID: 55
A lookup for this ID returns successor(55) = 56
Correction steps:
The predecessor of 56 (46) becomes the predecessor of 55
55 becomes the predecessor of 56
55 becomes the successor of 46
Copy the finger table from 46 and update all entries
All fingers of 46 have to check their finger tables too
Move data to the new node if necessary (keys in (46, 55] move from 56 to 55)
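The pointer corrections and the data hand-over can be sketched as follows, using the IDs from the example (46, 55, 56). The `Node` class and `join` are hypothetical, `succ` is assumed to be the result of the lookup for successor(55), and finger-table updates are left out to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    id: int
    successor: Node | None = field(default=None, repr=False)
    predecessor: Node | None = field(default=None, repr=False)
    data: dict[int, str] = field(default_factory=dict)  # key -> stored item


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def join(new: Node, succ: Node) -> None:
    """Splice `new` in front of succ = successor(new.id); fingers are not updated here."""
    pred = succ.predecessor              # 46, the old predecessor of 56
    new.predecessor, new.successor = pred, succ
    succ.predecessor = new               # 55 becomes predecessor of 56
    pred.successor = new                 # 55 becomes successor of 46
    # Keys in (pred, new] now belong to the new node: move them over from succ
    for key in [k for k in succ.data if in_half_open(k, pred.id, new.id)]:
        new.data[key] = succ.data.pop(key)


if __name__ == "__main__":
    n46, n56 = Node(46), Node(56, data={50: "a", 55: "b", 56: "c"})
    n46.successor = n46.predecessor = n56
    n56.successor = n56.predecessor = n46
    n55 = Node(55)
    join(n55, n56)
    print(n46.successor.id, n56.predecessor.id, n55.data, n56.data)
    # 55 55 {50: 'a', 55: 'b'} {56: 'c'}
```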

Chord - Node leaves network

Successor of 22 no longer responds
Look up the next living finger (39)
Walk backwards to the last functioning node
This node becomes the new successor of 22

Chord - Node leaves network

Problem: several nodes leave concurrently
Successor(22) = ?
Walking backwards ends at 39, as 37 is not reachable
Successor(22) = 39, although L is alive
Data on L is not accessible
Solution: successor list
Nodes store r = O(log n) successors
22 knows of L and integrates it
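The successor-list fix can be sketched as follows. The concrete length r = ceil(log2 n), the helper names, and the ID 33 standing in for node L (which must lie between 22 and 37) are assumptions for illustration; the slides only require r = O(log n).

```python
from __future__ import annotations

import math
from collections.abc import Callable


def list_length(n: int) -> int:
    """r = ceil(log2 n): one concrete choice for r = O(log n)."""
    return max(1, math.ceil(math.log2(n)))


def refresh_successor_list(succ: int, succ_list_of_succ: list[int], r: int) -> list[int]:
    """Periodic maintenance: own successor followed by its successor list, cut to r entries."""
    return ([succ] + succ_list_of_succ)[:r]


def first_live_successor(succ_list: list[int], is_alive: Callable[[int], bool]) -> int | None:
    """If the successor fails, take the first list entry that still responds."""
    return next((n for n in succ_list if is_alive(n)), None)


if __name__ == "__main__":
    # Hypothetical IDs: 26, 30 and 37 have failed; 33 plays the role of node L
    succ_list_22 = refresh_successor_list(26, [30, 33, 37, 39], r=5)
    alive = {22, 33, 39}
    print(first_live_successor(succ_list_22, alive.__contains__))  # 33
```

Instead of falling back to finger 39, node 22 finds L in its own list, so L's data stays reachable as long as at least one of the r listed successors survives.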

Chord - Conclusion

Pros:
Fully decentralized architecture
Equity among nodes, no role allocation
Improved scalability
Efficient and correct search methods: O(log n)
Cons:
Huge effort to keep fingers and neighbors up to date
Join and leave operations are costly
No support for security, anonymity or firewalled users

Summary

P2P has shown new ways of exchanging data

Fairness regarding disk space and bandwidth
Scalability allows for huge numbers of users
Improved robustness due to decentralization
Still, P2P is mainly found in prototypes
3rd-generation applications in particular exist only in research
Popular applications (eDonkey) use 2nd-generation protocols
Future work
Use of P2P technology in Windows Vista (PNRP for a distributed DNS service)
OceanStore works on distributed data archives
Applications you build on your own
