
MEASUREMENT AND ANALYSIS OF BITTORRENT

A Thesis
by

VIDESH SADAFAL

Submitted to the Office of Graduate Studies of


Texas A&M University
in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

August 2008

Major Subject: Computer Science


MEASUREMENT AND ANALYSIS OF BITTORRENT

A Thesis

by
VIDESH SADAFAL

Submitted to the Office of Graduate Studies of


Texas A&M University
in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Approved by:

Chair of Committee, Dmitri Loguinov


Committee Members, Jennifer Welch
Pierce Cantrell
Head of Department, Valerie E. Taylor

August 2008

Major Subject: Computer Science



ABSTRACT

Measurement and Analysis of BitTorrent. (August 2008)

Videsh Sadafal, B.En. National Institute of Technology Karnataka, India

Chair of Advisory Committee: Dr. Dmitri Loguinov

BitTorrent is assumed and predicted to be the world's largest Peer-to-Peer (P2P)
network. Previous studies of the protocol focus mainly on its file-sharing algorithm,
and many relevant aspects of the protocol remain untouched. In this thesis, we
conduct a number of experiments to explore those untouched aspects. We implement
a BitTorrent crawler to collect data from trackers and peers, and statistically analyze
it to better understand the characteristics and behavior of the BitTorrent protocol.
We find that the expected lifetime of a peer in BitTorrent is 56.6 minutes and that
activity is diurnal. Peers show a strong preference for a limited number of torrents,
and 10% of torrents are responsible for 67% of the traffic. The US contributes the
largest number of peers to BitTorrent, and µTorrent emerges as the favorite
BitTorrent client. We measure the strength of a Distributed Denial of Service (DDoS)
attack mounted using the BitTorrent network and conclude that it is transient and
weak. Finally, we address the content locatability problem in BitTorrent and propose
two solutions.

To my parents

ACKNOWLEDGMENTS

I am sincerely grateful to Dr. Dmitri Loguinov for agreeing to guide my Master's
thesis and for allowing me to do research with him. This work would not have been
possible without his constant motivation and guidance. Every bit of interaction with
him has been a learning experience in some way or another. His attitude and
passion toward whatever he likes to do has always surprised and inspired me. I
sincerely thank Dr. Jennifer Welch and Dr. Pierce Cantrell for serving on my
advisory committee. I also thank all the faculty members with whom I had a
chance to interact and learn.

I appreciate the help and company of all the members of the Internet Research
Lab and my friends in College Station. I am especially thankful to Xiaoming Wang
for supporting and guiding me whenever I needed his help.

Last, but not least, I am indebted to my parents and family for all their
support and encouragement so far.



TABLE OF CONTENTS

CHAPTER Page

I INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
A Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
B Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
C Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 5

II BITTORRENT: AN OVERVIEW . . . . . . . . . . . . . . . . . . . 6
A Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
B Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

III EXPERIMENTAL SETUP . . . . . . . . . . . . . . . . . . . . . . . 13


A BitTorrent Crawler . . . . . . . . . . . . . . . . . . . . . . . . . 13
1 Hash Collection . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Peer Collection . . . . . . . . . . . . . . . . . . . . . . . . 14

IV MEASUREMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
A Seed Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 17
B Non-Seed Distribution . . . . . . . . . . . . . . . . . . . . . . . 18
C Population Distribution . . . . . . . . . . . . . . . . . . . . . . 19
D Popularity Distribution . . . . . . . . . . . . . . . . . . . . . . . 20
E Client Popularity Distribution . . . . . . . . . . . . . . . . . . . 23
F Geographic Distribution of Peers . . . . . . . . . . . . . . . . . 24
G Expected Lifetime of a Peer . . . . . . . . . . . . . . . . . . . . 26
1 RIDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Weibull Distribution . . . . . . . . . . . . . . . . . . . . . 28
3 Observation . . . . . . . . . . . . . . . . . . . . . . . . . . 28
H Distributed Denial of Service Attack . . . . . . . . . . . . . . . 30
I Activity in the Network . . . . . . . . . . . . . . . . . . . . . . 32

V IMPLEMENTATION . . . . . . . . . . . . . . . . . . . . . . . . . . 34

VI A BETTER SEARCH INFRASTRUCTURE FOR BITTORRENT NETWORKS . . . . . 38
A The Current Situation . . . . . . . . . . . . . . . . . . . . . . . 39

B Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1 Unstructured Flood Based Approach . . . . . . . . . . . . 42
2 Structured DHT Based Approach . . . . . . . . . . . . . . 43
a Kademlia DHT . . . . . . . . . . . . . . . . . . . . 44
b BitTorrent . . . . . . . . . . . . . . . . . . . . . . . 45

VII CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

VITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

LIST OF FIGURES

FIGURE Page

1 Formatted string for scrape request. . . . . . . . . . . . . . . . . . . 14

2 Formatted string for started announce request. . . . . . . . . . . . . 15

3 Formatted string for stopped announce request. . . . . . . . . . . . . 16

4 Seed distribution in BitTorrent network. . . . . . . . . . . . . . . . . 18

5 Non-seed distribution in BitTorrent network. . . . . . . . . . . . . . 19

6 Peer distribution in BitTorrent network. . . . . . . . . . . . . . . . . 21

7 Torrents contribution to the peer population. . . . . . . . . . . . . . 21

8 Popularity distribution in BitTorrent network based on the number of times a torrent is downloaded. . . . . . . . . . . . . . . . . . . 23

9 Usage distribution of top 17 clients. . . . . . . . . . . . . . . . . . . . 24

10 Geographic distribution of peers. . . . . . . . . . . . . . . . . . . . . 25

11 Complementary CDF (S0 = 3.75 × 10^5, T = 7 days, Δ = 3 minutes). . 29

12 Intensity of attack with time. . . . . . . . . . . . . . . . . . . . . . . 30

13 Activity in the BitTorrent network. Experiment starts at 12:00 PM, Sunday, 4/13/2008. . . . . . . . . . . . . . . . . . . . . . . . . 33

14 High level organization of hash collection program. . . . . . . . . . . 35

15 High level organization of peer collection program. . . . . . . . . . . 36

16 High level organization of residual lifetime tracking program. . . . . . 37



CHAPTER I

INTRODUCTION

With the increase in Internet connection speeds, the online activities of users
have also increased. One activity that has become prevalent on the Internet is
file sharing. The Internet is the source of significant digital content, spread
worldwide and hosted on servers, personal computers, and data centers. A basic
and simple model for distributing files on the Internet is the client-server model,
where content is hosted on servers and downloaded by clients. This model has
many known deficiencies: a server can serve only a limited number of clients
depending on its capacity, the download rate can be very slow, and the server is a
central point of failure. Multiple models with different pros and cons have been
proposed and developed to counter these problems. Peer-to-Peer (P2P) sharing of
content is one such solution; it is distributed in nature and eliminates the need
for a central storage system.

P2P networks have seen increasing demand from users and are now accepted as
a standard way of distributing information, because their architecture makes
scalability, efficiency, and performance key properties. The P2P model is
distributed and provides fault tolerance and robustness. P2P applications use
individuals' computing power and resources instead of powerful centralized
servers, and thus provide high content availability among peers. In P2P, every
node acts as both a server and a client: if a client has a piece of a file, it
acts as a server for other clients. P2P exploits the connectivity and bandwidth
among participants, letting a peer simultaneously download different pieces of
the same file from many participants, which increases download speed.

The journal model is IEEE/ACM Transactions on Networking.

There are many versions of P2P systems: some blend the client-server and
distributed models, and some are completely distributed. For example, Napster,
BitTorrent, and OpenNAP fall in the first category, while Gnutella and Freenet
fall in the second. Setting aside the illegal sharing of copyrighted and
pornographic content, P2P is a great breakthrough for sharing large files. Many
Internet applications are evolving around P2P technologies, and P2P has become a
hot research field. Its use is not limited to file sharing; it is also used to
distribute live video and audio streams, email, and anonymous and encrypted
content. The major portion of Internet traffic is now generated by P2P
applications. A study of Internet traffic conducted by CacheLogic reveals that in
January 2006 P2P traffic accounted for 71% of all Internet traffic, and 30% of
that was caused by BitTorrent. BitTorrent is assumed and predicted to be the
largest file-sharing network in the world, accounting for a large portion of
Internet traffic.

BitTorrent is a P2P protocol that provides an effective solution for sharing and
distributing large files over the Internet. The protocol was designed by Bram
Cohen in April 2001 and now has millions of users worldwide. BitTorrent has
become prevalent because of its simple architecture and its robust file-sharing
algorithm, which penalizes selfish peers who download while contributing little
or nothing to others and provides fast downloads to honest peers.

A Objective

This thesis aims to advance research on BitTorrent and derive meaningful
conclusions by exploring various aspects of the protocol. We collect real-life
data and analyze it statistically to better understand user and protocol
behavior. We also strive to find deficiencies or undesired behaviors in
BitTorrent and propose modifications to make the protocol more powerful. Most
previous studies [1], [2], [3] focus mainly on BitTorrent's file-sharing
algorithm to determine its efficiency, effectiveness, and fairness, but many
other aspects of the protocol remain untouched and carry great importance for
characterizing it. We analyze the seed, non-seed, population, and popularity
distributions of torrents, the residual lifetimes of peers, their geographic
distribution, the usage distribution of the various BitTorrent clients, and
peers' activity patterns in the BitTorrent network. We also mount a Distributed
Denial of Service (DDoS) attack [4] exploiting the BitTorrent platform and
measure its strength and effectiveness.

Currently, torrent files are hosted on servers and peers are tracked by
trackers. A large number of BitTorrent trackers are spread across the world,
each with its own community of peers, independent of the others. Nothing has
been done to make these trackers collaborate so that they can provide a more
effective way of locating a file in the BitTorrent network. We propose changes
to the existing BitTorrent protocol, focused on content locatability, that
eliminate the need for servers to host torrent files and integrate the trackers
spread worldwide. Our integration makes search more effective and helps users
locate files in any BitTorrent network: instead of visiting individual servers
to find a file, a user can submit a query through any tracker and search the
whole network.

B Contribution

We collect statistical data for BitTorrent and shed light on some characteristics
of the protocol. Our contributions can be summarized as follows:

• We design and implement a BitTorrent crawler that crawls a list of trackers,
collects hashes for all torrents managed by each tracker, and then, for each
hash, obtains a list of peers participating in the swarm.

• We analyze the seed and non-seed distributions among torrents, which reveal
some interesting facts: 48% of torrents have no seeds and 27% have no
downloaders, and just 10% of torrents account for the majority of seeds and
non-seeds in the network.

• The population distribution among torrents confirms that 90% of torrents
contribute just 33% of the peer population in the network, while the remaining
10% are responsible for 67% of the population.

• We study the popularity of torrents based on two metrics: the number of times
a torrent is downloaded and the number of peers participating in its swarm. We
see a strong correlation between these two metrics.

• The geographic distribution of peers at the country level reveals that the
largest population of peers in the BitTorrent network originates from the US.

• On the client side, users show a strong preference for µTorrent. The other
popular clients are BitComet, Azureus, and Bram's BitTorrent.

• We use a simple renewal-process-based residual lifetime method proposed in [5]
to estimate the lifetime of peers from the observed residuals. Our experiment
shows that the expected lifetime of a peer is 56.6 minutes.

• We exploit a feature of the BitTorrent protocol to mount a DDoS attack
targeting a host on the Internet. The feature allows a peer to specify its
address explicitly when it communicates through a proxy or when it is on the
same local side of a NAT gateway as the tracker. We find that the DDoS attack is
not powerful in BitTorrent and lasts only a short time.

• We record peers' activity in the network and conclude that their behavior is
diurnal, with users most active around noon and least active around midnight.

• Finally, we discuss the problem of torrent locatability and propose two
solutions: an unstructured flood-based approach and a DHT-based approach.

C Thesis Organization

Chapter II gives a quick tour of the BitTorrent protocol. Chapter III introduces
the experimental setup and environment. Chapter IV presents a detailed study and
analysis of various statistical measurements to provide insight into the
BitTorrent protocol. Chapter V describes implementation hurdles and the
high-level organization of the various programs. We discuss the torrent
locatability problem and propose solutions in Chapter VI. Finally, we conclude
the thesis in Chapter VII.

CHAPTER II

BITTORRENT: AN OVERVIEW

In this chapter, we give a brief overview of the BitTorrent protocol [6], which
provides a basic foundation for readers who are new to BitTorrent. We also
explain in detail some of the features that we use extensively in our
implementation. The chapter is also important for understanding the following
chapters, since we use many of the terms defined here to describe our
implementations and measurements.

Downloading a large file from a server is often slow because there is a single
source of the file. The BitTorrent protocol is a great breakthrough for sharing
and distributing large files. We start our explanation of BitTorrent by defining
the main entities of the protocol:

• Peer: responsible for sharing, downloading, and uploading files in the
network. When a peer has the complete file, it becomes a seed. A seed uploads
the file to others and does not download. A peer that both downloads and
uploads file pieces is called a non-seed. A peer that only downloads and does
not upload to other peers is called a leecher.

• Torrent: a file that contains the metadata for a file to be shared.
BitTorrent divides the shared file into multiple small pieces and generates a
torrent file, which contains an "announce" section specifying the URL of the
tracker that will assist the download, and an "info" section containing the
name of the file, its length, the piece length used, a hash identifier for the
shared file, and a SHA-1 hash for each piece, all of which clients use to
verify the integrity of the data they download.

• Tracker: a central unit that manages swarms. A swarm is a group of peers
collaborating among themselves to download a torrent. The tracker keeps, for
each swarm, a list of participating peers with their download status
(started | stopped | completed).

When a user wants to share a file, he generates the torrent file, publishes it
on some web server or elsewhere, and registers it with a tracker. When a user
wants to download a file from the BitTorrent network, he first obtains the
torrent file from some web server and extracts the tracker's URL from it. He
then contacts the tracker to gather a list of peers participating in the swarm
and starts downloading different pieces of the file from those peers in
rarest-first fashion. Simultaneously downloading multiple pieces from different
peers not only makes the download fast but also provides robustness against
failure. Such a group of peers connected to each other to share a torrent forms
a swarm. While downloading pieces, a peer also uploads the pieces it has
downloaded to other peers, thus contributing to the swarm.

A Encoding

Now we describe the encoding scheme and some message formats in BitTorrent that
we use in our implementation. Bencoding is the encoding scheme used in
BitTorrent to specify and organize data in a concise format. The following
types are supported:

• Byte string: encoded as <string length>:<string data>

Example: 6:videsh

• Integer: delimited by 'i' and 'e' at the beginning and end, respectively.
The format is i<integer>e

Example: i256e

• List: delimited by 'l' and 'e' at the beginning and end, respectively. It
may contain any bencoded types: integers, strings, dictionaries, or other
lists. The format is l<bencoded type>e

Example: li34e6:videshe

• Dictionary: contains key-value pairs. Keys must be bencoded strings and
values may be any bencoded type, including integers, strings, lists, or other
dictionaries. It is delimited by 'd' and 'e' at the beginning and end. The
format is d<bencoded string><bencoded type>e

Example: d4:name6:videsh6:coursei689ee
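These four types can be produced by a short recursive encoder. The sketch below is our own illustration, not code from any BitTorrent client; note that the specification additionally requires dictionary keys to appear in sorted order, so the dictionary example above would be emitted with course before name:

```python
def bencode(value):
    """Encode a Python value using BitTorrent's bencoding rules."""
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, str):
        return bencode(value.encode())
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # The spec requires keys to be byte strings, emitted in sorted order.
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```

For instance, `bencode([34, "videsh"])` yields the list example shown above.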

B Message Format

The tracker is an HTTP/HTTPS service that responds to HTTP GET requests. The
BitTorrent protocol specifies two ways by which a peer can communicate with a
tracker, using the tracker's URL appended with either "announce" or "scrape".
When the URL contains "announce" we call it the announce URL, and when it
contains "scrape" we call it the scrape URL. Both URLs have specific meanings,
which we describe shortly. Generally, trackers publish only their announce URLs,
not their scrape URLs; a tracker's scrape URL can be obtained by replacing
"announce" with "scrape".

1. Scrape: used by a peer to find the status (number of seeds, number of peers,
and the name of the shared file) of a particular torrent or of all torrents.
For example:

http://example.com/announce -> http://example.com/scrape

http://example.com/x/announce -> http://example.com/x/scrape

http://example.com/announce.php -> http://example.com/scrape.php

To request the status of a particular torrent, the scrape URL is suffixed with
the escape sequence of the torrent's hash identifier. For example:

http://example.com/scrape.php?info_hash=aa...aa&info_hash=bb...bb
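The announce-to-scrape rewrite is mechanical. The helper below is an illustrative sketch (the function name is ours) that reproduces the mappings above and rejects trackers whose last path segment does not begin with "announce":

```python
from urllib.parse import urlsplit, urlunsplit

def scrape_url(announce_url):
    """Derive a tracker's scrape URL from its announce URL.

    By convention, 'announce' in the last path segment is replaced with
    'scrape'; if the segment does not start with 'announce', the tracker
    does not support scraping.
    """
    parts = urlsplit(announce_url)
    segments = parts.path.split("/")
    last = segments[-1]
    if not last.startswith("announce"):
        raise ValueError("tracker does not follow the scrape convention")
    segments[-1] = "scrape" + last[len("announce"):]
    return urlunsplit(parts._replace(path="/".join(segments)))
```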

Many trackers do not support the scrape convention, but if a tracker does, it
responds to this HTTP GET request in "text/plain" format or gzip format,
consisting of bencoded dictionaries. If no info_hash is specified in the
request, the tracker returns information for all the hashes it maintains. The
dictionary in a response contains the following entries:

• files: a dictionary containing a key/value pair for each torrent.

– Key: the 20-byte info_hash.

– Value: a dictionary with the following entries:

∗ complete: number of seeds.

∗ downloaded: number of times the tracker registered a download-complete
event.

∗ incomplete: number of downloading peers.

∗ name: the torrent's name, as specified in the info section of the
.torrent file.

In the following example, a torrent has 5 seeds and 10 downloading peers, and
it has been completely downloaded 50 times. "...." represents a 20-byte
info_hash.

d5:filesd20:....d8:completei5e10:downloadedi50e10:incompletei10eeee
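A bencoded response like the one above can be parsed with a short recursive decoder. The following is our own sketch, not code from any BitTorrent library; it returns standard Python types plus the index of the next unread byte:

```python
def bdecode(data, i=0):
    """Decode one bencoded value from bytes; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<items>e
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                      # dictionary: d<key><value>...e
        d, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            val, i = bdecode(data, i)
            d[key] = val
        return d, i + 1
    colon = data.index(b":", i)        # byte string: <length>:<bytes>
    length = int(data[i:colon])
    return data[colon + 1:colon + 1 + length], colon + 1 + length
```

Applied to the example response, this yields a nested dictionary mapping the 20-byte info_hash to its complete, downloaded, and incomplete counts.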

2. Announce: used to convey the status of a peer to the tracker and to obtain
the address list of the peers participating in a particular torrent; it also
lets the tracker keep updated statistics for the torrent. The base announce URL
is of the form:

http://example.com/announce

http://example.com/x/announce

http://example.com/announce.php

Status parameters are added to the announce URL using standard CGI methods, and
all binary data in the URL must be properly escaped. The following parameters
are used in a client request:

• info_hash: the 20-byte SHA-1 hash of the torrent.

• peer_id: a 20-byte unique client ID.

• port: the port number the client is listening on.

• uploaded: total number of bytes uploaded.

• downloaded: total number of bytes downloaded.

• left: the number of bytes the client still has to download.

• compact: indicates whether the client accepts a compact response. In a
compact response each peer is represented by 6 bytes: the first 4 bytes are
the peer's IP address and the next 2 its port, all in network byte order.

• no_peer_id: indicates that the tracker may omit peer IDs for the peers in
the response. It is ignored if compact is enabled.

• event: one of the following three values, or empty. Empty indicates a
status update sent at regular intervals.

– started: the first request to the tracker.

– stopped: sent when the client shuts down gracefully.

– completed: sent when the download completes.

• ip: the IP address of the client machine.

• numwant: the number of peers the client would like to receive from the
tracker. The default is 50.

• key: an optional field used by a client to prove its identity in case its
IP changes.

• trackerid: optional. If a previous announce contained a tracker id, it
should be included here.

The tracker's response consists of the following fields in bencoded format:

• failure reason: a human-readable message stating why the request failed. If
it is present, no other fields are included.

• warning message: similar to failure reason, but the other fields are still
included.

• interval: the interval between successive announce requests.

• min_interval: optional. If present, the client must not re-announce more
frequently than this.

• tracker id: a string the client should send back in its next announcement.
If absent, and the client sent a tracker id in a previous announcement, the
same ID should be used in subsequent announcements.

• complete: number of seeds.

• incomplete: number of downloading peers.

• peers: either in dictionary mode or compact mode. In compact mode each peer
is represented by 6 bytes, as described for the compact field above. In
dictionary mode the following keys are present:

– peer_id: the 20-byte unique ID of the peer.

– ip: the peer's IP address.

– port: the port number the peer is listening on.

Generally, trackers randomly select the peers included in a response, but it is
entirely up to them to use a more intelligent algorithm.
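As a concrete illustration of the compact peer format described above, the following sketch (the function name is ours) unpacks a compact peers value into (IP, port) pairs:

```python
import socket
import struct

def parse_compact_peers(blob):
    """Split a compact 'peers' value into (ip, port) pairs.

    Each peer occupies 6 bytes: 4 for the IPv4 address followed by 2 for
    the port, both in network byte order.
    """
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])
        (port,) = struct.unpack("!H", blob[off + 4:off + 6])
        peers.append((ip, port))
    return peers
```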

CHAPTER III

EXPERIMENTAL SETUP

Our experiments require collecting statistics about trackers and peers in the
BitTorrent network. Peers are managed by trackers, and trackers are distributed
all over the world, so our first task is to collect trackers' announce URLs. We
search the Internet and collect tracker addresses. In our efforts, we find a
useful link [7] that contains a list of around 800 trackers, which we include in
our list. Unfortunately, not all of the listed trackers are accessible: many are
either unresponsive or are private trackers that require an account and a
passkey, so we are able to collect statistics only from public trackers. Around
35 public trackers reply to our requests. The next step in our experiment
requires a large collection of peer and torrent information. We query the
trackers to find the hash identifiers of all the torrents they maintain, and
then, using those hashes, we obtain a list of peers belonging to the swarm of
each torrent. To accomplish these tasks, we design and implement a BitTorrent
crawler, and for each experiment we tweak it slightly to perform the desired
function.

A BitTorrent Crawler

The BitTorrent crawler is a program that, given a list of trackers, crawls each
tracker and collects the hash identifier and swarm information (seeds,
non-seeds, and number of times the torrent has been downloaded) for each torrent
the tracker manages; then, for each hash, it collects the addresses (IP, port,
ID) of the peers participating in the swarm. The crawler consists of two logical
parts:

GET /scrape.php HTTP/1.1\r\nConnection: close\r\n
Accept-Encoding: gzip\r\nUser-Agent: IRL-P2P-crawler\r\n
Host: torrent.unix-ag.uni-kl.de:6969\r\n\r\n

Fig. 1. Formatted string for scrape request.

1 Hash Collection

We use the scrape convention described in the previous chapter to collect all
the hashes managed by the trackers; each hash corresponds to a unique torrent.
We take a tracker's announce URL, replace the announce keyword with scrape, and
send an HTTP GET request to the tracker. For example, a typical scrape request
for the announce URL http://torrent.unix-ag.uni-kl.de:6969/announce.php looks as
given in Figure 1, where we accept gzip encoding and advertise the user agent as
IRL-P2P-crawler. The abbreviation IRL stands for the Internet Research Lab at
Texas A&M University.

2 Peer Collection

We use the announce method to obtain a list of peers in the swarm of each
torrent, making use of the hash identifiers collected in the previous step. When
we send an announce request with event=started to a tracker, we get a list of
peers, but at the same time the tracker registers our address and distributes it
to the other peers in the swarm. The side effect appears in the form of inbound
connections from other peers who contact us in search of the pieces they need.
To avoid this undesirable situation, we send an announce request with
event=stopped immediately after receiving the tracker's response to the previous
announce request. On receipt, the tracker deregisters us from the swarm, which
prevents our address from spreading to other peers. For example, the formatted
string of the started announce HTTP message for the announce URL
http://torrent.unix-ag.uni-kl.de:6969/announce.php appears as shown in Figure 2.

GET /announce.php?info_hash=....&peer_id=....&port=6882&uploaded=0&
downloaded=0&left=100000000&event=started&compact=1&numwant=10000&
no_peer_id=1 HTTP/1.1\r\nConnection: keep-alive\r\n
Accept-Encoding: gzip\r\nUser-Agent: IRL-P2P-crawler\r\n
Host: torrent.unix-ag.uni-kl.de:6969\r\n\r\n

Fig. 2. Formatted string for started announce request.
In Figure 2, "...." represents the 20-byte escaped hex sequences of the
info_hash and peer_id. The uploaded and downloaded fields are initialized to 0.
An important parameter of the request is left, which must be initialized to some
large value: setting it to 0 makes the tracker consider us a seed, which may
raise unnecessary security concerns, since the tracker authority may assume we
are sharing illegal copyrighted content. To keep the peer address representation
in the response small, as explained in the previous chapter, we set compact=1.
The numwant field is set to a large value so that we can collect as many peers
from a swarm as possible, though a tracker ignores any value above 50. Since we
do not need peer IDs, we set no_peer_id=1. Because we need to send the stopped
announce request immediately after the response, we keep the current TCP
connection open with Connection: keep-alive and send the stopped message over
the same connection. If a tracker closes the TCP connection after responding to
the started announce request, we open a new TCP connection to send the stopped
announce request. The formatted string of the stopped announce HTTP message for
the same announce URL appears as shown in Figure 3.

GET /announce.php?info_hash=....&peer_id=....&port=6882&uploaded=0&
downloaded=0&left=100000000&event=stopped&compact=1&numwant=10000&
no_peer_id=1 HTTP/1.1\r\nConnection: close\r\n
Accept-Encoding: gzip\r\nUser-Agent: IRL-P2P-crawler\r\n
Host: torrent.unix-ag.uni-kl.de:6969\r\n\r\n

Fig. 3. Formatted string for stopped announce request.
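Query strings like those in Figures 2 and 3 can be assembled with the standard library's urlencode, which percent-escapes the raw 20-byte info_hash and peer_id values as the tracker expects. The sketch below is our own illustration of that bookkeeping; the parameter values mirror the figures, and the function name is ours:

```python
from urllib.parse import urlencode

def announce_query(info_hash, peer_id, event, left=100_000_000):
    """Build the query string for an announce request.

    info_hash and peer_id are raw 20-byte strings; urlencode percent-escapes
    their binary bytes. A large 'left' value keeps the tracker from treating
    us as a seed, and event is 'started' or 'stopped' as described above.
    """
    params = [
        ("info_hash", info_hash),
        ("peer_id", peer_id),
        ("port", 6882),
        ("uploaded", 0),
        ("downloaded", 0),
        ("left", left),
        ("event", event),
        ("compact", 1),
        ("numwant", 10000),
        ("no_peer_id", 1),
    ]
    return urlencode(params)
```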

Initially we implement hash and peer collection together in the crawler. During
our experiments we realize that a fresh list of peers is required each time an
experiment is run, but hashes remain the same across experiments, so we separate
hash collection from peer collection. Hash collection is required only once, at
the beginning of all experiments, and subsequent runs use the same hashes. This
also speeds up our experiments, since hash collection requires decoding a large
bencoded file for each tracker, which takes a long time and no longer needs to
be repeated. We collect around 1 million hashes from 35 trackers. Since
experiments require a fresh list of peers each time they are run, the number of
hashes we query and peers we collect vary across experiments, depending on the
availability of the trackers and network conditions at the time of the run.



CHAPTER IV

MEASUREMENTS

In this chapter, we conduct a series of experiments, collect data, and analyze
it statistically. We obtain some interesting results about BitTorrent, which
help us understand the protocol's behavior and performance in real-life
situations.

A Seed Distribution

We start our analysis with the distribution of seeds among torrents. A seed is
an important part of a swarm: it ensures that at least one complete copy of the
shared file exists in the network.

Using our BitTorrent crawler, we collect the number of seeds, the number of
non-seeds, and the number of times each torrent has been downloaded for around
254,000 torrents. We will use the same data to analyze the non-seed, population,
and popularity distributions in the coming sections. Figure 4(a) shows the CDF
of seeds over the torrent population in the BitTorrent network, which is
exponential in nature. In Figure 4(b), we plot the frequency of torrents versus
their ranks on a log-log scale, where a torrent with n seeds is assigned rank
n + 1; thus a torrent with no seeds has rank 1, a torrent with 1 seed has rank
2, and so on. We plot this figure up to rank 120; beyond that, a power fit is
not possible because some ranks have no torrents. Analyzing this data reveals
the following results:

(a) CDF of seeds on torrents. (b) Rank-wise distribution of torrents.
Fig. 4. Seed distribution in BitTorrent network.

1. Since there is no seed for a large fraction of the population, these torrents
cannot guarantee that a complete copy of the shared item exists and may not be
useful. This fraction is 48% of the total population.

2. Moreover, 90% of the torrent population has fewer than 6 seeds, so a large
share of the seeds comes from just 10% of the torrents.

3. The close resemblance of the torrent distribution to the power fit on the
log-log scale in Figure 4(b) shows that the seed distribution closely follows a
Zipf distribution.

B Non-Seed Distribution

In this experiment, we perform an analysis similar to that of the previous section,
but this time with the non-seeds. The experiment shows how non-seeds are distributed
among torrents in the BitTorrent network. The CDF of non-seeds over torrents in
Figure 5(a) exhibits an exponential distribution. In Figure 5(b), we rank all the
torrents by the number of non-seeds they have and draw the frequency distribution
for the top 240 ranks. A torrent with n non-seeds is given rank n + 1.
(a) CDF of non-seeds on torrents. (b) Rank-wise distribution of torrents.
Fig. 5. Non-seed distribution in BitTorrent network.

We find that

1. There is no downloader for 27% of the torrents. This fraction is probably an
indicator of useless material in BitTorrent: either there is no seed and the
download cannot be completed, or the content is malicious.

2. Torrents with fewer than 6 downloaders constitute 83% of the torrent population,
which indicates that most downloaders are interested in just 17% of the torrents.

3. The rank-wise distribution of torrents in Figure 5(b) closely follows the power
fit, which implies a Zipf distribution.

C Population Distribution

Now we determine how the population is distributed over the BitTorrent network. In
this distribution, we consider both the seeds and the non-seeds in a torrent. There
are two important aspects of this analysis: first, how peer density is distributed
over the torrent population, and second, how much those torrents contribute to the
total peer population in the network. Figure 6(a) displays the CDF of peers over
torrents. In Figure 6(b), we compare the rank-wise frequency distribution of
torrents with the Zipf distribution, assigning rank n + 1 to a torrent with n
peers. The collected statistics show that

1. There is neither a seed nor a non-seed for 7% of the torrents, so this fraction
is garbage.

2. The population is very sparse for a large fraction of torrents. This uneven
distribution is visible from the fact that around 80% of the torrents have fewer
than 8 peers, while a small fraction of the torrents is highly dense.

3. Only 10% of the torrents have more than 15 peers, which indicates that a large
share of the peer population comes from just 10% of the torrents.

4. Figure 6(b) shows that the rank-wise frequency distribution of torrents closely
follows the power fit, which implies a Zipf distribution.

Figure 7 shows the relationship between the torrent population and its contribution
to the peer population. Around 7% of the torrents do not contribute anything. The
figure shows that 90% of the torrents contribute just 33% of the peer population,
while the remaining 10% of torrents are responsible for 67% of the population. If
we assume the download rate to be constant among all peers, then we can conclude
that 10% of the torrents cause 67% of the BitTorrent traffic in the Internet.
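The contribution curve of Figure 7 can be computed by sorting torrents by swarm size and accumulating their peer counts. A small sketch of this computation, using made-up swarm sizes rather than our measured data:

```python
def contribution_curve(swarm_sizes):
    """For each torrent (smallest swarms first, as in Figure 7), return the
    cumulative fraction of the total peer population contributed so far."""
    sizes = sorted(swarm_sizes)
    total = sum(sizes)
    curve, acc = [], 0
    for s in sizes:
        acc += s
        curve.append(acc / total)
    return curve

# Hypothetical swarms: nine tiny torrents and one huge one, so that
# 90% of the torrents contribute only 9% of the peers.
curve = contribution_curve([1] * 9 + [91])
```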

D Popularity Distribution

We use two metrics to measure the popularity of torrents:


(a) CDF of peers on torrents. (b) Rank-wise distribution of torrents.
Fig. 6. Peer distribution in BitTorrent network.

Fig. 7. Torrents' contribution to the peer population (peer population fraction
versus torrent population fraction).



1. Total number of peers in a torrent: this is the same metric used to analyze
the population distribution. The number of peers in a torrent swarm is also an
indicator of the torrent's popularity. The collected data shows that users are
not at all interested in 7% of the torrents. Around 80% of the torrents have
fewer than 8 peers. Torrents with more than 15 peers constitute only 10% of the
population, so users are mostly interested in a mere 10% of the torrents.

2. Times a torrent is downloaded: the above metric does not take the history of a
torrent into account, but the number of times a torrent has been downloaded is a
good indicator of the torrent's popularity to date, including its past. Figure 8(a)
shows the CDF of torrents based on the number of times they are downloaded. The
figure reveals that 47% of the torrents have not been downloaded even once. Around
80% of the torrents are downloaded fewer than 6 times. Only 10% of the torrents are
downloaded more than 15 times. Thus a very small share of the torrents is really
popular among downloaders. Figure 8(b) shows the frequency distribution of the
torrents in increasing order of their ranks. A torrent that is downloaded n times
is given rank n + 1. The distribution closely follows the power fit and confirms
the Zipf distribution.

Both metrics follow a Zipf distribution and exhibit a linear correlation between
them, which indicates that either the peer distribution or the times-downloaded
distribution can be used to obtain the popularity distribution for a tracker. The
obtained results also show that even though the BitTorrent network appears huge,
sharing an enormous number of files, peers are interested in a very small fraction
of the files, and that fraction contributes the major portion of the traffic.


(a) CDF of download frequency on torrents. (b) Rank-wise distribution of torrents.
Fig. 8. Popularity distribution in BitTorrent network based on the number of times
a torrent is downloaded.

E Client Popularity Distribution

There are more than 75 BitTorrent clients; some are well known, but many are rarely
heard of. We undertake this study to find users' preferences among the different
BitTorrent clients. Each peer in the BitTorrent network is recognized by its peer
ID, which contains a client ID, the client version, and a random number. We extract
the client ID from the peer ID to find which client a peer is using. There are two
ways we can collect client IDs of the peers: 1) directly from a tracker, setting
compact=0 and no_peer_id=0, so that the tracker returns a peer ID along with each
address; 2) using the BitTorrent crawler to get the peers currently participating
in the BitTorrent network and then contacting every peer to find its client ID.
The problem with the first approach is that many trackers do not follow the
convention and, even after being asked for peer IDs, return the response in the
compact format without them, so we stick to the second approach in our
implementation. From the collected 220,000 client IDs, we draw the distribution of
the top 17 clients in use, as shown in Figure 9, and find that 95% of
Fig. 9. Usage distribution of top 17 clients (µTorrent 49.36%, BitComet 19.83%,
Azureus 12.88%, Bram's BitTorrent 9.26%; each remaining client below 1.4%).

the users use only four BitTorrent clients: µTorrent, BitComet, Azureus, and Bram's
BitTorrent (the client written by the inventor of BitTorrent, Bram Cohen). Among
them µTorrent emerges as a clear winner, claiming 49.36% of the BitTorrent clients.
BitComet follows with 19.83%, Azureus claims 12.88%, and Bram's BitTorrent 9.26%.

µTorrent is freeware written in C++ that is designed to use minimal computer
resources while offering all the major functionality, including Mainline DHT.
BitComet is also a Windows client written in C++, while Azureus is written in Java.
Bram's BitTorrent is the first BitTorrent client, the one that revolutionized file
sharing.
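Many modern clients use the "Azureus-style" peer ID convention: a dash, a two-character client code, four version characters, another dash, then random bytes. A minimal decoder sketch for this convention follows; the code table is a small illustrative subset, and some clients (BitComet among them) actually use other peer ID styles:

```python
# Small illustrative subset of Azureus-style client codes.
CLIENT_CODES = {
    "UT": "µTorrent",
    "AZ": "Azureus",
    "TR": "Transmission",
    "LT": "libTorrent",
}

def parse_peer_id(peer_id: bytes):
    """Extract (client name, version) from an Azureus-style 20-byte peer ID."""
    if len(peer_id) == 20 and peer_id[0:1] == b"-" and peer_id[7:8] == b"-":
        code = peer_id[1:3].decode("ascii", "replace")
        version = peer_id[3:7].decode("ascii", "replace")
        return CLIENT_CODES.get(code, "unknown"), version
    return "unknown", ""

client, version = parse_peer_id(b"-UT1770-" + b"\x01" * 12)
```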

F Geographic Distribution of Peers

The geographic dispersion of the peers is useful for understanding the location
preferences of users. To see the geographic spread of the peers, we first obtain
unique IP addresses of the peers using the BitTorrent crawler and then identify
their locations from their IP addresses.
Fig. 10. Geographic distribution of peers.

To find the location of a peer, we use the WHOIS database. WHOIS is a TCP-based
query/response protocol widely used for querying a WHOIS database to determine the
owner of a domain name, an IP address, or an autonomous system number on the
Internet. A WHOIS server stores the WHOIS information from all the registrars for
a particular set of data. When we try to retrieve information related to a
particular IP address, the server searches the database to find which domain and
ASN the IP address belongs to and returns the corresponding information. The WHOIS
database is good for obtaining location information of IP addresses only at coarse
granularity, i.e., the country level. We use a WHOIS utility called NetCat to
perform this task. NetCat takes a file containing IP addresses and retrieves the
information from a WHOIS server.
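Since we only need country-level granularity, it suffices to pull the `country:` field out of each WHOIS reply. Below is a sketch that parses a canned response; a live lookup would additionally send the IP address over a TCP connection to port 43 of a WHOIS server. The sample text is an invented fragment in the style of a registry reply:

```python
def whois_country(response: str):
    """Return the first 'country:' field of a WHOIS response, or None."""
    for line in response.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "country":
            return value.strip().upper()
    return None

# Canned fragment resembling a registry WHOIS reply (made-up data).
sample = """\
inetnum:      192.0.2.0 - 192.0.2.255
netname:      EXAMPLE-NET
country:      US
admin-c:      EX1-TEST
"""
country = whois_country(sample)
```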

We collect close to 700,000 unique IP addresses; their country-level distribution
in Figure 10 shows that the US dominates the geographic distribution of peers with
18% of the peer population. Most European countries contribute almost equally, with
a slightly higher proportion from the UK at 7%. In Asia, China emerges as the main
participant, but its share of the BitTorrent user population, around 6%, is less
than half that of the US.

G Expected Lifetime of a Peer

The expected lifetime of a peer is an important property for characterizing a P2P
network. Peer lifetimes are important for understanding general user behavior, user
habits, and the application performance offered by the peers in the system. Many
existing methods use the Create Based Method (CBM) [8], which has several known
deficiencies. It divides a given observation window into two halves and samples
users every Δ time units until they die or the observation window ends. A small
observation window or a large Δ may lead to a highly inaccurate lifetime
distribution in CBM. Better accuracy can be achieved with a small Δ, but that
introduces significant overhead. Thus the sampling method exhibits an inherent
tradeoff between overhead and accuracy. Further, the approach biases the
measurement through two factors: inconsistent round-offs, introduced by rounding up
the lifetimes of some users and rounding down those of others, and missed users,
who arrive and depart within a Δ interval.

Papers [5], [9], and [10] create a novel analytical framework to understand and
characterize bias in network sampling. They show that any sampling method that
directly attempts to measure user lifetime every Δ time units is biased as long as
Δ > 0, and that the bias cannot be removed regardless of the mathematical
manipulation applied to the measured samples. To overcome these limitations of
direct sampling methods, [5] proposes a simple renewal-process-based ResIDual-based
Estimator (RIDE) to produce an unbiased version of the residual distribution H(x).
The method requires taking a snapshot of the entire network and tracing the
residual lifetime of each peer found in the snapshot until it dies or the
observation window expires. The lifetime distribution F(x) can be obtained from the
sampled residuals H(x) using a simple mechanism based on the renewal churn model of
[9], [10], with a negligible amount of error.

1 RIDE

At time 0, RIDE takes a snapshot of the whole system and records the n users found
alive in the snapshot in a set S_0. It probes all the peers in S_0 every Δ time
units either until they die or until the observation window T expires. RIDE
guarantees two important properties: 1) no invalid samples can be included, since
only users who are alive at time t = 0 are valid measurements; 2) no samples can be
inconsistently rounded off, since all valid residual lifetimes start from the time
of the first crawl.

The measured distribution can be used to obtain an unbiased estimator of the
actual residual distribution:

P(R(t) ≤ x_j) = lim_{|S_0| → ∞} N(x_j)/|S_0|,   (1)

where N(x_j) denotes the number of samples in S_0 with measured residual lifetime
smaller than or equal to x_j (x_j = jΔ). At any time t ≫ 0, the distribution of the
remaining lifetime R(t) of users present in the system is given by [10]:

H(x) = P(R(t) ≤ x) = (1/µ) ∫_0^x (1 − F(u)) du,   (2)

where µ = E[L] is the expected lifetime of a joining peer and the system size n is
sufficiently large. For residual lifetime sampling, the following is an unbiased
estimator of the lifetime distribution F(x_j):

E_R(x_j) = 1 − h(x_j)/h(0),   (3)
where h(x_j) is the probability density function (PDF) of R(t). Since H(x) is
obtained without bias, it is possible to numerically compute its derivative h(x)
using a Taylor expansion with error bounded by O(Δ^k/k!), where k = T/Δ is the
number of samples.
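Equations (1)-(3) can be sketched numerically: build the empirical residual CDF H from measured residuals, approximate its density h by finite differences, and recover the lifetime CDF as 1 − h(x_j)/h(0). The residuals below are synthetic, drawn deterministically from an exponential model, for which (by memorylessness) the estimator should roughly reproduce the exponential CDF 1 − e^(−x):

```python
import bisect
import math

def lifetime_cdf_from_residuals(residuals, delta, num_bins):
    """RIDE post-processing: empirical residual CDF H on a grid of width
    delta, finite-difference density h, and lifetime CDF 1 - h(x_j)/h(0)."""
    n = len(residuals)
    residuals = sorted(residuals)
    H = [bisect.bisect_right(residuals, j * delta) / n
         for j in range(num_bins + 1)]
    h = [(H[j + 1] - H[j]) / delta for j in range(num_bins)]
    return [1.0 - h[j] / h[0] for j in range(num_bins)]

# Deterministic "sampled" residuals from an Exp(1) model via the inverse CDF.
n = 200000
residuals = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
F = lifetime_cdf_from_residuals(residuals, delta=0.1, num_bins=20)
```

Here F[10], the estimate at x = 1.0, comes out close to 1 − e^(−1) ≈ 0.632, as the renewal model predicts for exponential lifetimes.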

2 Weibull Distribution

The Weibull distribution is a continuous probability distribution often used in
data analysis. Its CDF is defined as follows:

P(X < x) = 1 − exp(−(x/λ)^k),   (4)

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the
distribution. The expected mean of the distribution is given as:

E(X) = λ Γ(1 + 1/k),   (5)

where Γ represents the Gamma function.
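Equations (4) and (5) are straightforward to evaluate; as a sanity check, plugging in the parameters fitted later in this section (k = 0.3618, λ = 12.587 minutes) reproduces the expected lifetime of roughly 56.6 minutes quoted below:

```python
import math

def weibull_cdf(x, k, lam):
    """Weibull CDF, Equation (4)."""
    return 1.0 - math.exp(-((x / lam) ** k))

def weibull_mean(k, lam):
    """Expected value of a Weibull variable, Equation (5)."""
    return lam * math.gamma(1.0 + 1.0 / k)

# Parameters fitted to the peer lifetime distribution in this section.
expected_lifetime = weibull_mean(k=0.3618, lam=12.587)  # roughly 56.6 minutes
```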

3 Observation

In the BitTorrent network, we first obtain all the hashes for the torrents from the
trackers, and then for each hash we collect the peers. Since a tracker is a
centralized entity, it does not entertain many connections from a single peer and
limits the number of connections a peer can establish at a time. This elongates the
snapshot period, which may potentially introduce bias into our measurement. To
avoid this, we start probing each peer independently as soon as it is available.
Further, not all the peers collected from the trackers may be alive or reachable,
and since RIDE restricts the observation to the peers alive in the snapshot, we
include in the set S_0 only those peers who reply to our first probe message.

(a) k = 0.4612 and λ = 218.21. (b) k = 0.3618 and λ = 12.587.
Fig. 11. Complementary CDF (S_0 = 3.75 × 10^5, T = 7 days, Δ = 3 minutes).

We collect around one million unique peers from 250,000 hashes, but only 375,000 of
them are reachable; these constitute our initial set S_0. We observe the peers for
one week (T = 7 days) and probe each peer every Δ = 3 minutes to see if it is
alive. Figure 11(a) shows the complementary CDF of the residual lifetime
distribution. It is clear from the figure that the distribution closely follows a
Weibull distribution with k = 0.4612 and λ = 218.21. We use the two-point
derivative method [5] to obtain the expected lifetime distribution of the peers via
Equation 3, as shown in Figure 11(b). The resulting complementary CDF curve follows
a Weibull distribution with k = 0.3618 and λ = 12.587, which gives an expected
lifetime of 56.6 minutes. Our results confirm the finding of [11] that the session
length distribution is neither Poisson nor Pareto and is more accurately modeled by
a Weibull distribution. For Δ = 3 minutes and T = 7 days, the resulting error comes
to 4.95 × 10^−8789, which can be considered 0 for all practical purposes.

Fig. 12. Intensity of attack with time.

H Distributed Denial of Service Attack

When a peer wants to download a torrent, it announces its presence to the tracker.
The tracker registers the peer's address and returns a random list of at most 50
peers participating in the torrent swarm. The peer then contacts other peers in the
swarm to download the torrent. Since its address is registered with the tracker, it
is distributed to other peers as a result of their update announce requests, and
the peer starts receiving connections from the other peers.
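An announce request is an HTTP GET whose query string carries the percent-encoded 20-byte info hash, the peer ID, the listening port, and transfer statistics. A sketch of how such a request URL is assembled; the tracker URL and hash here are made up:

```python
from urllib.parse import quote, urlencode

def build_announce_url(tracker, info_hash, peer_id, port):
    """Assemble a BitTorrent HTTP announce URL.

    info_hash and peer_id are raw 20-byte strings and must be
    percent-encoded byte by byte, not treated as text.
    """
    params = urlencode({
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": 0,
    })
    return (f"{tracker}?info_hash={quote(info_hash, safe='')}"
            f"&peer_id={quote(peer_id, safe='')}&{params}")

# Hypothetical tracker, a fake 20-byte hash, and a fake peer ID.
url = build_announce_url("http://tracker.example.com/announce",
                         b"\x12\x34" * 10, b"-XX0001-" + b"\x00" * 12, 6881)
```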

The BitTorrent protocol allows a peer to specify its IP address and port number
explicitly in the announce request when the peer communicates through a proxy. This
feature is also necessary when both the client and the tracker are on the same
local side of a NAT gateway. The protocol does not enforce any security check to
verify that the announced IP is the same as that of the requester. We exploit this
feature to attack a host in the Internet with connection requests from the
BitTorrent network.



In this experiment, we measure the strength of a Distributed Denial of Service
attack mounted through the BitTorrent P2P network. We use two systems connected to
the Internet, one as an attacker and the other as a target. We publish the target's
address from the attacker for all the hashes collected during the hash collection
step of the BitTorrent crawler, and at the same time measure the number of
connections received at the target. We publish the target for 198,000 hashes in 45
minutes and monitor the incoming connections at the target for 150 minutes. Figure
12 shows the average number of connections per interval, where each interval is a
window of 30 seconds. We observe a maximum of 164 inbound connections per second;
averaging over 30-second windows gives a maximum of 130.59 connections per interval,
as shown in Figure 12. Over the whole observation period, the average comes to
70.36 connections per second. We also observe that the attack diminishes with time.
The attack proves to be quite weak. The possible reasons are:

1. Once a peer fails to connect to the target, it does not retry.

2. The tracker removes a peer's address from the list of available peers if it does
not receive an updated announce request from the peer within a certain time after
the previous announcement. This time is usually set to 1.5 or 2 times the announce
interval, and the announce interval is generally 30 minutes. We see a decline in
the attack intensity in Figure 12 around and after interval 120 (3600 seconds),
which is the result of trackers deregistering our announced target. The sharp
decline after 1 hour eases later. This phenomenon can be explained as follows: we
have many small trackers and some large trackers, and since we start advertising
the target to all the trackers together with the same intensity, the small trackers
finish fast and our attack continues with only some large trackers. The result is a
high attack intensity at the beginning that comes down with time, and the same
effect is observed when trackers start deregistering our target. We advertise the
target address for all the hashes only once and then watch the effect over time
without renewing the attack.

3. The strength of the attack is weakened by the fact that the tracker is a central
entity and allows only a limited number of connections from a peer. If we had
published all the hashes in 3 minutes rather than 45, the attack would have been
much more powerful, but again only for a short period of time, due to the above two
reasons.

4. We have a small collection of 35 public trackers that respond to our requests,
and sometimes not all of them are responsive at the same time. Collecting a huge
number of trackers before mounting the attack would make it more powerful.

I Activity in the Network

Peers do not remain online forever; they join the network to download a file and
then leave. We undertake this study to find at what time of day users prefer to
download files and what the arrival pattern of users looks like. To record peer
activity in the BitTorrent network, we select a tracker with a large user base and
monitor it for 7 days, recording user activity every hour in terms of total
torrents and active downloads using the scrape method. Statistical analysis of the
collected data shows that activity in the BitTorrent network exhibits diurnal
behavior and that users are most active around noon.
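The scrape method uses a URL derived from the tracker's announce URL: by convention, if the last path component begins with "announce", that prefix is replaced with "scrape"; otherwise the tracker does not support scraping. A sketch of this derivation (the example URLs are illustrative):

```python
def scrape_url(announce_url):
    """Derive a tracker's scrape URL from its announce URL, per the
    BitTorrent scrape convention; return None if scrape is unsupported."""
    head, sep, tail = announce_url.rpartition("/")
    if not tail.startswith("announce"):
        return None
    return head + sep + "scrape" + tail[len("announce"):]

print(scrape_url("http://a.tracker.thepiratebay.org/announce"))
print(scrape_url("http://tracker.example.com/ann.php"))  # no scrape support
```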

We identify a large tracker, a.tracker.thepiratebay.org, for this task and monitor
it for one week. We start our experiment at 12:00 PM on 4/13/2008 and take a
snapshot of the network every hour. Figure 13(a) shows users' upload behavior in
terms of the number of torrents in the network. We observe that users upload the
most torrents around noon and are least active around midnight. We also observe
growth in the tracker network: the total number of torrents at the end of our
experiment (one week) is around 465,000, up 20,000 from the start, so the tracker
grows 4.5% during our experiment.

(a) Users' upload behavior. (b) Users' download behavior.
Fig. 13. Activity in the BitTorrent network. Experiment starts at 12:00 PM, Sunday,
4/13/2008.

Figure 13(b) captures users' download behavior. It plots the number of active
downloads every hour for 7 days. We observe the same diurnal behavior as in the
previous graph, with users most active around noon and least active around
midnight. We start our experiment on Sunday at 12:00 PM, which registers a
relatively higher number of downloads than the weekdays, and the count goes up
again during the next weekend. Probably users prefer downloading during weekends,
when they are free, rather than on weekdays.

CHAPTER V

IMPLEMENTATION

Initially we decide to use the open source BitTorrent client Arctic to implement
the BitTorrent crawler. Arctic is a simple implementation of a BitTorrent client in
C++ on the Windows platform with a simple GUI. We modify the source to use only the
portion that communicates with a tracker, but it proves very slow for our
experiments. Arctic provides a thread-based implementation where communication with
each tracker is done separately in a thread. Each thread combines the hash
collection and peer collection parts, which is undesirable for performing a variety
of experiments, for the reason we explained in Chapter III. Since decoding is a
very computation-intensive operation and takes up the whole CPU, performing peer
collection at the same time causes many requests to time out, which further worsens
the experiment time. The issues faced during these open source modifications lead
us to write our own implementation from scratch. We write the code in C++ and make
use of the open source zlib library for decompressing data from the tracker. We
separate the hash and peer collection parts. We keep the hash collection part
fairly simple: we generate a limited number of threads that contact the trackers,
decode the hashes, and store them for future use. Figure 14 shows the high-level
structure of the hash collection program.
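Bencoding, the format the tracker files use, has four types: integers `i...e`, byte strings `len:data`, lists `l...e`, and dictionaries `d...e`. A minimal recursive decoder sketch is below (our actual C++ crawler decodes full multi-megabyte tracker files; the sample input here is invented):

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_i)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<items>e
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                      # dictionary: d<key><value>...e
        d, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            d[key] = value
        return d, i + 1
    colon = data.index(b":", i)        # byte string: <length>:<bytes>
    n = int(data[i:colon])
    return data[colon + 1:colon + 1 + n], colon + 1 + n

value, _ = bdecode(b"d8:intervali1800e5:peersl4:abcd4:efghee")
```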

Fig. 14. High level organization of hash collection program.

In the peer collection part, since we need to communicate with many trackers at the
same time, collecting peers for many hashes, we first experiment with a simple
thread-based approach: we open one separate thread for each tracker, and each
thread then opens many connections with its assigned tracker to collect peers.
Managing many connections at a time can be accomplished with non-blocking sockets.
The problem with this approach is that opening more than one thread per processor
does not provide any efficiency improvement, and a non-blocking-socket-based
implementation suffers from the serious drawback of polling.

The solution comes in the form of the Input-Output Completion Port (IOCP), which
provides a scalable way to manage a large number of connections on a Windows
machine while still keeping the thread overhead low. IOCP is an API for performing
multiple simultaneous asynchronous input/output operations. An I/O completion port
can be created and associated with a number of sockets or file handles. When an I/O
service is requested on such an object, completion is indicated by a message queued
to the completion port. A process requesting I/O services is not notified directly
of their completion; instead it checks the completion port's message queue to
determine the status of its I/O requests.
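The completion-based pattern can be illustrated outside of Win32 as well. The sketch below is a Python asyncio analog, not our C++/IOCP implementation: a single event loop plays the role of the IOCP worker thread, resuming each "announce" task when its I/O completes, with a toy in-process tracker standing in for a real one:

```python
import asyncio

async def fake_tracker(reader, writer):
    # Toy "tracker": read one request line, reply with a fixed peer list.
    await reader.readline()
    writer.write(b"peer1:6881,peer2:6881\n")
    await writer.drain()
    writer.close()

async def announce(host, port, info_hash):
    # One asynchronous "announce": connect, send the hash, read the reply.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(info_hash + b"\n")
    await writer.drain()
    reply = (await reader.readline()).decode().strip()
    writer.close()
    return reply.split(",")

async def main():
    server = await asyncio.start_server(fake_tracker, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # Issue 100 announces concurrently; completions are multiplexed on
    # one event loop rather than one thread per connection.
    results = await asyncio.gather(
        *(announce("127.0.0.1", port, b"hash%d" % i) for i in range(100)))
    server.close()
    await server.wait_closed()
    return results

peers = asyncio.run(main())
print(len(peers))
```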

Figure 15 shows the high-level organization of the peer collection program. We use
two threads. The tracker thread creates IOCP-based sockets and issues asynchronous
connections to the trackers to discover new peers. If the connection to a tracker
is successful, the IOCP worker thread sends the tracker an announce request for a
torrent hash collected during the hash collection phase. On receipt of the response
from the tracker, the worker thread filters out already-seen peers and writes the
newly discovered peers to a file for later processing. Although the IOCP-based
implementation resolves the issues with managing a large number of connections
efficiently, we still cannot open unlimited connections with a tracker: the tracker
is a central entity and limits the number of connections from a peer. To speed up
peer collection, we try opening 50,000 connections simultaneously with a tracker,
but it does no better than opening 200 connections at a time; most of the
connections time out and very few succeed, so we limit the number of connections
with a tracker to 200.

Fig. 15. High level organization of peer collection program.

The trickiest part of our experiments is tracking every peer for its residual
lifetime. As explained in the previous chapter, our initial snapshot takes a long
time; to mitigate its effect on the measurement, as soon as we discover a new peer
we poke it to see if it is alive, and if so, we put it into our sample set, where
peers are tracked every Δ interval. Figure 16 shows the high-level organization of
the program for discovering new peers and tracking their residual lifetimes, which
we obtain by modifying our peer collection program. We introduce one more thread,
the peer thread, that tracks discovered peers for their residual lifetimes. Here,
on receipt of the response from the tracker, the worker thread filters out
already-seen peers and puts the newly discovered peers into a queue. The peer
thread takes the new peers from the queue, creates IOCP-based sockets, and issues
asynchronous connections to them. The connection indication is received by the
worker thread; if the connection is successful, the peer is included in our sample
set and placed on the heap, otherwise it is discarded. The peer thread probes each
peer on the heap every Δ interval after its previous poke time until the peer dies
or the observation window expires. When the connection to a peer in our sample set
fails, the peer is considered dead and the worker thread records its residual
lifetime.

Fig. 16. High level organization of residual lifetime tracking program.
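The heap orders peers by their next probe deadline, so the peer thread only ever inspects the heap top. A Python sketch of this scheduling discipline, with probe outcomes stubbed out by a hypothetical `is_alive` callback and made-up death times:

```python
import heapq

def track_residuals(peers, delta, horizon, is_alive):
    """Probe each peer every `delta` after its previous poke, recording the
    time of the first failed probe as its (discretized) residual lifetime."""
    heap = [(delta, p) for p in peers]          # (next probe time, peer)
    heapq.heapify(heap)
    residuals = {}
    while heap and heap[0][0] <= horizon:
        t, p = heapq.heappop(heap)
        if is_alive(p, t):
            heapq.heappush(heap, (t + delta, p))  # re-arm the next probe
        else:
            residuals[p] = t                      # peer died before time t
    return residuals

# Stub: peer "a" dies after 7 time units, "b" outlives the window.
deaths = {"a": 7, "b": 100}
res = track_residuals(["a", "b"], delta=3, horizon=21,
                      is_alive=lambda p, t: t < deaths[p])
```

Peer "a" fails its probe at t = 9, so its recorded residual is 9; peer "b" is still alive when the window closes and contributes no sample.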

CHAPTER VI

A BETTER SEARCH INFRASTRUCTURE FOR


BITTORRENT NETWORKS

BitTorrent is one of the most successful Peer to Peer (P2P) protocols. Its success
lies in its simplicity in organizing the peers sharing the same file together, so
that they can find and help each other in downloading different pieces of the file
simultaneously, and in its algorithm, which discourages free riders and rewards
uploaders. Although there is no doubt about the popularity of BitTorrent, one thing
that remains cumbersome and problematic is how to locate the torrent of a
particular file. A user can take advantage of BitTorrent only when he has the
torrent for the shared file, but where is that torrent to be obtained from? This is
a big question and a major roadblock on BitTorrent's path to further growth.

The current implementation of BitTorrent requires a user to obtain the torrent for
the shared file from some external source before downloading can be initiated.
Generally, a user who wants to share a file in the BitTorrent network generates the
torrent for the shared file and uploads it to some web server. Whoever wants to
download the file gets the torrent file from the web server and then starts
downloading the shared file. But locating the server that hosts the desired torrent
is a trial-and-error approach, and sometimes it can take a long time; sometimes
users get so frustrated that they simply give up. The web is huge, and finding a
particular shared file can prove a gigantic task, if not an impossible one.

In this chapter, we focus on the issue of torrent locatability and propose some
solutions to the problem. We believe that integrating the search functionality of
the isolated trackers will resolve the issue, and we consider different approaches
to achieve this. We explore the pros and cons of each solution and compare their
practicability, feasibility, and effectiveness. We give here a brief description of
the approaches we consider and discuss each in detail in the coming sections.

We propose two solutions for searching for a torrent file in the BitTorrent network:

1. Unstructured flood-based approach: this approach is inspired by the method used
in the Gnutella network [12] to find a particular file, which floods the network
with the query and receives a distributed response from the multiple nodes that
share the same file.

2. Structured Distributed Hash Table (DHT) based approach: this approach is
inspired by the DHT-based approach adopted in the Kademlia P2P network [13], [14],
[15], which stores an object at the node determined by the hash of the object's key
and uses the same hash to retrieve the object.
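In the DHT-based approach, the node responsible for a torrent's metadata is the one whose ID is closest, by XOR distance as in Kademlia, to the hash of the search key. A toy sketch of this placement rule; the node IDs are made up, and a real deployment would route toward the closest node hop by hop rather than scan a global list:

```python
import hashlib

def key_hash(keyword: str) -> int:
    """160-bit SHA-1 of a search keyword, as an integer."""
    return int.from_bytes(hashlib.sha1(keyword.encode()).digest(), "big")

def responsible_node(keyword: str, node_ids):
    """Kademlia placement: the node whose ID has the smallest XOR distance
    to the key's hash stores (and later answers) the lookup."""
    k = key_hash(keyword)
    return min(node_ids, key=lambda n: n ^ k)

# Three made-up 160-bit node IDs, derived here from labels for convenience.
nodes = [key_hash("node-%d" % i) for i in range(3)]
owner = responsible_node("ubuntu iso", nodes)
```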

A The Current Situation

In BitTorrent, trackers manage the downloading of shared files, represented by
torrents, and torrents are hosted on web servers. Generally a tracker has its own
separate web server for hosting the torrent files, but this is not mandatory. The
web servers provide users a search utility to find the torrents hosted on their
servers. In the present scenario, a tracker and its torrent server are completely
isolated entities, and there is no cooperation or integration between them. There
is not even cooperation among trackers. Similarly, all the torrent servers operate
in isolation. If a user wishes to download a file, he first searches for the
torrent by visiting different servers, and only after finding the torrent starts
downloading it. But if the user does not know the URL of the web server that hosts
the torrent of the file, he simply cannot download the file, and considering the
vastness of the web and the popularity of BitTorrent, this seems to be most often
the case. Although there are some BitTorrent search sites that claim to provide
efficient search results over the BitTorrent network, they only send queries to a
few renowned torrent servers and display the returned results, while thousands of
such torrent servers remain uncovered. At present there does not exist an elegant
solution that integrates these isolated torrent servers and provides efficient and
effective search with high accuracy. We take up the challenge of exploring
alternatives to the existing approach and propose two solutions in the coming
sections. These solutions, though, are still subject to detailed analysis and
verification.

The existing system is also vulnerable to failure of the torrent server. If the server goes down, users cannot locate the torrents even if the tracker is up and running, and thus cannot join the tracker and start downloading files. We also observe that no server is needed at all for hosting the torrent files; what the user needs is just the info hash of the file, a 20-byte SHA-1 identifier, and the address of the tracker that manages the file. Once the info hash is available, the user can obtain a list of the peers downloading the file by contacting the tracker that manages its swarm, and with a slight modification to the BitTorrent protocol, the new peer can download the torrent file directly from one of the downloading peers. This completely eliminates the need for the torrent server and brings down the cost of storage and maintenance. In our proposal, we assume that each tracker has two logical components: 1) a search utility to find the relevant torrents in the BitTorrent network, and 2) a traditional management utility that maintains swarm information. In our explanation, references to a tracker include its search utility unless we explicitly state otherwise.
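To make this concrete, the following Python sketch (the function names, the sample metadata, and the tracker URL are illustrative assumptions, not taken from the thesis) shows how a 20-byte SHA-1 info hash is derived from a torrent's bencoded info dictionary and how a simplified announce request could be formed from just the hash and a tracker address:

```python
import hashlib
from urllib.parse import quote

def info_hash(bencoded_info: bytes) -> bytes:
    """The 20-byte SHA-1 digest of the torrent's bencoded 'info' dictionary."""
    return hashlib.sha1(bencoded_info).digest()

def announce_url(tracker: str, ih: bytes, peer_id: bytes, port: int) -> str:
    """Build a simplified HTTP announce request from the info hash alone."""
    return (f"{tracker}?info_hash={quote(ih)}"
            f"&peer_id={quote(peer_id)}&port={port}")

# Illustrative bencoded info dictionary (not a real torrent).
sample_info = b"d6:lengthi1024e4:name8:demo.txt12:piece lengthi256ee"
ih = info_hash(sample_info)
url = announce_url("http://tracker.example.com/announce", ih,
                   b"-XX0001-123456789012", 6881)
```

Once a peer can construct such a request, the tracker's reply (a list of swarm members) is all that is needed to fetch the torrent file from another peer, which is the basis of eliminating the torrent server.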

B Solution

Our proposal is that by integrating the search utilities of the isolated trackers spread worldwide, we can provide a better infrastructure for locating and sharing files. All trackers are connected in a network, and the torrents managed by one tracker are made available to the other trackers for search. When a user wants to find a shared file, he enters the query into his BitTorrent client. The client sends the query to a tracker it is connected to; the tracker forwards the query to its search utility, which retrieves the information from the network and sends the results back to the client for display to the user. Each result contains the meta information of the file, such as its name, category, info hash, and the announce URL of the tracker that manages the file. When the user selects a result to download, the client sends an announce request for the torrent's info hash to the tracker specified in the result, and that tracker replies with a random list of currently downloading peers and seeds. The client then retrieves the torrent file from one of these peers and starts participating in the swarm.

The first concern that arises is how the trackers are going to form the network. We rule out a central authority that helps trackers obtain the addresses of other trackers; a central authority makes the design weak and presents a single point of failure. Instead, we adopt a novel approach in which trackers learn other trackers' addresses from the peers that join their networks. A peer downloads files from multiple trackers in its lifetime; if each peer simply stores the address of the tracker from which it downloaded its last file and provides this information to the current tracker, this is more than sufficient for the tracker to build a knowledge base about other trackers. The tracker selects some entries from its knowledge base and connects to them to join the network of trackers.
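The bookkeeping described above is simple enough to sketch in a few lines of Python (the class and method names are ours, chosen for illustration; they do not correspond to any existing tracker implementation):

```python
import random

class TrackerDirectory:
    """Knowledge base of other trackers, learned from announcing peers."""

    def __init__(self):
        self.known = set()

    def learn(self, reported_tracker: str) -> None:
        # Each announcing peer reports the tracker of its last download.
        self.known.add(reported_tracker)

    def pick_neighbors(self, k: int) -> list:
        # Connect to k randomly chosen known trackers to join the network.
        return random.sample(sorted(self.known), min(k, len(self.known)))

d = TrackerDirectory()
for addr in ["tr1.example:6969", "tr2.example:6969", "tr3.example:6969"]:
    d.learn(addr)
neighbors = d.pick_neighbors(2)
```

Random selection keeps the resulting tracker graph roughly regular without any coordination, which is all the flood-based scheme below requires.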

The next question is how the search utilities of the trackers will interact so as to provide efficient management and retrieval of information in the network of trackers. We examine two approaches: first, an unstructured flood-based approach, and second, a structured DHT-based approach. We explore each approach in detail below.

1 Unstructured Flood Based Approach

This approach is inspired by the unstructured P2P network Gnutella, which floods a user query through the network and collects distributed responses from different peers. The querying peer forwards the query to all the neighboring nodes it is connected to. Each neighbor then forwards the query to all of its own neighbors, and the query propagates through the network until the number of hops traversed reaches the Time To Live (TTL) value. The query is generated with a TTL that determines its lifetime as a number of hops, after which the query is discarded. This approach is very simple and easy to implement, but flooding is not considered a good idea because of its huge communication cost; it does not scale well with the size of the network. To reduce traffic and provide more stability, Gnutella divides nodes into two kinds of peers: ultrapeers and leaf peers. Ultrapeers are more stable and participate in routing messages; an ultrapeer connects to other ultrapeers and to leaf peers, while leaf peers connect only to ultrapeers. Since the TTL limits the life of a query, file search in the network is unreliable: we can never be sure that a query will cover all nodes, so there is no guarantee that a file existing in the network will be found. The same query sometimes returns results and sometimes fails.
We can connect the trackers in a Gnutella-like network, where trackers play the role of ultrapeers and normal peers the role of leaves. As stated before, trackers can easily collect the addresses of other trackers from peers and then connect to k randomly chosen trackers; the trackers thus form a network in which each node has degree k. Each tracker maintains an index of the torrents it manages. When a peer requests a search, the query is sent to the trackers it knows. Each tracker searches for the requested item in its index and sends the results back to the peer. At the same time, it forwards the query, along with the IP address and port number of the requesting peer, to all of its neighbor trackers, which carry out the same activity upon receiving the query. To limit the lifetime of the request, we use a TTL field populated with a value that ensures broad reach into the network.
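The forwarding rule above can be sketched as follows (a minimal simulation, with the network represented as an adjacency dictionary; names and the toy topology are our own illustrative assumptions):

```python
def flood_query(network, start, query, ttl):
    """Flood `query` from tracker `start`, decrementing the TTL at each hop.

    `network` maps a tracker name to {"peers": [...], "index": set(...)}.
    A seen-set suppresses duplicate deliveries of the same query.
    Returns the trackers whose index contains a match.
    """
    results, seen = [], set()
    frontier = [(start, ttl)]
    while frontier:
        node, t = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        if query in network[node]["index"]:       # local index lookup
            results.append(node)
        if t > 0:                                  # forward while TTL remains
            frontier.extend((n, t - 1) for n in network[node]["peers"])
    return results

# Toy chain A - B - C - D: the query exists at C and D.
network = {
    "A": {"peers": ["B"], "index": set()},
    "B": {"peers": ["A", "C"], "index": set()},
    "C": {"peers": ["B", "D"], "index": {"ubuntu"}},
    "D": {"peers": ["C"], "index": {"ubuntu"}},
}
hits = flood_query(network, "A", "ubuntu", ttl=2)   # reaches C but not D
```

Note how the TTL bounds the reach: with TTL 2 the query from A never reaches D, illustrating the unreliability discussed above.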

Why consider the flooding approach when its shortcomings are well known? Certainly, traffic grows more severe as the network grows, but we should not forget that we are connecting trackers, not peers, in a Gnutella-like network, and the number of trackers in the world is very small compared to the number of peers. Growth of the tracker network will also be very slow. The approach is therefore worth considering despite its shortcomings, because of its simplicity and robustness. One problem remains unsolved, however: the accuracy of the results, since even if the network provides broad reach, it does not ensure complete reach.

2 Structured DHT Based Approach

This approach is inspired by DHT-based P2P protocols. We studied several of them, namely Chord, Tapestry, and Kademlia, and realized that the concepts of the Kademlia DHT can be used to form the trackers network. First we explain how Kademlia [13] works, and then we construct the DHT-based trackers network.

a Kademlia DHT

Kademlia is a Distributed Hash Table (DHT) based P2P network protocol that uses the XOR metric to route messages and locate nodes and objects. A Kademlia node takes advantage of every message it receives to update its information about the network, and exploits this information to tolerate node failures by sending parallel, asynchronous query messages. Each node in the network is identified by a 160-bit ID, which is a SHA-1 hash. An object is identified by a (key, value) pair. To store an object in the network, a 160-bit hash is obtained from its key, and the (key, value) pair is then stored at the K nodes closest to that hash. To route a message to the node nearest to a given ID, the XOR metric is used; it has the unidirectional property, i.e., all lookups for the same key converge along the same path irrespective of their originating nodes. Each node in Kademlia maintains a routing table with a number of entries equal to the number of bits in its ID. The i-th entry in the table is called a k-bucket, since it stores a list of up to K (IP, UDP port, node ID) triples whose node IDs share the same prefix as the node's own ID up to the i-th bit. Triples in a k-bucket are kept sorted by the time last seen. A new node is inserted into a k-bucket only when the bucket is not full or old nodes leave the system, which prevents the denial-of-service attack in which the network is flooded with malicious nodes. When a lookup request for a key arrives at a node, the node first obtains the hash for the key and then finds the K nodes nearest to the hash from its k-buckets. It then selects the α nearest of these K nodes and forwards the search request to them; each receiving node returns the K nodes nearest to the hash from its own k-buckets. After receiving the responses from the α nodes, the node again finds the α nearest nodes to the hash and repeats the same procedure until the node nearest to the hash that exists in the network is found. The XOR metric determines nearness: the XOR of a node's ID with the given ID gives the distance between them.
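The XOR metric itself is a one-liner; the following sketch (using short 4-bit IDs instead of 160-bit ones, purely for readability) shows the distance computation and the selection of the k closest nodes that lookups rely on:

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs is simply their bitwise XOR."""
    return a ^ b

def k_closest(ids, target: int, k: int) -> list:
    """The k node IDs closest to `target` under the XOR metric."""
    return sorted(ids, key=lambda n: xor_distance(n, target))[:k]

nodes = [0b0001, 0b0011, 0b0110, 0b1100]
closest = k_closest(nodes, 0b0010, 2)   # distances: 3, 1, 4, 14
```

Because XOR is symmetric (d(a, b) = d(b, a)) and satisfies the triangle inequality, lookups for the same key converge along the same path regardless of where they originate, which is the unidirectional property noted above.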

When a node wants to join the network, it must know at least one participating node. The joining node selects a random ID for itself and performs a lookup of its own ID through the node it knows. In the process, it populates its own k-buckets and those of the nodes it traverses.


Kademlia is used in file-sharing networks, where filename searches are implemented using keywords. Through keyword searches, one can find information in the file-sharing network so that files can be downloaded. Since there is no central instance storing an index of existing files, this task is divided evenly among all clients. A filename is divided into its constituent words; each of these keywords is hashed and stored in the network together with the corresponding meta-information of the file. A search involves choosing one of the keywords, contacting the node with an ID closest to that keyword's hash, and retrieving the meta-information of the files that contain the keyword.
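The keyword-publishing step can be sketched as follows (the splitting rule and function names are illustrative assumptions; real clients differ in how they tokenize filenames):

```python
import hashlib

def keyword_hash(word: str) -> bytes:
    """The 160-bit key under which a keyword's entries are stored."""
    return hashlib.sha1(word.lower().encode()).digest()

def publish_keywords(filename: str, meta: dict) -> dict:
    """Map each constituent word of the filename to the file's metadata.

    In the real network, each entry would be stored at the K nodes whose
    IDs are closest to the keyword hash, rather than in a local dict.
    """
    words = filename.lower().replace(".", " ").split()
    return {keyword_hash(w): meta for w in words}

entries = publish_keywords("ubuntu server iso",
                           {"name": "ubuntu server iso"})
```

Each keyword thus yields an independent DHT entry, so a file becomes findable under any of its constituent words.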

b BitTorrent

We connect the BitTorrent trackers using the Kademlia protocol, with some changes that we explain as the description proceeds. Each tracker in the network plays the role of a node and each torrent the role of an object.

A tracker in the network is identified by the 160-bit SHA-1 hash of its IP address. Each tracker hosts many torrents, and each torrent is described by metadata that includes its name, category, and other attributes. Analogous to Kademlia, in the trackers network a torrent is an object, published by its tracker as a (key, value) pair, where the value consists of the info hash, name, category, and hosting tracker. The key is a keyword present in the torrent's metadata by which the object can be searched. An object may be associated with multiple keys if its metadata includes more than one keyword, in which case the tracker publishes it in the network once for each key. A peer can submit a search query for a torrent to a tracker. When a search query arrives at a tracker, it extracts the keywords from the search string and calculates a hash for each one. The tracker then performs a lookup for each hash in the trackers network, sorts the results in order of relevance, and returns them to the peer.
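The tracker-side search and relevance ordering can be sketched as below (a stand-in for the DHT lookups: `index` maps a keyword to the set of torrents found under it, and relevance is taken, as one plausible choice, to be the number of query keywords a torrent matches):

```python
def search(index: dict, query: str) -> list:
    """Rank torrents by how many of the query's keywords they match."""
    keywords = query.lower().split()
    scores = {}
    for kw in keywords:
        for torrent in index.get(kw, set()):   # one DHT lookup per keyword
            scores[torrent] = scores.get(torrent, 0) + 1
    # Most matched keywords first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))

index = {
    "ubuntu": {"ubuntu-desktop", "ubuntu-server"},
    "server": {"ubuntu-server", "debian-server"},
}
ranked = search(index, "ubuntu server")
```

A torrent matching every query keyword ("ubuntu-server" here) naturally rises to the top, which is the ordering behavior the proposal expects from the search utility.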

When a tracker joins the network for the first time, it publishes all the torrents it has so that they can be accessed by other trackers as well. If the tracker later leaves the network and rejoins, it just needs to refresh the torrent objects it manages in the network. When a new torrent is submitted to a tracker, it too is published in the network.

A tracker periodically refreshes all the torrent objects it publishes. The refresh period affects the amount of periodic traffic in the network, so it should not be too small; nor should it be very large, since that would affect the consistency of lookups because nodes responsible for keys may leave the network. If a node does not receive a refresh message for a (key, value) pair within twice the refresh period, it removes the pair from its list. The refresh period we suggest is tentative and may be tuned for better results and performance. This arrangement is required to get rid of stale information: when a node leaves the network, the objects it published become stale and are removed from the network over time.
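The expiry rule (drop a pair if no refresh arrives within twice the refresh period) is easy to state in code; in the sketch below the class name and the 60-second period are illustrative choices, since the thesis deliberately leaves the period as a tuning parameter:

```python
class ObjectStore:
    """Per-node store of published (key, value) pairs with stale-entry removal."""

    def __init__(self, refresh_period: float):
        self.refresh_period = refresh_period
        self.entries = {}            # key -> (value, time of last refresh)

    def refresh(self, key, value, now: float) -> None:
        """Record a publish or refresh message for this pair."""
        self.entries[key] = (value, now)

    def expire(self, now: float) -> None:
        """Drop pairs not refreshed within twice the refresh period."""
        deadline = 2 * self.refresh_period
        self.entries = {k: (v, t) for k, (v, t) in self.entries.items()
                        if now - t <= deadline}

store = ObjectStore(refresh_period=60.0)
store.refresh("kw1", "torrent-A", now=0.0)
store.refresh("kw2", "torrent-B", now=100.0)
store.expire(now=130.0)   # kw1 is 130 s old (> 120 s) and is dropped
```

The factor of two gives each publisher one missed refresh of slack before its objects disappear, trading a little staleness for tolerance of transient failures.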

We also need to determine the replication factor K. Replication is needed to cope with churn in the network. A node is responsible for the keys that are close to its ID; when the node leaves the network, lookups for those keys are not affected because of replication. Our intuition is that we can keep K small, since we are connecting trackers, which are a lot more stable than average peers, in the Kademlia network.
We want to maintain the robustness of the system against failures, so a tracker should remain functional even if it is cut off from the rest of the trackers network. To achieve this, a tracker not only publishes the torrents it maintains but also keeps a complete local index of them. In the extreme case where a tracker is isolated, it will at least be able to satisfy search requests for its own BitTorrent network. To further optimize the computation time for sorting results and the amount of traffic in the network, when an object lookup is performed for a key, the tracker sends all the keys found in the search string along with the key lookup, so that the node responsible for the key can retrieve objects based on their relevance to all the keys in the search string. That node then sorts the results by relevance and sends them directly to the querying tracker, which helps the querying node arrange the results in order of relevance.

The changes we suggest do not hinder or interfere with the existing functionality of the BitTorrent protocol; they change only the way torrents are located, managed, and downloaded. Even in the extreme case where all trackers are isolated, the system will continue working as it does now, except that users connected to a tracker will not be able to search the torrents of other trackers.

CHAPTER VII

CONCLUSION

We conduct a series of experiments and find interesting results for BitTorrent. These results give us an idea of how the protocol performs and how peers behave in the real world. We find that a large portion of the torrent population is unusable, since those torrents have no peers at all. We also find that 67% of the traffic is contributed by just 10% of the torrents. The expected lifetime of a peer in the network is 56.6 minutes. The largest number of peers originates from the US, and µTorrent is the most popular BitTorrent client. We also exploit a feature of BitTorrent to mount a DDoS attack, though it does not prove powerful. Finally, we address the content locatability issue in BitTorrent and propose two solutions. Our proposal opens a new field of research and is subject to improvement and verification.

We see immense potential in BitTorrent. Its use is not limited to file sharing; it is showing its strength in other fields as well, Video on Demand being one example. We hope to see BitTorrent become the most powerful file-sharing platform, spreading its wings across other applications.



REFERENCES

[1] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding and X. Zhang, “Measurements,


analysis, and modeling of BitTorrent systems,” College of William and Mary,

Tech Report, WM-CS-2005-08, July 2005.

[2] D. Qiu and R. Srikant, “Modeling and performance analysis of BitTorrent-like

peer-to-peer networks,” in Proc. of ACM SIGCOMM, August 2004, pp. 367-378.

[3] A. R. Bharambe and C. Herley, “Analyzing and Improving BitTorrent Perfor-


mance,” Microsoft Corporation, Tech Report, MSR-TR-2005-03, Feb. 2005.

[4] N. Naoumov and K.W. Ross, “Exploiting P2P Systems for DDoS Attacks,” In-

ternational Workshop on Peer-to-Peer Information Management, 2006, vol. 152,


article 47.

[5] X. Wang, Z. Yao and D. Loguinov, “Residual-Based Measurement of Peer and

Link Lifetimes in Gnutella Networks,” IEEE INFOCOM, May 2007, pp. 391-399.

[6] Wikipedia, “Bittorrent Protocol Specification v1.0,”

http://wiki.theory.org/BitTorrentSpecification.

D3F, “The Official Huge Tracker List,” http://forums.phoenixlabs.org/t85-the-official-huge-f-tracker-list.html, Phoenix Labs Forum, Sep. 2005.

[8] D. Roselli, J. R. Lorch and T. E. Anderson, “A Comparison of File System

Workloads,” in Proc. USENIX Annual Technical Conference, Jun. 2000, pp. 41-

54.

[9] D. Leonard, V. Rai and D. Loguinov, “On Lifetime-Based Node Failure and
Stochastic Resilience of Decentralized Peer-to-Peer Networks,” in Proc. ACM
SIGMETRICS, Jun. 2005, pp. 26-37.

[10] Z. Yao, D. Leonard, X. Wang and D. Loguinov, “Modeling Heterogeneous User


Churn and Local Resilience of Unstructured P2P Networks,” in Proc. IEEE

ICNP, Nov. 2006, pp. 32-41.

[11] D. Stutzbach and R. Rejaie, “Understanding Churn in Peer-to-Peer Networks,”

in Proc. ACM IMC, Oct. 2006, pp. 189-202.

[12] D. Stutzbach, R. Rejaie and S. Sen, “Characterizing Unstructured Overlay


Topologies in Modern P2P File-Sharing Systems,” IEEE/ACM Trans. Network-
ing, April 2008, pp. 267-280.

[13] P. Maymounkov and D. Mazieres, “Kademlia: A Peer-to-peer Information Sys-


tem Based on the XOR Metric,” IPTPS, March 2002, vol. 2429, pp. 53-65.

[14] J. Falkner, M. Piatek, J. P. John, A. Krishnamurthy and T. Anderson, “Profiling

a million user DHT,” ACM IMC, 2007, pp. 129-134.

[15] I. Baumgart and S. Mies, “S/Kademlia: A Practicable Approach Towards Secure

Key-Based Routing,” ICPADS, Dec. 2007, pp. 1-8.



VITA

Videsh Sadafal received his Bachelor of Engineering degree in Information Tech-

nology in India from National Institute of Technology, Karnataka in May 2004. He

received his Master of Science degree in Computer Science from Texas A&M Univer-
sity in August 2008. His research interests include P2P networks, distributed and

large-scale systems, and computer networks. He can be reached at:

Videsh Sadafal

538, Trimurti Nagar, DamohNaka

Jabalpur, M.P.

India, 482002

The typist for this thesis was Videsh Sadafal.
