A Comparison of Web Privacy Protection Techniques
Johan Mazel, Richard Garnier, Kensuke Fukuda
Abstract—Tracking is pervasive on the web. Third party trackers acquire user data through information leak from websites, and user browsing history using cookies and device fingerprinting. In response, several privacy protection techniques

users regarding their interests. To this end, advertisers leverage context (e.g. visited website) or previous browsing interests. Advertisers use techniques such as cookies to identify users across websites and build their browsing history. Other tech-
arXiv:1712.06850v1 [cs.CR] 19 Dec 2017
Type           Name                     License      Popularity (K)  Version
Blocking list  uBlock Origin [17]       GPL v3       3,759           1.6
Blocking list  Ghostery [18]            Proprietary  966             5.4.10
Blocking list  Disconnect [19]          GPL v3       204             3.15.3
Blocking list  NoTrace [20]             -            0.8             2.4
Blocking list  DoNotTrackMe/Blur [21]   Prop.        86              6.0.2091
Blocking list  BeefTaco [22]            Apa. 2.0     10              1.3.7.1
Heur.          Privacy Badger [23]      GPL v3       205             1.0.6
Heur.          MyTrackingChoices [22]   -            0.1             1.0
Ind.           NoScript [24]            GPL v2       1,765           2.9
Ind.           RPC [25]                 GPL v3       6               1.0
Other          HTTPSEverywhere [26]     GPL v2       353             5.2.5
Other          Decentraleyes [27]       MPL 2.0      69              1.2.2
Other          WebOfTrust [28]          GPL v3       315             -

TABLE I
PRIVACY PROTECTION TECHNIQUE CHARACTERISTICS. HEUR.: HEURISTICS; IND.: INDISCRIMINATE; POPULARITY (K): POPULARITY IN THOUSANDS.

blocked resource overlap between privacy protection techniques, and compare their performances. Third, we assess the impact of privacy protection techniques on website quality. Finally, we provide recommendations for end-users and the scientific community.

II. BACKGROUND

This section presents an overview of third party web tracking and existing privacy protection techniques. We refer the reader to [14], [15] for a more complete description of these aspects.

A. Third party web tracking

Websites massively rely on advertisement to monetize user visits. Advertisers purchase ads for their products directly from publishers or ad exchanges. When users access websites, they communicate with all these actors. From the users’ viewpoint, these entities belong to two categories: first and third parties. First parties are the entities that users intend to reach, here, publishers. Third parties can be advertisers, ad exchanges or other actors that provide services to first parties (such as web analytics). Using techniques such as cookies, local data storage or fingerprinting, third parties can identify users across websites. Combining the HTTP referrer field and user identification, they can reconstruct users’ browsing history. Using cookie syncing, trackers can also exchange user identification and thus improve data collection.

B. Privacy protection techniques

Several techniques have been designed to protect users from third party tracking. Network-based techniques use DNS filtering or proxies. They however exhibit several shortcomings: proxies cannot analyze encrypted traffic while DNS filtering only blocks entire domains [12]. User agent spoofing browser extensions may also improve privacy by hindering fingerprinting, but exhibit poor performance in practice [29].

otherwise, these extensions are open source.

1) Extensions: We classify extensions according to the method they use to impede third party tracking: blocking lists, heuristics, indiscriminate blocking, or other.

The following browser extensions use blocking lists made of regex-based rules on domain names. These lists are usually community maintained. Ghostery [18] is a proprietary, privacy-focused extension that uses a specific tracker blocking list. It has recently been bought by the privacy-focused browser Cliqz [30]. Ghostery requires a configuration step to select categories of trackers to block. uBlock Origin [17] is a general purpose blocker. It can also understand the syntax used by the famous ad-blocker AdBlock Plus. uBlock Origin includes by default: EasyList, EasyPrivacy, Peter Lowe’s Adservers, Malware domains and some specific lists. Disconnect [19] is another blocker. Blur [21] (also known as Do Not Track Me) is a proprietary extension owned by Abine. It blocks trackers and protects mail addresses and passwords. NoTrace [20] uses a wide range of techniques in order to enhance privacy on the Internet. NoTrace needs to be configured after installation. One can also block tracking by setting opt-out cookies that block domains from setting standard cookies. Several entities [31], [32] provide opt-out cookie lists. BeefTaco [33] creates opt-out cookies in the browser. Another privacy protection approach uses ad-blockers to hinder tracking in the same way they block advertisements. AdBlock Plus [16] is the most popular ad blocker. Using specific lists, it can block trackers, social widgets, or malware. Since 2011, AdBlock Plus has been commercially operated by Eyeo, which monetizes domain whitelisting [34]. In addition to the ad-focused EasyList [35] that is used by default in AdBlock Plus, there is a privacy-focused list called EasyPrivacy [35].

Unlike the previous extensions, which rely on blocking lists, some use heuristics to block trackers. Privacy Badger [23] is developed by the Electronic Frontier Foundation and its behavior is further described in § V-B. By default, Privacy Badger uses the Do Not Track HTTP header [5] and strips the referrer field in HTTP requests. MyTrackingChoices [22] is an advertisement friendly privacy protection extension. It uses a hybrid approach which leverages both heuristics similar to Privacy Badger and a blocking list for bootstrapping. Users can allow tracking for some website categories.

Some extensions indiscriminately block resources. NoScript [24] disables JavaScript. As a side effect, this also disables some tracking. RequestPolicy Continued [25] blocks all third party requests.

Finally, some extensions use other mechanisms to protect users’ privacy. HTTPSEverywhere [26] tries to replace HTTP connections with HTTPS ones if HTTPS is supported. WebOfTrust (WOT) [28] provides website ratings regarding trustworthiness and child safety. It was temporarily removed from extension stores when it was revealed that WOT broke privacy rules of browser developers [36]. Decentraleyes uses local files to emulate resources, e.g. freely available JavaScript
libraries, hosted on centralized entities. This blocks tracking from these entities. Table I provides popularity (measured as the number of users for Firefox), source code license and version of the extensions used in our work.

2) Browsers: Some browsers also have built-in privacy protection features. DoNotTrack [5] is a field in the HTTP header that asks the destination not to track the sender. It was proposed in 2009 and is currently under standardization in W3C. This feature is turned off by default in Firefox. The last proposed amendments to the European Union’s ePrivacy Regulation hint at an increased interest of policy makers towards DoNotTrack [37]. Firefox [38] can block cookies: either all cookies, only third party cookies, or only third party cookies from domains that were not previously visited. A tracking protection feature [39] was also added in Firefox version 42. Brave [40] and Cliqz [41] are browsers that emphasize their privacy protection features.

III. RELATED WORK

A. Tracking on the Internet

The seminal contribution in the field of web privacy was by Krishnamurthy et al. They first studied the diffusion of privacy information [1], then analyzed the impact of counter-measures on this diffusion and webpage quality [6], and observed the increase of user-data aggregation by a small number of entities [58]. Several works then tried to analyze this phenomenon in more detail by classifying tracking [59] or analyzing geographical variations of tracking [43], [44].

While early studies focused on cookie-based tracking, two main new trends in tracking techniques emerged: resilient in-browser data storage [3] and browser fingerprinting [4]. Local data storage allows one to store data in multiple locations (for example, Flash Local Storage Object, HTML5 or ETags among other means) on a device to bypass standard cookie removal. This technique thus provides resilient local identifier storage. By using a browser fingerprint, a tracker can completely avoid using and storing an identifier inside the browser and instead recognize a browser, or the device it is installed on, using its characteristics such as fonts [60], battery [61], canvas [62] or hardware level features [63]. Several studies then tried to detect fingerprinting [64], or both local storage and fingerprinting [2], [48]. Furthermore, Lerner et al. [65] show that local storage fingerprinting increased between 1996 and 2016. Another privacy threat is the practice of sharing user identification across tracking entities. This is called ID-sharing, cookie-syncing or cookie-matching. While the phenomenon has been documented for some time [66], recent work analyzed its prevalence in the wild and its impact on user browsing history sharing among trackers [2], [67], [48].

Finally, the feasibility of user surveillance through bulk passive network traffic monitoring-based cookie observation has also been analyzed [47].

B. Automated blocking list building

Many privacy protection tools have been proposed (see § II-B). Most of these tools use manually maintained blocking lists (see Table I). Some proposals automatically build tracking blocking lists using specific keys in URLs that correspond to user identifying data [52], machine learning on DOM structure [68], JavaScript [53], [55] or user browsing behavior [69]. The same machine-learning-based approach was also applied to ad-blocking list building using network traffic features [51].

C. Privacy protection techniques comparison

While some extensions are used in almost all studies (e.g. AdBlock Plus [16] and Ghostery [18]), many of them are seldom employed. Similarly, some works compare between four and seven extensions [7], [8], [9], [10], [11], [12], [13] while another focuses on two [6]. Table II describes how existing web privacy-related work used or analyzed existing privacy protection techniques.

Features used to compare privacy protection techniques are very diverse and thus complicate result comparison across works. Some works compare privacy protection techniques in terms of HTTP request number [7], [9], [10], [12], cookie number [7], [8], [9], [10], domain number [9], [11], [13], private information diffusion [6], and occurrence of behavioral advertising [8]. Some counter-measure proposals also compare their approaches to others regarding tracker blocking [49], [50], [39] or impact on browser performance [22]. Table III lists metrics and features leveraged by existing web privacy-related studies.

Our work is close to [6], [7], [8], [9], [10], [11], [12], [13] but improves the state of the art along four axes: protection techniques, target websites, metrics, and reliability. First, we compare more protection techniques (15 and additional combinations of blocking lists) than Krishnamurthy et al. [6] (2), Mayer et al. [7] (4), Balebako et al. [8] (4), Hill [9], [10] (4), Wills et al. [11] (6), Merzdovnik et al. [12] (5) and Traverso et al. [13] (7). It is difficult to compare the extensions covered by our work with Krishnamurthy et al. [6] since the ecosystem was much simpler at the time (AdBlock Plus and NoScript were the only extensions available). We did not use some extensions that were addressed in previous studies because they are either not supported anymore (e.g. TACO which was owned by Abine [8], [56]), or not available for Firefox (AdBlock [46], [22], [11], Superblock Adblocker [22], Adremover [22] and Adblock Pro [22]). Furthermore, AdBlock (resp. Firefox Tracking Protection [39]) uses the same blocking list as AdBlock Plus (resp. Disconnect). By analyzing AdBlock Plus and Disconnect, we thus provide performance bounds on these two techniques. Achara et al. [22] and Wills et al. [11] also evaluated the performance of an ad-blocking extension that we do not address in this study: AdGuard Adblocker. Second, we use more websites than most existing work. We use the Alexa Top 1000 while Mayer et al. [7], Hill [9], [10] and Traverso et al. [13] use the Alexa Top 500, 45, 84 and 100 URLs, and 100 Italian URLs. Balebako et al. discussed five browsing scenarios that use a small number of websites to assess the occurrence of behavioral advertising; we thus cannot compare the websites we use. We crawled a number of websites similar to Wills et al. [11] but analyzed
Type | Authors & references | Extensions and browsers

Techniques (columns): DoNotTrackMe/Blur [21], MyTrackingChoices [22], HTTPSEverywhere [26], Privacy Badger [23], Decentraleyes [27], uBlockOrigin [17], WebOfTrust [28], Disconnect [19], Ghostery [18], NoScript [24], AdBlock [42], NoTrace [20], Firefox [39], RPC [25], DNT [5], Cliqz

Web privacy analysis   Krishnamurthy et al. [1]
Web privacy analysis   Castelluccia et al. [43]
Web privacy analysis   Fruchter et al. [44]
Web privacy analysis   Carrascosa et al. [45]     +
Web privacy analysis   Metwalley et al. [46]      · · · ·
Web privacy analysis   Englehardt et al. [47]     + + + +
Web privacy analysis   Englehardt et al. [48]     + + + +
Privacy protection     Malandrino et al. [49]     + + + + +
Privacy protection     Malandrino et al. [50]     + + + + +
Privacy protection     Kontaxis et al. [39]       + +
Privacy protection     Achara et al. [22]         + + + + +
Tracker detection      Gugelmann et al. [51]      +? +?
Tracker detection      Metwalley et al. [52]
Tracker detection      Wu et al. [53]             ?
Tracker detection      Yu et al. [54]             + + +
Tracker detection      Ikram et al. [55]          + + + + +
User ana.              Leon et al. [56]           + + + +
User ana.              Malandrino et al. [57]     +
Comp.                  Krishnamurthy et al. [6]   + +
Comp.                  Mayer et al. [7]           + + + +
Comp.                  Balebako et al. [8]        + + + +
Comp.                  Hill [9]                   + + + +
Comp.                  Hill [10]                  + + + +
Comp.                  Wills et al. [11]          + + + + + +
Comp.                  Merzdovnik et al. [12]     + + + + +
Comp.                  Traverso et al. [13]       + + + + + + +
                       Our work                   ∼ + + + + + + + + + + + + + + + + ∼

TABLE II
EXTENSIONS USED IN PREVIOUS PUBLICATIONS (· FOR USAGE IN THE WILD, FOR GROUND-TRUTH, FOR GROUND-TRUTH AND USAGE, + FOR COMPARISON, ⊕ FOR COMPARISON AND GROUND-TRUTH, ? FOR ML BOOTSTRAPPING, ∼ MEANS THAT WE DID NOT EVALUATE THESE TECHNIQUES BUT THAT THEIR RESULTS CAN BE DERIVED FROM OUR WORK: ADBLOCK USES THE SAME DEFAULT BLOCKING LIST AS ADBLOCK PLUS AND FIREFOX TRACKING PROTECTION USES THE SAME BLOCKING LIST AS DISCONNECT).
many more protection techniques. We use a smaller number of crawled websites than Merzdovnik et al. [12] (who use the Alexa Top 100000) because some of our metrics (such as number of cookies) require a stateful crawl and thus forbid parallelism. Third, we use more metrics (see Table III) than any existing work. We are thus able to provide a better description of techniques’ performance. We did not compute the host number metric used by Hill [9], [10] because it is very similar to the number of domains. This metric actually reflects the internal network architecture of trackers, which is not relevant for privacy protection comparison. Finally, none of these works performs several measurements on each website to remove measurement error, except Mayer et al. [7] who perform three crawls for each URL but do not provide any justification for this number. To the best of our knowledge, we here propose the first measurement error analysis for privacy protection technique comparison (see § V-A).

IV. METHODOLOGY

This section presents data collection and the metrics used for privacy protection and webpage quality.

A. Data collection

We use OpenWPM [48], an open-source framework written in Python that relies on Selenium for browser automation. OpenWPM supports Firefox and provides some extensions. With minor modifications, we add extensions presented in
Type | Authors & references | Privacy protection | Webpage quality | Browser

Metrics (columns): Manual analysis, Image data size, Script data size, TP/FP/FN/TN, Loading time, # domains, # requests, # cookies, # image, # script, # hosts, Data

Web privacy analysis   Krishnamurthy et al. [1]
Web privacy analysis   Fruchter et al. [44]
Web privacy analysis   Carrascosa et al. [45]
Web privacy analysis   Englehardt et al. [47]
Web privacy analysis   Englehardt et al. [48]
Privacy protection     Malandrino et al. [49]
Privacy protection     Malandrino et al. [50]
Privacy protection     Kontaxis et al. [39]
Privacy protection     Achara et al. [22]
Tracker detection      Yu et al. [54]
Tracker detection      Ikram et al. [55]
Comparison             Krishnamurthy et al. [6]
Comparison             Mayer et al. [7]
Comparison             Balebako et al. [8]
Comparison             Hill [9]
Comparison             Hill [10]
Comparison             Wills et al. [11]
Comparison             Merzdovnik et al. [12]
Comparison             Traverso et al. [13]
                       Our work

TABLE III
METRICS USED FOR PRIVACY PROTECTION TECHNIQUE COMPARISON ( FOR METRIC USED AND FOR METRIC USED WITH BREAKDOWN).
Table I and § II-B. This study has been conducted with Firefox 45.

We crawl the highest-ranked websites by Alexa [70]. Unless stated otherwise, we perform measurements in August 2017, using IP addresses located in Japan. Typically, a crawl on the Alexa Top 1000 (§ VI-A) takes about three days with commodity hardware.

B. Privacy protection

In this section, we present our robust methodology to compare privacy protection techniques across several websites and using several metrics. We also present privacy footprint [1].

1) Browsing metrics: As explained in § II-A, privacy leakage occurs through communications with trackers, and local data storage (e.g. cookies) allows trackers to easily identify users across websites. We consider five simple browsing-related metrics that reflect the ability of protection techniques to hinder these two phenomena. We first focus on HTTP requests. We separately count the requests that are made to the accessed domain (the number of first party requests), and the requests that are made to other domains (the number of third party requests). To this end, we use the second level domain name obtained from the Public Suffix List [71]. These metrics give a rough estimation of the performance of a particular extension. A protection technique is effective if the number of first party requests is not impacted, and the number of third party requests decreases.

The third metric is the number of third party domains accessed during a crawl. This is complementary to the number of third party requests. It provides a metric on the number of blocked entities. There may be a few domains that generate a lot of requests, or many domains that produce a few requests. Efficient protection techniques reduce the number of third party domains.

This is however not sufficient since third party requests are not always used to track users: they can also provide content to users (e.g. media resources) or contact non-tracking third parties (e.g. website analytics). The fourth metric is the number of cookies in the profile obtained from a stateful crawl. Protection techniques with good performance diminish the number of cookies.

The last metric is the total amount of data transferred. It is especially important in low-bandwidth situations, for example on mobile. This metric is directly related to tracking and advertisement. Advertisement often generates considerable traffic during browsing. However, as we will see in § V-A, this metric shows high variability when crawling the same website. We thus did not use this metric in our analysis.
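
The first and third party request counts and the third party domain count described in this section can be illustrated with a short Python sketch. It is not the implementation used in this study: the function names are hypothetical, absolute URLs are assumed, and the tiny PUBLIC_SUFFIXES set is only a stand-in for the full Public Suffix List [71], which a real measurement would load instead.

```python
from urllib.parse import urlsplit

# Hypothetical, heavily truncated stand-in for the Public Suffix List [71].
PUBLIC_SUFFIXES = {"com", "org", "net", "jp", "co.jp", "uk", "co.uk"}


def registrable_domain(host):
    """Return the public suffix plus one label, e.g. 'example.co.jp'."""
    labels = host.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix so that 'co.jp' wins over 'jp'.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


def browsing_metrics(page_url, request_urls):
    """Count first party requests, third party requests and third party domains."""
    site = registrable_domain(urlsplit(page_url).hostname)
    first_party = 0
    third_party = 0
    third_party_domains = set()
    for url in request_urls:
        domain = registrable_domain(urlsplit(url).hostname)
        if domain == site:
            first_party += 1
        else:
            third_party += 1
            third_party_domains.add(domain)
    return {
        "first_party_requests": first_party,
        "third_party_requests": third_party,
        "third_party_domains": len(third_party_domains),
    }
```

With this sketch, two requests to different hosts of the same tracker count as two third party requests but only one third party domain, which is why the two metrics are complementary.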
the same hosting service or Content Delivery Network (CDN) are grouped together by ADNS because they share authoritative DNS servers. They thus develop a new method called root. With root, if a third party is located in a hosting service or a CDN, the second-level domain-name of the third party
[Figure: relative standard error of the mean (%) for # 1st party requests and # 3rd party requests; ECDF]
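
The relative standard error of the mean used in § V-A can be sketched in a few lines of Python. This is an illustrative computation under stated assumptions, not the authors' code: the function name is hypothetical, and the sample standard deviation (n - 1 denominator) is assumed because the text does not say which estimator was used.

```python
import math
import statistics


def relative_standard_error(values):
    """Relative standard error of the mean of repeated measurements, in percent."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    # Standard error of the mean: sample standard deviation over sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return 100.0 * sem / mean
```

Applied to the per-crawl values of one metric for one website (for instance the number of third party requests over ten crawls), the function returns the percentage that § V-A compares against its error targets.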
being the worst and 10 the best. Our manual analysis was performed by six users.

V. MEASUREMENT PARAMETERS

We address two measurement parameters: the impact of crawling parameters on measurement error, and the training of Privacy Badger.

A. Impact of crawling parameters on measurement error

During our preliminary experiment, we noticed that measurement results are not stable. Querying a website twice usually yields two different metric values. To determine the appropriate number of measurements that generates a small error, we conducted a detailed study on the website showing the highest variability in our preliminary experiment: www.yahoo.jp. These measurements were performed in November 2016. Varying the number of measurements, we computed the relative standard error of the mean of four browsing metrics.

Figure 2 shows that the relative standard error decreases when the number of measurements increases. Performing ten crawls reduces the relative standard error on the number of first party requests, third party requests, and third party domains to less than 5%. The number of bytes received, however, still has a large error; we thus discard this metric. One possible reason for this measurement error is webpage advertisement churn. Guha et al. [72] analyze ad churn during page reloads. They show that the number of unique new ads increases quickly during the first ten reloads but only linearly thereafter. This further justifies our choice to use ten measurements. An existing comparison of privacy protection techniques [7] uses three crawls for each website and extension. Its authors however neither motivate this choice nor provide a measurement error analysis.

Figure 3 is the ECDF of the relative standard error of the mean of the number of first and third party requests, and of the number of third party domains for all websites in the Alexa Top 1000 crawled 10 times. Most websites have a relative standard error smaller than that of www.yahoo.jp (here in dashed lines). Overall, 99% of websites have a relative standard error smaller than 1% for all observed features.

B. Privacy Badger training

Privacy Badger employs heuristics to determine if a domain is performing tracking (see § II-B1). Privacy Badger measures the number of times that a domain reads a cookie as a third party. When this number is greater than three, the considered domain is blocked. On top of this, cookies are also whitelisted using heuristics. This behavior causes a freshly installed Privacy Badger not to block any domain. As the user browses websites, more and more domains are blocked. To fairly evaluate Privacy Badger, we analyze the impact of Privacy Badger training. In other words, we intend to quantify how many websites need to be accessed to ensure that Privacy Badger is completely trained.

If we were to determine the number of websites regular users need to browse to train Privacy Badger, we should use a browsing model such as AOL search logs [73] as Roesner et al. [59] do. Our use-case here is however to ensure that Privacy Badger is trained for a specific set of websites. We thus measure Privacy Badger’s performance on the Alexa Top 100 websites after training. These crawls were performed in November 2016. This training consists of several repeated crawls targeting the same set of websites and performed without reinitializing the user profile, i.e. Privacy Badger continues its training. We use zero to five successive training crawls and then perform a single measurement crawl.

The results of this experiment are shown in Figure 4. The dashed line corresponds to a reference: crawling results of a bare browser. A KS test groups the six measurements with Privacy Badger together, but separates them from the bare browser. Without training, Privacy Badger is already able to
provide partial protection. After a single training pass, the number of third party requests is reduced by 10.3%. The results however do not improve when the number of training crawls increases. We also ran a single training crawl on the Alexa Top 500 websites and observed similar performance as with a single training crawl. The training is thus complete with less than one crawl (here one hundred websites). MyTrackingChoices uses the same principle but without Privacy Badger’s cookie data-based white-listing heuristics. One can thus expect MyTrackingChoices to train faster. Moreover, MyTrackingChoices is provided with a small bootstrapping tracker list which should further reduce training time. For the remainder of this paper, we train Privacy Badger using a single crawl on the considered Alexa Top websites. Out of the four existing works that use Privacy Badger [55], [9], [12], [13], only two actually perform some training [9], [12]. We here provide the first estimation of how many websites need to be visited to complete Privacy Badger’s training.

[Figure 4: # 1st party req., # 3rd party req. and # 3rd party dom. versus # of training crawls (0 to 5)]
Fig. 4. Performances of Privacy Badger depending on the number of crawls performed for its training. The horizontal dashed line represents a default Firefox as reference.

VI. RESULTS

A. Privacy protection

We compare protection techniques, first in terms of performance and then regarding their blocking overlap.

1) Browsing metrics: We perform a general comparison of privacy protection techniques. We crawled the Alexa Top 1000 World using 21 different browser configurations. One configuration is the default setup of the Firefox browser (Bare). One configuration is Firefox set up with the DoNotTrack HTTP header. Two configurations use Firefox’s ability to block cookies: third party ones or third party cookies except from previously visited websites. We do not use complete cookie blocking because we hypothesize that users want to keep using cookies for some domains. Three configurations set up AdBlock Plus with various lists: EasyList, EasyList and EasyPrivacy, and a filter set similar to uBlock Origin (EasyList, EasyPrivacy, Peter Lowe’s Ad Server list and Malware domains). One configuration uses NoTrace with the high preset (the default configuration has no effect on our metrics). The last 14 configurations are extensions with their default setting. We do not use the Firefox Tracking Protection [39] because OpenWPM does not support it yet. However, this technique uses the Disconnect blocking list. This means that by analyzing Disconnect, we provide a performance bound on

to the fact that JavaScript is pervasive on the Internet. Other techniques exhibit a limited impact on the number of blocked first party requests.

Except for the number of cookies, the two most effective extensions are RequestPolicy Continued and NoScript which reduce the number of third party HTTP requests by 96% and 87%. However, as shown in § VI-B and other studies [6], [50], they strongly impact webpage quality.

Among other techniques, Ghostery [18] and uBlock Origin [17] provide the best performance. They reduce the number of third party HTTP requests by 58% and 55%. Ghostery however needs to be configured, which is difficult for users [56]. AdBlock Plus [16] uBO-like has similar performance to uBlock Origin and Ghostery. Our results show that users can manually set up AdBlock to obtain good results; unfortunately this is difficult [56]. Furthermore, AdBlock Plus used with EasyList is much more CPU and memory-hungry than uBlock Origin [22].

Another group of techniques provides average performance: Disconnect (and Firefox Tracking Protection), AdBlock Plus (with EasyList and EasyList + EasyPrivacy), Privacy Badger, Blur, BeefTaco and NoTrace. Their results are mostly consistent across third party requests and domains, and cookies. Privacy Badger [23], Blur [21], Disconnect [19] and AdBlock Plus with the EasyList block a relatively large number of third party HTTP requests (between 33% and 48%), but a fairly small number of third party domains (between 8% and 33%). We hypothesize that these techniques block the main trackers but fail to impact the tracking long-tail [48]. MyTrackingChoices performs worse than untrained Privacy Badger. We hypothesize that this is due to a GUI bug that prevents users from customizing website categories where blocking occurs. MyTrackingChoices thus always blocks tracking on 13 website categories out of 32. BeefTaco has a noticeable impact on cookies but no impact on other metrics. This is consistent with the previously observed tracking long tail [48]. This phenomenon makes opt-out cookie maintenance very difficult while the protection that they offer is limited to cookie blocking. NoTrace [20] provides average protection despite being set up at its highest level. NoTrace increases users’ online privacy awareness [57], but unfortunately, users cannot reliably set up the extension [56], which is required.

[Figure 5: browsing metrics per protection technique (bar charts; panels include Nb domain and Nb cookie)]

Remaining techniques provide poor protection. Decentraleyes [27] has almost no effect on the results. We hypothesize that blocking library loading requests only marginally impacts HTTP traffic. The DoNotTrack HTTP header [5] also has barely any effect. If trackers complied with DoNotTrack, the number of cookies would significantly drop. We thus conclude that DoNotTrack is not respected by most trackers. WebOfTrust and HTTPSEverywhere have no effect.

We note that WebOfTrust and NoTrace both yield significantly more third party requests than Firefox without any extensions. WebOfTrust also contacts more third party domains and increases the number of cookies. WebOfTrust annotates hyperlinks in webpages with trust ratings. We hypothesize that this rating is obtained from WebOfTrust servers and thus impacts our metrics. We do not have any explanation regarding NoTrace’s behavior.

When we block third party cookies in Firefox, there is a reduction in the number of third party domains. Blocking cookies also slightly reduces the number of third party requests. As hypothesized in [48], this may be due to impeded cookie-syncing interactions. However, contrary to Englehardt et al. [48], third party cookie blocking here has poor performance. This may be due to the fact that, unlike [48], we perform a stateful crawl. In a previous study [47] that uses stateful crawl and OpenWPM, third party cookie blocking also exhibits poor performance. The metric used in this study is however extremely specific; it is thus very difficult to compare our results with theirs. Furthermore, some third parties from the Alexa Top 1000 are first party at some point (e.g. Twitter, Facebook). Their cookies are thus created as first party cookies.

2) Privacy footprint: We then compared extensions using the privacy footprint (see § IV-B3 and [1]). Figure 6 presents these results. They are similar to browsing metrics’ Figure 5. Six extensions exhibit much better results than the rest: RequestPolicy Continued, NoScript, Ghostery, uBlock Origin
[Figure 6: per protection technique; legend: # 3rd party, # 1st party, Mean # 3rd party by 1st party, # 1st party link. to top 10 3rd parties]

only blocked by $E_i$: $|B_i - \{B_k \ \forall k \neq i\}|$.

As an example, we see AdBlock Plus on the second line of Fig. 7a. Square surfaces on this line represent the number of third party requests blocked by AdBlock Plus: 1356. The first square on the considered line represents the intersection with Privacy Badger which is encoded by the surface of the inner purple square, here 678. The green surface outside the purple square describes the resources blocked by AdBlock Plus but not by Privacy Badger. The next square on this line is located on the diagonal, and represents the resources blocked by AdBlock Plus only. Remaining squares on this line depict intersections with Disconnect, Ghostery and uBlock Origin in the same fashion as the intersection with Privacy Badger is
10
(a) # third party requests (b) # third party domains
Fig. 7. Overlap of third party requests and domains blocked by five extensions (PB: Privacy Badger, Dis: Disconnect, ADB: AdBlock Plus, Gho: Ghostery,
uBO: uBlock Origin). Square surfaces represent the metric value for the extension on the row.
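The quantities encoded in Fig. 7 reduce to set operations. The following is a minimal sketch, assuming that the resources (requests or domains) blocked by each extension are available as a Python set. The function names, the extension keys and the resource identifiers are hypothetical and chosen for illustration only; they are not measurement data.

```python
from typing import Dict, Set


def pairwise_overlap(blocked: Dict[str, Set[str]], row: str, col: str) -> int:
    """Number of resources blocked by both `row` and `col` (inner square)."""
    return len(blocked[row] & blocked[col])


def exclusive_count(blocked: Dict[str, Set[str]], name: str) -> int:
    """Number of resources blocked by `name` and by no other extension."""
    # Union of everything blocked by the other extensions.
    others = set().union(*(s for k, s in blocked.items() if k != name))
    return len(blocked[name] - others)


# Hypothetical blocked-resource sets (placeholders, not measured values).
blocked = {
    "ABP": {"a.example", "b.example", "c.example", "d.example"},
    "PB": {"a.example", "b.example", "e.example"},
    "Gho": {"b.example", "c.example", "f.example"},
}

print(len(blocked["ABP"]))                     # row total for ABP
print(pairwise_overlap(blocked, "ABP", "PB"))  # blocked by both ABP and PB
print(exclusive_count(blocked, "ABP"))         # blocked by ABP only (diagonal)
```

In this sketch, the row total corresponds to the outer square, the pairwise overlap to the inner square, and the exclusive count to the square on the diagonal.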
the HTML size metric because there is no significant change across protection techniques. NoScript has a noticeable impact on all metrics. This is especially visible for the script number, and image and script size. RequestPolicy Continued’s impact is smaller than NoScript’s except for image size. We hypothesize that RequestPolicy Continued blocks third party image hosting services, and thus impacts many websites. NoScript greatly reduces the total size of scripts which is consistent with its goal: blocking JavaScript on webpages. Other privacy protection techniques appear to have a limited impact on website quality. Compared to other techniques, Ghostery exhibits a smaller impact on images and a higher reduction of the number of scripts and their total size. This is consistent with its tracking protection goal. This extension actually does not intend to block media such as advertisements.
2) Manual analysis: The previous subsubsection analyzes webpage quality in terms of HTML and associated data. This automated approach does not necessarily reflect the quality of the rendered webpage. We perform a manual analysis to cope with this limitation. We gather screenshots of the Alexa Top 50 without pages from the same entity but with distinct localization (e.g. Google and Amazon). These screenshots were gathered in November 2016. We use the seven techniques that provided the best privacy protection (see § VI-A1): Privacy Badger trained, Blur, Disconnect, uBlock Origin, Ghostery, NoScript and RequestPolicy Continued. For each webpage and protection technique, we display a picture that contains two screenshots side by side to the user: the left one captured with Firefox alone, the other with Firefox used with an extension. We use the questions about layout similarity and missing elements presented in § IV-C2. Figure 9 exposes our results. The blue (resp. red) boxplot corresponds to the first (resp. second) question. Overall, Privacy Badger, Blur, Disconnect, uBlock Origin and Ghostery have a very small impact on rendered webpages. On the other hand, RequestPolicy Continued and NoScript significantly impact pages both in terms of layout and missing elements. This is consistent with § VI-B1 and previous results [6], [50]. The manual analysis uncovers RequestPolicy Continued’s strong impact on webpage quality which was not exposed by HTML-based metrics.
Summary: Both studies show that RequestPolicy Continued and NoScript reduce webpage quality. Other extensions do not have such impact.
C. Summary
Figure 10 summarizes our findings. Each metric-based synthetic index is built as the mean of previously used metrics (see § IV-B1 and § IV-C1) normalized using the metric value obtained with Firefox used alone. The manual analysis-based index is the mean of the mean rating of both questions. Figure 10a shows that our manual analysis captures the negative effects of RequestPolicy Continued that are not visible with HTML-based metrics. Figures 10b and 10c demonstrate that RequestPolicy Continued and NoScript show the best protection performances but impact browsing experience. Ghostery and uBlock Origin protect users slightly less but have a smaller impact on webpage quality. Remaining techniques provide average protection.
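The construction of the synthetic indexes can be sketched as follows. This is a minimal illustration, assuming each technique is described by a dictionary of metric values and that the same metrics are available for Firefox used alone. The function names, metric names and numbers are hypothetical and are not the values measured in this work.

```python
from statistics import mean
from typing import Dict, List


def metric_index(values: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Mean of the metrics, each normalized by its value for Firefox alone."""
    return mean(values[m] / baseline[m] for m in baseline)


def manual_index(ratings_q1: List[float], ratings_q2: List[float]) -> float:
    """Mean of the mean ratings of the two manual analysis questions."""
    return mean([mean(ratings_q1), mean(ratings_q2)])


# Hypothetical values for Firefox alone and for one extension.
firefox_alone = {"requests": 200.0, "domains": 50.0, "cookies": 100.0}
extension = {"requests": 100.0, "domains": 20.0, "cookies": 25.0}

print(metric_index(extension, firefox_alone))
print(manual_index([9.0, 10.0, 8.0], [10.0, 10.0, 10.0]))
```

With this normalization, a technique that behaves exactly like Firefox alone obtains an index of 1, which makes indexes comparable across metrics with different units.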
VII. DISCUSSIONS
A. Recommendations
Our analysis shows that RequestPolicy Continued and NoScript are the most effective privacy protection techniques. Similarly to previous work [6], [50], we confirm that they affect webpage quality. Users that want to avoid website breakage can instead use uBlock Origin or Ghostery. The latter however needs to be configured which is difficult for users [56]. We note that previous work [12], [13] also recommended uBlock Origin and Ghostery. Their results were however noisy (percentiles are overlapping in [13] and some techniques actually observed more fingerprinting than the default browser in [12]) due to the use of a single crawl of each website. Our robust measurement methodology does not suffer from the same weakness, and shows that uBlock Origin or Ghostery exhibit statistically better performance than other techniques.
Previous network traffic monitoring studies [46], [74] show that users are more preoccupied by blocking ads than tracking. Malloy et al. [75] show that ad-blocker usage varies between 18% and 37% in analyzed countries (US, UK, Germany, France, and Canada). Metwalley et al. [46] show that 10 to 18% of users had installed AdBlock Plus while less than 3.5% (resp. 2%) were using DoNotTrackMe/Blur (resp. Ghostery). Similarly, Pujol et al. [74] speculate that out of the 19.7% of users that install AdBlock Plus, less than 15% of them set up the EasyPrivacy list. We show that the default setup of AdBlock Plus has average performances but that it can be configured to achieve good results. This task is however difficult for users [56]. We thus advise ad-blocking users that want to improve their privacy to directly change to more efficient extensions (see above).
Extensions that do not rely on manually maintained filters, Privacy Badger and MyTrackingChoices, exhibit average performances. It is however not clear how their performances may be improved. Analyzing extensions’ behavior against
existing blocking lists would help to improve their results. Blocking list-based protection techniques are efficient but their scalability is limited due to human intervention. Current efforts [68], [52], [53], [51], [55] to automatically build blocking lists are thus extremely relevant.
Ghostery and uBlock Origin block a significant amount of resources that are not blocked by other protection techniques. Both should thus be considered as relevant when assessing users’ ability to protect themselves.
[Figure 9: boxplots per protection technique. Y-axis: score. Legend: “Layout similarity rating”, “Proportion of element missing with extension”.]
Fig. 9. Webpage quality manual analysis. Users are presented with two screenshots obtained with Firefox alone and with a privacy protection technique. The first question (blue) asks users to rate layout similarity. The second question (red) corresponds to the proportion of elements observed with Firefox alone also present when a privacy protection technique is used. Screenshots are captured on the Alexa Top 50. Boxplots extend to the upper and lower quartile and contain a line that represents the median. Dashed lines show min/max range. When not visible, median is located on top of the upper quartile.
B. Limitations
In this work, we estimate the impact of privacy protection techniques on webpage quality in terms of missing elements and data (using HTML-based metrics, § IV-C1) and alteration to the webpage layout and elements (using a user manual analysis, § IV-C2). We however do not address the functionality loss. We intend to explore this aspect in future work along two axes. We plan to explore each website beyond its homepage to improve website coverage.
Several works [59], [68], [51], [52], [53], [55] automatically build blocking lists. We did not evaluate them because produced lists are not publicly available.
Evaluated approaches aim at blocking tracking. One side effect is advertisement blocking. These approaches thus threaten the current economic model of many publishers on the Internet. Techniques examined in this work however do not have the same impact on third party tracking. For example, Ghostery and MyTrackingChoices can block tracking for specific website categories. Similarly, Firefox’s third party cookie blocking only partially impacts advertisement techniques (such as Real-Time Bidding, RTB) that rely on user interests. Olejnik et al. [67] show that new users (i.e. users without any browsing history) are less valued by advertisers. A user that blocks all third party cookies thus reduces their customer value for advertisers, and, consequently, publishers’ revenue. This however does not completely block tracking and thus, advertisement transactions (e.g. bidding in RTB). Indiscriminate third party blocking techniques (such as RequestPolicy Continued [25]) however have a much bigger impact on advertising. They for example [...] tion for others. This is obviously less aggressive than most [...] report that 30% of users block the tracking for all categories. These users thus have no reason to use these two techniques instead of other ones.
C. Future work
As shown in § III, existing work uses a vast array of metrics. Some of them are tailored to very specific use cases (such as behavioral advertising [8], [45] and cookie-based mass-surveillance [47]). Similarly, data-related metrics present an obvious interest in a mobile context. We intend to add new metrics to this work. We also want to analyze the impact of all these protection techniques on specific attacks (browser fingerprinting [4], [48], cookie syncing [67], [2], [76], [48]).
Analyzing extensions inside other browsers, such as Chrome, Internet Explorer or Opera, would provide a wider picture of privacy protection techniques in the wild. Selenium, the browser automation framework used by OpenWPM, supports these browsers. OpenWPM, however, currently only supports Firefox. Furthermore, browsers include more and more privacy protection (e.g. Firefox [39], Brave [40], Cliqz [54]). Cliqz and Brave are not supported by Selenium, which makes both browsers impossible to use with OpenWPM (see § IV-A). We however intend to add them later on.
While some works use domain name-based heuristics [77], [67] to identify third parties, others use blocking lists-based extensions as ground-truth (see Table II). When the latter method is used, tracking’s long tail [48] may cause many false negatives. Beyond our analysis in § VI-A, we intend to investigate blocking lists more thoroughly.
VIII. CONCLUSIONS
This work extensively compares existing privacy protection techniques against third party tracking and provides four contributions. First, we propose a robust methodology to compare privacy protection techniques when crawling many websites, and quantify measurement error. We use the privacy footprint [1] and apply the Kolmogorov-Smirnov (KS) test on browsing metrics to evaluate privacy protection. This test is likewise applied to HTML-based metrics to assess the impact on webpage quality. We also design a manual analysis to complement HTML-based metrics. Our second contribution is
a privacy protection techniques comparison in terms of overlap and performances. We show that the most popular privacy protection techniques exhibit a common blocking baseline. We also highlight the fact that some extensions, namely Ghostery and uBlock Origin, have a very specific behavior and are the only ones to block some domains. Our third contribution is a comparison of the impact of privacy protection techniques on webpage quality. Our fourth contribution is a set of usage recommendations for end-users, and research recommendations for the scientific community. Ghostery and uBlock Origin provide the best trade-off between protection and webpage quality. Ghostery however requires a configuration step which is difficult for users [56]. The RequestPolicy Continued and NoScript extensions exhibit the best performances but reduce webpage quality. Ghostery and uBlock Origin use manually built blocking lists which are cumbersome to maintain. Research efforts should focus on improving existing approaches that do not rely on blocking lists (such as Privacy Badger or MyTrackingChoices) and automating blocking list building.
[Figure 10: three scatter plots. (a) HTML-based vs manual analysis quality; (b) Privacy protection vs HTML-based quality; (c) Privacy protection vs manual analysis. Axes: HTML quality index (low to high), manual analysis quality index (low to high), privacy protection index (high to low).]
Fig. 10. Scatter plots of synthetic index for privacy protection, HTML-based quality and manual analysis-based quality.
REFERENCES
[1] B. Krishnamurthy and C. E. Wills, “Generating a privacy footprint on the internet,” in IMC 2006, pp. 65–70.
[2] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz, “The web never forgets: Persistent tracking mechanisms in the wild,” in CCS 2014, pp. 674–689.
[3] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle, “Flash cookies and privacy,” in SSRN, 2009, pp. 158–163.
[4] P. Eckersley, “How unique is your web browser?” in PETS 2010, pp. 1–18.
[5] “DoNotTrack,” accessed: 2017-07-30. [Online]. Available: https://tools.ietf.org/html/draft-mayer-do-not-track-00
[6] B. Krishnamurthy, D. Malandrino, and C. E. Wills, “Measuring privacy loss and the impact of privacy protection in web browsing,” in SOUPS 2007, pp. 52–63.
[7] J. R. Mayer and J. C. Mitchell, “Third-party web tracking: Policy and technology,” in SP 2012, pp. 413–427.
[8] R. Balebako, P. Leon, R. Shay, B. Ur, Y. Wang, and L. Cranor, “Measuring the effectiveness of privacy tools for limiting behavioral advertising,” in W2SP 2012.
[9] R. Hill, “Comparative benchmarks against widely used blockers: Top 15 most popular news websites,” 2014, accessed: 2017-07-30. [Online]. Available: https://github.com/gorhill/httpswitchboard/wiki/Comparative-benchmarks-against-widely-used-blockers:-Top-15-Most-Popular-News-Websites
[10] ——, “ublock and others: Blocking ads, trackers, malwares,” 2015, accessed: 2017-07-30. [Online]. Available: https://github.com/gorhill/uBlock/wiki/uBlock-and-others%3A-Blocking-ads%2C-trackers%2C-malwares
[11] C. E. Wills and D. C. Uzunoglu, “What ad blockers are (and are not) doing,” in HotWeb 2016, pp. 72–77.
[12] G. Merzdovnik, M. Huber, D. Buhov, N. Nikiforakis, S. Neuner, M. Schmiedecker, and E. Weippl, “Block me if you can: A large-scale study of tracker-blocking tools,” in EuroSP 2017.
[13] S. Traverso, M. Trevisan, L. Giannantoni, M. Mellia, and H. Metwalley, “Benchmark and comparison of tracker-blockers: should you trust them?” in TMA 2017.
[14] T. Bujlow, V. Carela-Español, J. Solé-Pareta, and P. Barlet-Ros, “A survey on web tracking: Mechanisms, implications, and defenses,” Proceedings of the IEEE, vol. PP, no. 99, pp. 1–35, 2017.
[15] J. Estrada-Jiménez, J. Parra-Arnau, A. Rodríguez-Hoyos, and J. Forné, “Online advertising: Analysis of privacy threats and protection approaches,” Computer Communications, 2016.
[16] “AdBlock Plus,” accessed: 2017-07-30. [Online]. Available: https://adblockplus.org/
[17] “uBlock Origin,” accessed: 2017-07-30. [Online]. Available: https://github.com/gorhill/uBlock
[18] “Ghostery,” accessed: 2017-07-30. [Online]. Available: https://www.ghostery.com/
[19] “Disconnect,” accessed: 2017-07-30. [Online]. Available: https://disconnect.me/
[20] “NoTrace,” accessed: 2017-07-30. [Online]. Available: http://www.isislab.it/projects/NoTrace/
[21] “Blur,” accessed: 2017-07-30. [Online]. Available: https://dnt.abine.com
[22] J. P. Achara, J. Parra-Arnau, and C. Castelluccia, “Mytrackingchoices: Pacifying the ad-block war by enforcing user privacy preferences,” in WEIS 2016.
[23] “Privacy Badger,” accessed: 2017-07-30. [Online]. Available: https://www.eff.org/fr/privacybadger
[24] “NoScript,” accessed: 2017-07-30. [Online]. Available: https://noscript.net/
[25] “RequestPolicyContinued,” accessed: 2017-07-30. [Online]. Available: https://requestpolicycontinued.github.io/
[26] “HTTPS Everywhere,” accessed: 2017-07-30. [Online]. Available: https://www.eff.org/fr/https-everywhere
[27] “Decentraleyes,” accessed: 2017-07-30. [Online]. Available: https://decentraleyes.org
[28] “WebOfTrust,” accessed: 2017-07-30. [Online]. Available: https://www.mywot.com/
[29] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, and G. Vigna, “Cookieless monster: Exploring the ecosystem of web-based device fingerprinting,” in SP 2013, pp. 541–555.
[30] J. Vincent, “Ghostery has been bought by the developer of a privacy-focused browser,” 2 2017, accessed: 2017-07-30. [Online]. Available: http://www.theverge.com/2017/2/15/14622484/ghostery-ad-tracking-plug-in-cliqz
[31] “Network advertising industry,” accessed: 2017-07-30. [Online]. Available: http://www.networkadvertising.org/choices/
[32] “Digital advertising alliance,” accessed: 2017-07-30. [Online]. Available: http://www.aboutads.info/choices/
[33] “Beeftaco,” accessed: 2017-07-30. [Online]. Available: https://jmhobbs.github.io/beef-taco/
[34] “Allowing acceptable ads in adblock plus - agreements,” accessed: 2017-07-30. [Online]. Available: https://adblockplus.org/acceptable-ads-agreements
[35] “EasyList/EasyPrivacy,” accessed: 2017-07-30. [Online]. Available: https://easylist.to/
[36] V. S. Eckert, J. Klofta, and J. L. Strozyk, ““Web of Trust” späht nutzer aus,” 11 2016, accessed: 2017-07-30. [Online]. Available: https://www.tagesschau.de/inland/tracker-online-103.html
[37] L. Olejnik, “Proposed amendments to eprivacy regulation are great,” 6 2017, accessed: 2017-07-30. [Online]. Available: https://blog.lukaszolejnik.com/proposed-amendments-to-eprivacy-regulation-are-great
[38] “Firefox,” accessed: 2017-07-30. [Online]. Available: https://www.mozilla.org/en-US/firefox
[39] G. Kontaxis and M. Chew, “Tracking protection in firefox for privacy and performance,” in W2SP 2015.
[40] “Brave,” accessed: 2017-07-30. [Online]. Available: https://brave.com/
[41] “Cliqz,” accessed: 2017-07-30. [Online]. Available: https://cliqz.com/en
[42] “AdBlock,” accessed: 2017-07-30. [Online]. Available: https://getadblock.com/
[43] C. Castelluccia, S. Grumbach, and L. Olejnik, “Data harvesting 2.0: from the visible to the invisible web,” in WEIS 2013.
[44] N. Fruchter, H. Miao, S. Stevenson, and R. Balebako, “Variations in tracking in relation to geographic location,” in W2SP 2015.
[45] J. M. Carrascosa, J. Mikians, R. Cuevas, V. Erramilli, and N. Laoutaris, “I always feel like somebody’s watching me. measuring online behavioural advertising,” in CoNext 2015.
[46] H. Metwalley, S. Traverso, M. Mellia, S. Miskovic, and M. Baldi, “The online tracking horde: a view from passive measurements,” in TMA 2015, pp. 111–125.
[47] S. Englehardt, D. Reisman, C. Eubank, P. Zimmerman, J. Mayer, A. Narayanan, and E. W. Felten, “Cookies that give you away: The surveillance implications of web tracking,” in WWW 2015, pp. 289–299.
[48] S. Englehardt and A. Narayanan, “Online tracking: A 1-million-site measurement and analysis,” in CCS 2016.
[49] D. Malandrino, A. Petta, V. Scarano, L. Serra, R. Spinelli, and B. Krishnamurthy, “Privacy awareness about information leakage: Who knows what about me?” in WPES 2013, pp. 279–284.
[50] D. Malandrino and V. Scarano, “Privacy leakage on the web: Diffusion and countermeasures,” Computer Networks, vol. 57, no. 14, pp. 2833–2855, 2013.
[51] D. Gugelmann, M. Happe, B. Ager, and V. Lenders, “An automated approach for complementing ad blockers blacklists,” in PETS 2015, pp. 282–298.
[52] H. Metwalley, S. Traverso, and M. Mellia, “Unsupervised detection of web trackers,” in GLOBECOM 2015, pp. 1–6.
[53] Q. Wu, Q. Liu, Y. Zhang, and G. Wen, “Trackerdetector: A system to detect third-party trackers through machine learning,” Computer Networks, vol. 91, pp. 164–173, 2015.
[54] Z. Yu, S. Macbeth, K. Modi, and J. M. Pujol, “Tracking the trackers,” in WWW 2016, pp. 121–132.
[55] M. Ikram, H. J. Asghar, M. A. Kaafar, A. Mahanti, and B. Krishnamurthy, “Towards seamless tracking-free web: Improved detection of trackers via one-class learning,” ser. PETS 2017.
[56] P. Leon, B. Ur, R. Shay, Y. Wang, R. Balebako, and L. Cranor, “Why johnny can’t opt out: A usability evaluation of tools to limit online behavioral advertising,” in CHI 2012, pp. 589–598.
[57] D. Malandrino, V. Scarano, and R. Spinelli, “How increased awareness can impact attitudes and behaviors toward online privacy protection,” in SocialCom 2013, pp. 57–62.
[58] B. Krishnamurthy and C. Wills, “Privacy diffusion on the web: a longitudinal perspective,” in WWW 2009, pp. 541–550.
[59] F. Roesner, T. Kohno, and D. Wetherall, “Detecting and defending against third-party tracking on the web,” in NSDI 2012, pp. 12–26.
[60] D. Fifield and S. Egelman, “Fingerprinting web users through font metrics,” in FC 2015, pp. 107–124.
[61] Ł. Olejnik, G. Acar, C. Castelluccia, and C. Diaz, “The leaking battery,” in DPM 2015, pp. 254–263.
[62] K. Mowery and H. Shacham, “Pixel perfect: Fingerprinting canvas in html5,” in W2SP 2012.
[63] Y. Cao, S. Li, and E. Wijmans, “(Cross-)Browser Fingerprinting via OS and Hardware Level Features,” in NDSS 2017.
[64] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens, and B. Preneel, “Fpdetective: dusting the web for fingerprinters,” in CCS 2013, pp. 1129–1140.
[65] A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner, “Internet jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016,” in Usenix Security 2016, pp. 997–1013.
[66] A. Ghosh and A. Roth, “Selling privacy at auction,” in EC 2011, pp. 199–208.
[67] L. Olejnik, T. Minh-Dung, and C. Castelluccia, “Selling off privacy at auction,” in NDSS 2014, pp. 104–114.
[68] J. Bau, J. Mayer, H. Paskov, and J. C. Mitchell, “A promising direction for web tracking countermeasures,” in W2SP 2013.
[69] V. Kalavri, J. Blackburn, M. Varvello, and K. Papagiannaki, “Like a pack of wolves: Community structure of web trackers,” pp. 42–54.
[70] “Alexa top 500 sites on the web,” accessed: 2017-07-30. [Online]. Available: http://www.alexa.com
[71] “Public suffix list,” accessed: 2017-07-30. [Online]. Available: https://publicsuffix.org/
[72] S. Guha, B. Cheng, and P. Francis, “Challenges in measuring online advertising systems,” in IMC 2010, pp. 81–87.
[73] G. Pass, A. Chowdhury, and C. Torgeson, “A picture of search.” InfoScale, vol. 152, p. 1, 2006.
[74] E. Pujol, O. Hohlfeld, and A. Feldmann, “Annoyed users: Ads and ad-block usage in the wild,” in IMC 2016, pp. 93–106.
[75] M. Malloy, M. McNamara, A. Cahn, and P. Barford, “Ad blockers: Global prevalence and impact,” in IMC 2016, pp. 119–125.
[76] M. Falahrastegar, H. Haddadi, S. Uhlig, and R. Mortier, “Tracking personal identifiers across the web,” in PAM 2016, pp. 30–41.
[77] ——, “The rise of panopticons: Examining region-specific third-party web tracking.” in TMA 2014, pp. 104–114.