Professional Documents
Culture Documents
QCSD
QCSD
QCSD
Defence Framework
Jean-Pierre Smith and Luca Dolfi, ETH Zurich;
Prateek Mittal, Princeton University; Adrian Perrig, ETH Zurich
https://www.usenix.org/conference/usenixsecurity22/presentation/smith
on obscuring the feature-rich head of the communication [18]. 2.2 Threat Model
As a promising step forward, Tor has since implemented a We consider an adversary who attempts to determine the web-
derivative of WTF-PAD [21], [36] along with a framework site associated with a traffic trace, through analysing only
for experimenting with other padding approaches [37]. Unfor- packet sizes and timings. This setting occurs when the client
tunately, WTF-PAD has already been shown to be vulnerable uses privacy-enhancing technologies, such as anonymity net-
to website-fingerprinting attacks [12], [15]. works (e.g., Tor) or VPN tunnels (e.g., Wireguard, Open-
The second category, chaff-and-delay defences, shape the VPN, or IPSec), encrypted wireless communication (e.g.,
communication towards a target pattern by adding padding, IEEE 802.11i WPA), or when the adversary wants to identify
chaff traffic, splitting, and delaying the packets sent to and a visit to a particular web page of interest on a website (e.g.,
from the client. Among these defences, BuFLO [24], CS- the page related to a medical condition on a medical website).
BuFLO [23], and Tamaraw [22] offer high levels of privacy Additionally, we inherit the simplifying assumptions used
at the cost of high bandwidth and delay overhead. Tamaraw, throughout the attack literature [11], [14]–[16], [39] to re-
for example, transmits fixed-size packets at a constant rates main consistent in our use of website-fingerprinting attacks in
from the client and from the server. Defences such as Super- the evaluation of our framework. That is, we conservatively
sequence [8] and Walkie Talkie [20] group web pages into assume that (1) an observer can identify the start and end
anonymity sets such that a common target pattern can be of a web-page load, and can thus extract its trace from the
found that reduces the shaping overhead. overall traffic; (2) web pages are loaded sequentially without
All of the above defences rely on shaping the traffic both background noise such as media or file transfers; and (3) the
to and from the client, and have thus been thought to be page of interest is the index page of the domain. Wang and
reliant on client-side and server/proxy-side deployment. How- Goldberg [4] and Cui et al. [40] provide evaluations of these
ever, our framework, QCSD, enables the use of such defences assumptions; however, as we focus on defending against an
with only client-side deployment. We investigate QCSD’s empowered adversary capable of realising these assumptions,
ability to emulate chaff-only and chaff-and-delay defences, inheriting them allows us to both remain consistent with prior
with the FRONT and Tamaraw defences respectively as ex- works as well as to highlight any differences between concep-
emplars. FRONT is one of the few chaff-only defences still tualised defences and their implementations in QCSD.
Framed data and control Transmitted data and control Additionally, a trace may be represented as a pair of one-
messages, such as acknowledgements and flow control, are en- dimensional aggregated time series, one for each direction,
capsulated in frames which are bundled together and placed in Definition 2.2. An aggregated time series, Td∈{0,1} , of the
packets. QCSD utilises the QUIC padding frame (PADDING), direction d of trace P with granularity I, is a sequence of
which is a repeatable, single-byte frame that is discarded upon packet lengths:
receipt, and the ping frame (PING), which forces the server to Td = (x1 , x2 , . . . , xm )
acknowledge the packet containing it.
where xi is the sum of all the packet lengths with timestamps ti
within the half-open time interval [(i − 1) · I, i · I) and with the
Multiplexed streams QUIC connections multiplex multi- specified direction, or 0 if no packets fell within that interval.
ple in-order byte streams over a single connection. The data
for each stream is first encapsulated with a STREAM frame Building blocks of a defence QCSD aims to emulate ex-
before being placed in a packet. A single packet can therefore isting defences in the literature, thus we must establish the
contain data corresponding to multiple streams. A STREAM functionality required to do so. A defence M is a mechanism
frame with the FIN bit set marks the end of the stream. that takes an input trace, P, and alters it to create a new trace
P0 . We can understand the necessary functionality of M by
Independent flow control Each QUIC stream is indepen- investigating the ways in which P and P0 may differ. Consider
dently flow controlled. The maximum-stream-data value is w.l.o.g. x and x0 at the same index in incoming time series rep-
resentations Td=1 and Td=1 0 . There are three cases. First, when
an offset from the start of the stream until which the remote 0
endpoint can send data. It can be increased by sending a x = x the defence M does not perturb the input sequence at
MAX_STREAM_DATA frame with a higher offset to the remote that time interval. Second, when x0 > x the packet to be trans-
endpoint. The amount of data that an endpoint is allowed to mitted must be padded by x∆ = (x0 − x) bytes. Furthermore, if
send on the stream is called its flow-control credits. x = 0 this equates to sending an additional x∆ = x0 bytes when
no packet was present in the input sequence. And third, when
x0 < x, then the input packet is too large, and must reduced in
The standardisation of QUIC brought with it the standard- size, sending x0 bytes in that interval. If x0 = 0 this equates to
isation of the HTTP/3 protocol [46]. HTTP/3 is the next preventing the transmission of any data at that time interval.
generation of the Hypertext Transfer Protocol that underlies From the above formulation, the requirements for an expres-
the web and will be solely transported via QUIC. It defines sive defence framework are clear. A defence framework that
the semantics for using HTTP with QUIC, such as each QUIC emulates a defence M must be able to (1) send chaff packets,
stream transporting a single HTTP request and response. Fur- (2) pad packets, (3) split packets, and (4) delay the sending
thermore, HTTP/3 demarcates HTTP headers and payloads of packets from the client and server. Below we describe our
by encapsulating them in frames with prefixed lengths [46]. formulation of such a framework, QCSD.
3.1 Overview 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 t(s)
QCSD allows shaping transmissions on HTTP/3-QUIC con-
Control send MSD{sk , 3 · L}
nections in both directions, whether to or from the client, interval
according to the configured website-fingerprinting defence. send MSD{s j , 2 · L}
Additionally, QCSD’s design aims to (1) provide website-
fingerprinting resilience similar to that of the stipulated de- send MSD{si , L}
fence, (2) avoid adding additional data and delays to the
connection beyond required by the defence, and (3) allow Figure 1: A simplified example of the events generated by
server interoperability by being standards compliant with the QCSD to add chaff traffic to a connection according to the
QUIC [45] and HTTP/3 [46] specifications. chaff-only defence trace P = Pout ∪ Pin , with packets of length
To enact arbitrary defences, QCSD utilises standardised L, length l ≤ L, and chaff streams si , s j , and sk .
features of QUIC and HTTP/3 to provide the building blocks
of website-fingerprinting defences, identified in Section 2.4.
where each stream is individually flow-controlled. Delaying
packets is accomplished through individual, fine-grained ma-
3.1.1 Sending Chaff and Padding Packets: PADDING, nipulation of this stream-based flow control. A server will
PING, and Chaff Streams only send data for a stream if it has flow control credits for that
QCSD performs padding in the client-to-server direction us- stream. QCSD uses QUIC’s MAX_STREAM_DATA (MSD) frame
ing QUIC’s PADDING and PING control frames. These frames to incrementally provide flow control credits to the server on
allow sending arbitrary amounts of null-valued data from the individual streams, which allows delaying or releasing data
client – from a few bytes padding a packet to entire packets according to the defence. QCSD splits data to be received as
of only chaff data – which is discarded by the receiver. As a a consequence of this approach, since it releases exactly how
result of QUIC’s pervasive encryption, these control frames much data it would like to receive.
are bitwise indistinguishable from normal application data. This results, however, in manipulating bursts of bytes which
In order to pad and chaff in the server-to-client direction, are delivered in one or more packets, instead of in individual
QCSD leverages the facts that HTTP GET requests are idem- packets. Frequently releasing small amounts of flow-control
potent [47]–[49] and that servers host numerous resources credit would enable finer-grained shaping, perhaps at sub-
beyond those required for the loading of any single web packet level bursts. However, servers may choose to aggre-
page. Additionally, as per the HTTP/3 standard [46], a single gate received flow credit, thereby limiting such fine-grained
HTTP/3 request-response cycle is conducted upon a single manipulation. For example, a server that receives two releases
stream and QUIC may place data from individual streams of 600 bytes of flow credit within a short span of time may
of a connection into the same packet [45]. Together, these aggregate its available flow credit and send a single 1200-byte
allow us to construct, track, and manage streams of chaff data response, instead of two 600-byte responses. QCSD therefore
from the server to the client, on which padding and chaff data does not send these releases at microsecond frequencies, but
can be requested independently of application data. We term instead aggregates the releases at the client and signals the
these streams chaff-streams to differentiate them from the server every few milliseconds. As we show in Section 5, de-
application streams carrying the HTTP data of the web page. spite aggregating releases QCSD is able to effectively emulate
defences from the literature.
3.1.2 Splitting and Delaying: Stream Flow Control
3.1.3 Putting it All Together
QUIC’s implementation as a user-space protocol allows us
to control transmissions from the client to the server, with- Figure 1 shows a simplified example of the QUIC packets and
out modifying the operating system or making changes that events generated by QCSD to add chaff traffic to a connection
would impact other applications at the client. We can therefore according to a trace P. These events occur in parallel with
choose how much data and when to send packets originating the regular communication between the client and the server.
from the client in accordance with the defence. Trace events for the outgoing direction are immediately satis-
For data originating from the server, QCSD utilises the fact fied by the client by sending additional packets, padded with
that QUIC multiplexes individual streams within a connection, PING and PADDING frames to create packets of the desired
1500
Outgoing
750
Outgoing
0
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 Out Target
Incoming
download completed
Figure 2: Packet times and sizes for https://www.pcgamingwiki.com/ and dependent QUIC resources when collected using a
simple, undefended client vs. shaped with QCSD towards the FRONT and Tamaraw defences. Packet time distributions compared
to the target defences are shown above. Packets below 150 bytes (250 bytes with Tamaraw) are not shown.
length, L. Events for the incoming direction are aggregated 3.2.1 Preparing the QUIC Connection
at the client for a control interval (0.1 s in this example but
around 0.005 s in practice) and the data is pulled on an ap- During initialisation, QCSD modifies the transport param-
propriate chaff stream at the end of each control interval by eters of the underlying QUIC connection, which are nego-
sending MAX_STREAM_DATA from the client to the server. For tiated between the client and server. It reduces the initial
a chaff-and-shape defence, the outgoing packets would also flow-control credit from the server to the client for all bidirec-
contain the HTTP data required for regular communication, tional streams to only 16 bytes. Bidirectional streams trans-
and the streams selected for incoming data would include port HTTP/3 requests and responses and thus setting a low
application streams in addition to chaff streams. initial flow-control credit prepares the streams for shaping by
preventing unscheduled transmissions from the server. This
Examples of QCSD shaping the index page and resources
is required for chaff-streams in the case of chaff-only mode,
of https://www.pcgamingwiki.com/ towards the FRONT [18]
and for both chaff and data streams in chaff-and-shape mode.
and Tamaraw [22] defences are shown in Figure 2 compared
Consequently, each time the client opens a new application
to the original traffic pattern and the target schedules gen-
(non-chaff) stream to the server in chaff-only mode, QCSD
erated by the defences. In the next section we describe the
immediately increases the flow control to their non-restricted
design of QCSD and its shaping algorithm in more detail.
levels with a MAX_STREAM_DATA frame.
QCSD accepts a target trace P corresponding to a defence, a Algorithm 1 outlines the central logic for shaping a connection
mode of operation m, and a control time interval I. The target towards the target trace P for the defence and is called with
trace P may be either a static schedule of packet timestamps the current time after every control interval I, or at the time
and sizes, or may be dynamically generated by the defence indicated by the next event in P. It operates on the events of
at runtime. While QCSD supports both types of targets, for P, that is, the tuples of time, size, and direction of data to be
simplicity we describe its operation in terms of a static sched- sent or received as indicated by the defence.
ule. The mode m is one of two modes as indicated by the type Shaping begins by determining whether to pull data from
of defence: “chaff-only” or “chaff-and-shape”. In chaff-only the server to the client for any previously unsatisfied incoming
mode, the trace P defines the cover traffic to be added to the events, pulling as much data as needed or available across
incoming and outgoing communication. In chaff-and-shape open application and chaff streams. Next, it processes each
mode, P is the target trace of the overall communication and event with a timestamp within the last control interval. QCSD
the traffic is delayed, padded, and otherwise reshaped towards uses data from application streams and chaff streams to fulfil
this trace. Table 1 shows an overview of the modes associ- these events. Application streams are only manipulated by
ated with various website-fingerprinting defences. Finally, the QCSD when reshaping the connection (chaff-and-shape) and
control interval I defines how frequently flow-control credits are shaped in the same manner as chaff streams. Where pos-
may be released to the server. It balances the granularity of sible, QCSD prioritises data from application streams over
shaping against the overhead of the control messages. chaff streams to reduce the impact of the defence on the load-
0.50
Bandwidth
0.25
0.00
Defended 1.43 (0.58–5.86) 4.16 (1.43–10.6)
– 0.25
Simulated 1.17 (0.48–4.99) 3.09 (0.90–8.35)
– ∗ 0.58 (0.20–2.44)
1 5 25 50 1 5 25 50
Sampling rate (ms) Sampling rate (ms) Latency
Defended −0.12 (−0.22–0.05) 8.43 (3.61–17.3)
(a) FRONT ∗ 2.34
Simulated 0.00 (0.69–6.98)
Pearson’s r LCSS
1.0
0.0
maintaining the connection. We could reduce this overhead
by bundling control messages with outgoing chaff as opposed
– 0.5 to sending them as normal application traffic. Second, since
1 5 25 50 1 5 25 50 we cannot construct the packets at the server, we pull the full
Sampling rate (ms) Sampling rate (ms) amount of the desired chaff as stream data, which is addi-
tionally encapsulated with QUIC and STREAM_FRAME headers.
(b) Tamaraw
This overhead could be reduced by estimating header lengths
Figure 3: Pearson’s r and LCSS between QCSD-collected and and reducing the amount of chaff requested accordingly.
simulated time series, aggregated at various sampling rates. In the case of the latency overhead, Table 2 shows that
both the simulated and QCSD approaches have overheads
near zero, as the addition of chaff packets does not affect the
(0.3 < r < 0.5) and strong (r > 0.5) correlations in the incom- original transmission duration. The small negative latency
ing and outgoing directions respectively at 50 ms granularity. is likely due to network variations across the defended and
This increasing trend arises from scheduling and network con- undefended settings as the interquartile range spans zero.
ditions causing small differences in time between the schedule
and when a packet is sent or observed, and thus becomes less 5.2 Shaping Defence ‘Tamaraw’
pronounced at higher granularities. The overall pattern of sent
and received chaff therefore matches the target schedule. At the other end of the defence spectrum lies heavy-weight
LCSS reports high similarity between the defended and defences such as Tamaraw [22], which add chaff and shape
simulated time series (Figure 3a), as it is better tailored to our the traffic from the client and the server to constant rates.
noisy environment. At both 1 ms and 5 ms granularities LCSS
indicated that both the defended and simulated time series
QCSD successfully emulates Tamaraw In Figure 3b, we
were transmitting similar amounts of bytes (within ε = 150)
can see both Pearson’s r and LCSS scores for the simulated
for over 85% of the intervals. The decrease at higher granular-
and defended time series for Tamaraw. Pearson’s r reports
ities is likely due to the aggregation of multiple packets within
median correlations of medium strength (∼0.5) in the direc-
a single interval in the calculation. For example, two control
tion from client to server at 25 ms and 50 ms granularities.
packets aggregated in an interval expected to be empty would
The negative Pearson’s r in the outgoing direction is indica-
no longer be within ε of 0 bytes. Nevertheless, the ability
tive of a phase-shift between the compared time series, and
of QCSD to closely emulate FRONT is apparent in the high
can occur, for example, when an odd number of packets are
LCSS score at 5 ms, the control interval used to pulled chaff.
aggregated in an interval in one series but an even in another.
In contrast, LCSS reports high similarities in excess of
FRONT emulation incurs small overheads Next, we mea- 0.87 in both the outgoing and incoming directions at 1 ms
sured the latency and bandwidth overhead incurred by QCSD granularity and around 0.5 at 5 ms granularity. Insufficient
when emulating FRONT. The bandwidth overhead, shown in data at the server, server aggregation of responses into unequal
Table 2, slightly increased from 1.17 in the simulated setting packets, such as of 1300 bytes and 200 bytes as opposed
to 1.43 in the defended setting. There are two reasons for this to two 750-byte packets, as well as network jitter resulting
increase. First, unlike in the simulated setting we send control in multiple packets arriving in the same interval likely all
packets from the client and receive acknowledgements from contribute to the reduction in the similarity score.
π20
encountered in FRONT and results from the addition of the 0.5
π20
0.5
in each interval could reduce this overhead. We discuss the
overhead difference between the live-generated schedule and
0.0
the simulated overhead [22] below with the latency overhead. 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0
Recall Recall Recall
Tamaraw’s latency overhead diverges from the literature (b) QCSD-Tamaraw and simulated Tamaraw
The latency overhead, shown in Table 2 for QCSD’s emula-
tion of Tamaraw, is a surprising 8.43 times as long as in the Figure 4: r20 -precision-recall curves of attacks on single-
undefended case. By contrast, the latency overhead computed connection datasets defended with simulations or QCSD.
using Cai et al.’s Tamaraw simulation reports an expected
overhead of only 2.34 in the median.1 A similar divergence
is also seen in the bandwidth overhead. There are two pri- We trained these attacks on datasets from two scenarios:
mary reasons for this. First, some web pages do not declare single connections and full web-page loads. For each dataset,
resource lengths and fragment them across multiple HTTP/3 we trained on an 80 % stratified split with modest tuning of
data frames. In this case, QCSD cannot determine the length the classifier hyperparameters from the original papers (see
of the resource a-priori and must cautiously releases data on Appendix D). We additionally supplied them with vectors of
the stream. This primarily affects non-chaff resources in chaff- signed packet sizes (DF and Var-CNN), packet timestamps
and-shape defences, as chaff-resource lengths can be cached (Var-CNN), and 165 engineered size- and time-related fea-
and chaff-only streams do not throttle other resources. Second, tures [9] (k-FP). We then tested on the remaining 20 % of the
incoming and outgoing directions were simulated indepen- dataset and measured recall and precision.
dently. In reality however, delays in downloading the HTML
of the web page then delays requests for dependent resources Definition 6.1. Recall or true-positive rate is the fraction of
which themselves delay the fulfilment of those resources by monitored websites labelled as the correct monitored website.
the server: delays ripple throughout both directions. There-
fore, higher transmission rates than used in simulations would Definition 6.2. The precision of a classifier on a given dataset
be required to achieve lower latency overheads. is the fraction of correct monitored website claims. As this
is implicitly coupled to the number of positive and negative
samples in the dataset, we use Wang’s r-precision (πr ) [39]
6 Defending against WF Attacks at the Client which adjusts precision to an explicit ratio of monitored to
unmonitored websites. We use r = 20 corresponding to 1
Similarity measures cannot capture whether QCSD defences monitored website visit for every 20 unmonitored websites,
deter website-fingerprinting attacks. We therefore evalu- consistent with Wang [39].
ated QCSD’s ability to defend against modern attacks – k-
fingerprinting (k-FP) [9], Deep Fingerprinting (DF) [12], and
Var-CNN [15] – in the open-world setting. 6.1 Defending Single Connections
1 Weuse Cai et al.’s simulation as a simulation derived from the Tamaraw To determine QCSD’s capability in defending against these at-
schedule would have the same latency overhead as the defended setting. tacks, we evaluated them on single-connection datasets Dconn .
π20
whereas for an adversary prioritising precision, it reduced re- 0.5
π20
0.5
the chaff-only defence FRONT, QCSD fails to match the
chaff-and-shape defence Tamaraw in its ability to render the
attacks ineffectual. When orchestrated with QCSD, and de- 0.0
0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0
spite reducing either precision or recall to below 0.4 (DF and
Recall Recall Recall
Var-CNN) and 0.7 (k-FP) when prioritising the other, QCSD
did not achieve Tamaraw’s featureless transmission. With a (b) QCSD-Tamaraw across multiple connections
constant outgoing transmission rate, the obvious source of
this discrepancy is the inexact shaping from the server. Refine- Figure 5: r20 -precision-recall curves on the undefended and
ments of the approaches used in QCSD are possible and could QCSD-defended multi-connection datasets.
improve the performance of a near-constant rate defence such
as Tamaraw. Even so, QCSD-Tamaraw provides protection
when compared to the undefended setting, reducing precision By contrast, Tamaraw effectively defended against DF but did
by 0.6 at recall values in excess of 0.75, and could be bol- not offer comparable performance against k-FP and Var-CNN.
stered with support to ensure exact-length packets from the Examination of feature importances in k-FP’s underlying
server, as discussed in Section 7. random-forest indicated that variance in incoming packet
sizes and total incoming data were among the top features
used for classification of QCSD-defended Tamaraw. There are
6.2 Defending Full Web-Page Loads two primary reasons for this. The first is discrepancy between
the resource sizes recorded in the dependency graph, which
When loading a web page, the browser may open multiple was used to determine chaff resources (Appendix B.2), and
connections to different web servers to request the various their size during the evaluation. When the lengths of chaff
resources that constitute the web page. In this section, we resources in the cache were overestimated (e.g., a resource
evaluate the ability of QCSD to defend full web-page loads, unexpectedly delivering an HTTP 404) the difference would
consisting of resources downloaded on multiple connections leak size information. This discrepancy arose from selecting
from different servers, in two scenarios. First, we evaluated resources requiring cookies, javascript, being dynamic, etc.,
the ability of our simple multi-connection QCSD client to de- and could be addressed with better identification of chaff re-
fend against attacks when shaping multiple QUIC connections sources when integrating with a browser. The second is due
towards the FRONT and Tamaraw defences (Dmc ). Second, to accommodating servers that do not respond with data un-
we evaluated QCSD’s ability to defend against attacks when less they have been provided with some minimum amount
loading the entire web page through the browser (QUIC and of flow credits (Appendix B.3). This leads to QCSD, for ex-
TCP resources), by crafting FRONT on a single-connection ample, providing 1500-bytes of flow credit and the server
towards the same web server (D f ull ). responding with only 800-bytes at the end of the stream. This
leakage could be potentially mitigated by identifying these
QCSD defends multi-connection loads Figure 5 shows servers and using coarse grained shaping with them and more
attack precision and recall against traces defended with fine-grained shaping with well-behaving servers. We discuss
QCSD across multiple connections. Similarly to the single- further improvements in Section 7.
connection setting, when defending with FRONT, QCSD was Overall, the clear reductions in attack performance confirm
able to significantly reduce attack precision and recall scores. the feasibility of extending QCSD to shaping the multiple con-
In this work, we designed a QUIC client-side defence frame- [6] Q. Sun, D. R. Simon, Y.-M. Wang, W. Russell, V. N. Padman-
abhan, and L. Qiu, “Statistical identification of encrypted web
work (QCSD) that enables emulating website-fingerprinting
browsing traffic,” in Proc. 2002 IEEE Symp. on Security and
defences without requiring any changes to servers or the de-
Privacy, May 2002. DOI: 10.1109/secpri.2002.1004359.
ployment of new services. We implemented and evaluated
QCSD, which shows great promise in enabling clients to enact [7] A. Hintz, “Fingerprinting websites using traffic analysis,” in
Privacy Enhancing Technologies, 2003. DOI: 10.1007/3-
defences from the browser or application, without needing to
540-36467-6_13.
make any changes to existing server stacks. Our evaluations
of QCSD show that not only can it orchestrate chaff defences [8] T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Gold-
such as FRONT, matching both the chaff-specification and berg, “Effective attacks and provable defenses for website
fingerprinting,” in 23rd USENIX Security Symp. (USENIX
the levels of protection provided by the conceptualised de-
Security 14), Aug. 2014. [Online]. Available: https : / /
fence, but also to orchestrate chaff-and-shape defences such
www . usenix . org / conference / usenixsecurity14 /
as Tamaraw. We anticipate that QCSD represents a promising technical-sessions/presentation/wang_tao.
direction for future work on deployable defences.
[9] J. Hayes and G. Danezis, “k-fingerprinting: A robust scal-
able website fingerprinting technique,” in 25th USENIX
Acknowledgements Security Symp. (USENIX Security 16), Aug. 2016. [On-
line]. Available: https : / / www . usenix . org /
We thank the anonymous reviewers for their many useful conference/usenixsecurity16/technical-sessions/
suggestions, and in particular our shepherd, Matthew Wright, presentation/hayes.
whose insightful feedback greatly helped to improve the pa- [10] Z. Zhuo, Y. Zhang, Z.-l. Zhang, X. Zhang, and J. Zhang,
per. We gratefully acknowledge support from the ETH4D and “Website fingerprinting attack on anonymity networks based
EPFL EssentialTech Centre Humanitarian Action Challenge on profile hidden Markov model,” IEEE Trans. Information
Grant. This work was also supported by the National Science Forensics and Security, May 2018. DOI: 10 . 1109 / TIFS .
Foundation under grants CNS-1553437 and CNS-1704105, 2017.2762825.
and by the United States Air Force and DARPA under Con- [11] V. Rimmer, D. Preuveneers, M. Juarez, T. V. Goethem, and
tract No. FA8750-19-C-0079. Any opinions, findings and W. Joosen, “Automated website fingerprinting through deep
conclusions or recommendations expressed in this material learning,” in Proc. 2018 Network and Distributed Systems
are those of the author(s) and do not necessarily reflect the Security Symp., 2018. DOI: 10.14722/ndss.2018.23105.
views of the United States Air Force, DARPA, or any other [12] P. Sirinam, M. Imani, M. Juarez, and M. Wright, “Deep Fin-
sponsoring agency. gerprinting: Undermining website fingerprinting defenses
with deep learning,” in Proc. 2018 ACM SIGSAC Conf. Com-
puter and Communications Security, 2018. DOI: 10.1145/
References 3243734.3243768.
[1] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The [13] R. Attarian, L. Abdi, and S. Hashemi, “AdaWFPA: Adaptive
second-generation onion router,” in 13th USENIX Security online website fingerprinting attack for tor anonymous net-
Symp. (USENIX Security 04), Aug. 2004. [Online]. Available: work: A stream-wise paradigm,” Computer Communications,
https://www.usenix.org/conference/13th- usenix- Dec. 2019. DOI: 10.1016/j.comcom.2019.09.008.
security-symposium/tor-second-generation-onion- [14] S. E. Oh, S. Sunkam, and N. Hopper, “P1-FP: Extraction,
router. classification, and prediction of website fingerprints with deep
[2] O. Valentine. “VPN usage around the world in 2018,” Glob- learning,” Proc. Privacy Enhancing Technologies, Jul. 2019.
alWebIndex. (Jul. 2, 2018), [Online]. Available: https:// DOI : 10.2478/popets-2019-0043.
blog.gwi.com/chart- of- the- day/vpn- usage- 2018/
(visited on Sep. 17, 2021).
Created Closed
FIN bit or error
HTTP-GET
queued
Unthrottled
First byte sent / d ← 0, FIN bit or error
send MAX_STREAM_DATA{∞}
Figure 7: State transitions for chaff and application streams in QCSD. Denoted are the current max stream data offset, msd,
the amount of bytes believed to be available at the server, limit, the amount of data consumed by the client on the stream,
c, and the amount of data observed on the stream, d. Annotations are of the form “triggering event / resulting action”, with
the actions marked as A1 –A4 denoting A1 : limit ← max {limit, c + x}; A2 : c ← c + x, limit ← max {limit, c + excess}; A3 :
limit ← max {limit, x}; and A4 : msd ← msd + max {limit − msd, x}, send MAX_STREAM_DATA{msd}.
Unthrottled streams When the first byte is sent on mit 870763) and logged the resources requested during the
an unthrottled stream, QCSD immediately sends a loading of the page. From these logs, we created a depen-
MAX_STREAM_DATA frame to increase the low initial max dency graph of requested resources. We considered a URL
stream data offset set during initialisation. The max stream A a dependency of URL B if A was an HTTP referrer of B,
data offset is increased to allow the server to send as much if A initiated the request of B (such as through include state-
data on the stream as in an unmodified QUIC connection. ments in CSS), or if B’s request was the result of a sequence
Further releases of max stream data is then handled by the of JavaScript function calls that included the script at URL
original QUIC client logic. QCSD also begins tracking the A. We filtered this graph to only URLs with the same origin
amount of HTTP payload seen on that stream. Once this of the final redirected URL (and thus be requested over the
has been determined, the resource that was loaded over that same connection) and removed graphs that redirected to the
unthrottled stream can be re-requested on a chaff stream. same page. Finally, these dependency graphs were passed
to our test client to collect the actual network trace; and for
each trace, we recorded the packet sizes and timestamps at
C Further Details on Dataset Collection the client using the tcpdump network monitoring utility.
Failure density
1.5